margo / specification

Margo Specification
https://specification.margo.org/

Decision Tracker - Describe the Workload Deployment Status mechanism #10

Open ajcraig opened 5 months ago

ajcraig commented 5 months ago

Below I have outlined a proposal for the Deployment Status Update File, which the Workload Orchestration Agent (WOA) uses to inform the WOS of the status of each deployment.

Note: This proposal is one option for informing the WOS of the status of the deployments. Another option would be to use OTEL, but IMO we should have a mechanism for simple status updates, with OTEL providing further detail depending on the level of adoption the WOS intends to implement.

The associated workflow / use case for this is detailed below:

  1. Application is deployed to the Edge Device via the deployment service
  2. Following the installation, the WOA creates the following Deployment status file.
  3. The WOA sends the status file northbound to inform the WOS of the deployment status.

Proposed Margo Deployment Status Update

kind: Margo Deployment Status
metadata:
    name: hello-world-deployment
    uniqueId: #######
deploymentInformation:
    status:

Top-level Attributes

| Attribute | Type | Required? | Description |
| --- | --- | --- | --- |
| metadata | Metadata | Y | Metadata element specifying characteristics of the deployment. See the Metadata Attributes section below. |
| deploymentInformation | Deployment Information | Y | Deployment information element describing information related to the deployment of the workload. See the Deployment Information Attributes section below. |

Metadata Attributes

| Attribute | Type | Required? | Description |
| --- | --- | --- | --- |
| name | string | Y | Name of the corresponding deployment, created by the Workload Orchestration Software and assigned to the device via the Device Orchestration Software. |
| uniqueId | string | Y | Unique ID of the deployment, so that the Workload Orchestration Software can match the status with the deployment. |

Deployment Information Attributes

| Attribute | Type | Required? | Description |
| --- | --- | --- | --- |
| status | string | Y | Quick status indicator for the Workload Orchestration Software. Note: We still need to decide which status values to standardize on. |

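
As a rough illustration of the required attributes listed in the tables above, a receiver could check an incoming status document along these lines. This is only a sketch: it assumes the YAML has already been parsed into a plain dictionary, it uses the `uniqueId` spelling from the example, and the function name `missing_fields` is hypothetical rather than part of any Margo API.

```python
from typing import Any, Mapping

# Required attributes per element, taken from the proposal's tables.
# The key spelling "uniqueId" follows the YAML example (an assumption).
REQUIRED = {
    "metadata": ("name", "uniqueId"),
    "deploymentInformation": ("status",),
}


def missing_fields(doc: Mapping[str, Any]) -> list[str]:
    """Return dotted paths of required attributes that are absent or empty."""
    missing = []
    for section, attrs in REQUIRED.items():
        body = doc.get(section)
        if not isinstance(body, Mapping):
            # The whole element is missing (or is not a mapping).
            missing.append(section)
            continue
        for attr in attrs:
            if body.get(attr) in (None, ""):
                missing.append(f"{section}.{attr}")
    return missing


# The proposal's example leaves "status" blank, which parses as None.
doc = {
    "kind": "Margo Deployment Status",
    "metadata": {"name": "hello-world-deployment", "uniqueId": "example-id"},
    "deploymentInformation": {"status": None},
}
print(missing_fields(doc))  # ['deploymentInformation.status']
```
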
ajcraig commented 5 months ago

@margo/technical-wg new proposal for review on deployment status file.

phil-abb commented 5 months ago

We probably need some additional information. What about expanding it to something like this?

apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
    deploymentId: 
    timeStamp:
status:
  state:
  error:
    code:
    message:
  components:
    - name:
      state:
      error:
        code:
        message:

Example 1

apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
    deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
    timeStamp: "2024-06-10 08:37:54Z"
status:
  state: installing
  components:
    - name: digitron-orchestrator
      state: installed
    - name: database-services
      state: installing

Example 2

apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
    deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
    timeStamp: "2024-06-10 09:03:27Z"
status:
  state: installed
  components:
    - name: digitron-orchestrator
      state: installed
    - name: database-services
      state: installed

Example 3

apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
    deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
    timeStamp: "2024-06-10 08:51:13Z"
status:
  state: failed
  components:
    - name: digitron-orchestrator
      state: installed
    - name: database-services
      state: failed
      error:
        code: InvalidArgument
        message: "failed to provision volume with StorageClass 'default': rpc error: code = InvalidArgument desc = unsupported access mode: MULTI_NODE_MULTI_WRITER"

Example 4

apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
    deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
    timeStamp: "2024-06-10 09:03:27Z"
status:
  state: failed
  error:
    code: HostUnavailable
    message: "Unable to communicate with the Kubernetes cluster"
  components:
    - name: digitron-orchestrator
      state: pending
    - name: database-services
      state: pending
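
For readers who want to produce documents of this shape programmatically, the structure could be modelled roughly as follows. This is a sketch under assumptions, not an agreed data model: the class names, the `to_dict` helper, and the choice to omit absent `error` elements are all illustrative, and the timestamp format simply mirrors the examples above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Error:
    code: str
    message: str


@dataclass
class Component:
    name: str
    state: str
    error: Optional[Error] = None


@dataclass
class DeploymentStatus:
    deployment_id: str
    timestamp: datetime  # expected to be timezone-aware
    state: str
    error: Optional[Error] = None
    components: list[Component] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Build a plain dict mirroring the proposed YAML layout."""

        def err(e: Error) -> dict:
            return {"code": e.code, "message": e.message}

        status: dict = {"state": self.state}
        if self.error is not None:
            status["error"] = err(self.error)
        status["components"] = []
        for c in self.components:
            item: dict = {"name": c.name, "state": c.state}
            if c.error is not None:
                item["error"] = err(c.error)
            status["components"].append(item)
        stamp = self.timestamp.astimezone(timezone.utc)
        return {
            "apiVersion": "deployment.margo/v1",
            "kind": "DeploymentStatus",
            "metadata": {
                "deploymentId": self.deployment_id,
                "timeStamp": stamp.strftime("%Y-%m-%d %H:%M:%SZ"),
            },
            "status": status,
        }


# Example 3 from above, with the error message shortened for brevity.
example3 = DeploymentStatus(
    deployment_id="3a5549f2-02a4-4faf-9bc4-1ea9866684c1",
    timestamp=datetime(2024, 6, 10, 8, 51, 13, tzinfo=timezone.utc),
    state="failed",
    components=[
        Component("digitron-orchestrator", "installed"),
        Component(
            "database-services",
            "failed",
            Error("InvalidArgument", "unsupported access mode"),
        ),
    ],
)
```

The resulting dictionary can be passed to any YAML or JSON serializer; none is assumed here.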

This is the workflow I have in mind for installing new applications.

[Image: install sequence]

gunjald commented 5 months ago

One question on the status field below:

status:
  state: failed
  error:
    code: HostUnavailable
    message: "Unable to communicate with the Kubernetes cluster"

Specifically, for an app with multiple components, is the "status->state" field the aggregated state, taking into account the status of the different components? If so, we should also define how the individual component statuses are combined into a single state at the app level, i.e. "status->state". Otherwise it will be difficult to convey the app-level status to the user consistently.

phil-abb commented 5 months ago

> Specifically, for an app with multiple components, is the "status->state" field the aggregated state, taking into account the status of the different components? If so, we should also define how the individual component statuses are combined into a single state at the app level, i.e. "status->state". Otherwise it will be difficult to convey the app-level status to the user consistently.

I'm thinking of the process in stages where each stage could potentially fail:

Pre-processing: The actions that occur before the WOA starts trying to install any of the components. If the deployment fails for any reason at this point, the overall deployment state is failed, with the error information, and each component's deployment state is pending since installation was never attempted.

Processing: The WOA has started installing the components, and the overall state is the state of the component it is currently processing: installing if that component is installing, failed if it has failed.

Post-processing: This one is tricky because it means the components have all been installed but something in the post-processing could fail. We'd have to talk about this, because maybe there wouldn't be anything here that could fail (or we wouldn't want anything here that could fail), but if there is, what does the specification say to do? E.g., uninstall the charts? If the deployment reaches this point and there are no errors, the overall state is installed.
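
The staged rule described above could be sketched as a small aggregation function. This is an illustration, not agreed specification behavior: the function name and the `preprocessing_failed` flag are hypothetical, post-processing failures are left out because they are still an open question, and two edge cases are assumptions (a mix of installed and pending components is reported as installing, and an empty component list as pending).

```python
from typing import Sequence


def overall_state(
    component_states: Sequence[str], preprocessing_failed: bool = False
) -> str:
    """Derive the deployment-level state from the component states."""
    # Pre-processing failure, or any component failure, fails the deployment.
    if preprocessing_failed or "failed" in component_states:
        return "failed"
    # Every component installed and no errors: the deployment is installed.
    if component_states and all(s == "installed" for s in component_states):
        return "installed"
    # Work has started on at least one component (assumption: this also
    # covers the gap between finishing one component and starting the next).
    if any(s in ("installing", "installed") for s in component_states):
        return "installing"
    # Nothing has been attempted yet.
    return "pending"


print(overall_state(["installed", "installing"]))                  # installing
print(overall_state(["installed", "installed"]))                   # installed
print(overall_state(["installed", "failed"]))                      # failed
print(overall_state(["pending", "pending"], preprocessing_failed=True))  # failed
```

The four calls correspond to Examples 1 through 4 earlier in the thread.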

ajcraig commented 4 months ago

This content will be moved to the Margo Interface issue that is being written. As we discussed, the interaction from WOA to WOS will be done via a REST API instead of by posting a file.
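
To make the REST direction concrete, a WOA-side call might look something like the sketch below. Everything about the endpoint is an assumption: the thread does not define a URL layout, HTTP method, or payload encoding, so the path `/deployments/{id}/status`, the use of POST with a JSON body, the host name, and the function name are placeholders only. The request is built but deliberately not sent.

```python
import json
import urllib.parse
import urllib.request


def build_status_request(base_url: str, status: dict) -> urllib.request.Request:
    """Build (but do not send) a request carrying a deployment status.

    The URL layout and the POST/JSON choice are hypothetical; the real
    interface is to be defined by the Margo management interface work.
    """
    deployment_id = status["metadata"]["deploymentId"]
    url = "{}/deployments/{}/status".format(
        base_url.rstrip("/"), urllib.parse.quote(deployment_id, safe="")
    )
    return urllib.request.Request(
        url,
        data=json.dumps(status).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


status = {
    "apiVersion": "deployment.margo/v1",
    "kind": "DeploymentStatus",
    "metadata": {
        "deploymentId": "3a5549f2-02a4-4faf-9bc4-1ea9866684c1",
        "timeStamp": "2024-06-10 09:03:27Z",
    },
    "status": {"state": "installed", "components": []},
}
# "wos.example.invalid" is a placeholder host, not a real service.
request = build_status_request("https://wos.example.invalid/api/", status)
```

Sending would be a matter of passing `request` to `urllib.request.urlopen`, with authentication and retry handling left to whatever the interface specification ends up requiring.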

ajcraig commented 2 months ago

This issue is now tied directly to the Margo Management Interface PR where the latest deployment status file can be found.

@margo/approvers - Let's consider this another Decision tracker item. This issue will be closed when the PR is merged.