Open ajcraig opened 5 months ago
@margo/technical-wg new proposal for review on deployment status file.
We probably need some additional information. What about expanding it to something like this:
deploymentId
would match the uniqueId from the application deployment. I'm not sure if using uniqueId here makes the intent clear enough that this is the same as the id for the deployment.
timestamp
so the WOS can determine the message order more easily.
state
would be an enum with options: pending, installing, installed, failed
apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
  deploymentId:
  timeStamp:
status:
  state:
  error:
    code:
    message:
components:
  - name:
    state:
    error:
      code:
      message:
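To make the shape of the proposed file concrete, here is a minimal Python sketch of the same structure. It is only an illustration, not part of the proposal: the class and function names are made up, and it assumes that deploymentId and timeStamp sit under metadata and that components is a top-level list. It takes an already-parsed mapping, so no YAML library is required.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(str, Enum):
    """The four states proposed in this thread."""
    PENDING = "pending"
    INSTALLING = "installing"
    INSTALLED = "installed"
    FAILED = "failed"


@dataclass
class StatusError:
    code: str
    message: str


@dataclass
class ComponentStatus:
    name: str
    state: State
    error: Optional[StatusError] = None


@dataclass
class DeploymentStatus:
    deployment_id: str
    timestamp: str
    state: State
    error: Optional[StatusError] = None
    components: List[ComponentStatus] = field(default_factory=list)


def parse_error(raw: Optional[dict]) -> Optional[StatusError]:
    """Convert an optional error mapping into a StatusError."""
    return StatusError(raw["code"], raw["message"]) if raw else None


def parse_status(doc: dict) -> DeploymentStatus:
    """Build a DeploymentStatus from an already-parsed YAML/JSON mapping.

    Assumed layout (hypothetical): metadata.deploymentId, metadata.timeStamp,
    status.state, status.error, and a top-level components list.
    """
    status = doc["status"]
    return DeploymentStatus(
        deployment_id=doc["metadata"]["deploymentId"],
        timestamp=doc["metadata"]["timeStamp"],
        state=State(status["state"]),  # raises ValueError for an unknown state
        error=parse_error(status.get("error")),
        components=[
            ComponentStatus(c["name"], State(c["state"]), parse_error(c.get("error")))
            for c in doc.get("components", [])
        ],
    )
```

Modelling state as an enum means a message carrying a value outside pending, installing, installed, failed is rejected when it is parsed instead of being passed along.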
Example 1
apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
  deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
  timeStamp: "2024-06-10 08:37:54Z"
status:
  state: installing
components:
  - name: digitron-orchestrator
    state: installed
  - name: database-services
    state: installing
Example 2
apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
  deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
  timeStamp: "2024-06-10 09:03:27Z"
status:
  state: installed
components:
  - name: digitron-orchestrator
    state: installed
  - name: database-services
    state: installed
Example 3
apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
  deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
  timeStamp: "2024-06-10 08:51:13Z"
status:
  state: failed
components:
  - name: digitron-orchestrator
    state: installed
  - name: database-services
    state: failed
    error:
      code: InvalidArgument
      message: "failed to provision volume with StorageClass 'default': rpc error: code = InvalidArgument desc = unsupported access mode: MULTI_NODE_MULTI_WRITER"
Example 4
apiVersion: deployment.margo/v1
kind: DeploymentStatus
metadata:
  deploymentId: 3a5549f2-02a4-4faf-9bc4-1ea9866684c1
  timeStamp: "2024-06-10 09:03:27Z"
status:
  state: failed
  error:
    code: HostUnavailable
    message: "Unable to communicate with the Kubernetes cluster"
components:
  - name: digitron-orchestrator
    state: pending
  - name: database-services
    state: pending
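Since the timeStamp is there so the WOS can work out message order, here is a rough sketch of how a receiver might sort status messages. It assumes the timestamp format used in the examples ("YYYY-MM-DD HH:MM:SSZ", UTC) and that timeStamp lives under metadata; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse the format used in the examples, e.g. "2024-06-10 08:37:54Z"."""
    # The trailing "Z" is matched literally and interpreted as UTC.
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def order_messages(messages: list) -> list:
    """Return the status messages sorted oldest to newest by metadata.timeStamp."""
    return sorted(messages, key=lambda m: parse_timestamp(m["metadata"]["timeStamp"]))


def latest_message(messages: list) -> dict:
    """Return the most recent status message for a deployment."""
    return max(messages, key=lambda m: parse_timestamp(m["metadata"]["timeStamp"]))


# Messages may arrive out of order; sorting restores the sequence.
received = [
    {"metadata": {"timeStamp": "2024-06-10 09:03:27Z"}, "status": {"state": "installed"}},
    {"metadata": {"timeStamp": "2024-06-10 08:37:54Z"}, "status": {"state": "installing"}},
]
ordered_states = [m["status"]["state"] for m in order_messages(received)]
```

Parsing to a datetime before comparing avoids relying on string ordering, which would break if the timestamp format ever changed.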
This is the workflow I have in mind for installing new applications.
One question on the status field below:
status:
  state: failed
  error:
    code: HostUnavailable
    message: "Unable to communicate with the Kubernetes cluster"
Specifically, for an app with multiple components, is the "status->state" field the aggregated state, taking into account the status of the different components? If so, we should also define how the individual component statuses are combined to arrive at a single app-level state, i.e. "status->state". Otherwise it will be difficult to convey the app-level status to the user consistently.
I'm thinking of the process in stages where each stage could potentially fail:
Pre-processing
The actions that occur before the WOA starts trying to install any of the components. If the deployment fails for any reason at this point, the overall deployment state is failed with the error information, and each component's deployment state is pending since they were never attempted.
Processing
The WOA has started installing the components, and the overall state is the state of the component it's currently processing: installing = installing, or failed = failed.
Post-processing
This one is tricky because it means the components have all been installed but something in the post-processing could fail. We'd have to talk about this because maybe there wouldn't be anything here that could fail (or we wouldn't want anything here that could fail), but if there is, what does the specification say to do? E.g., uninstall the charts?
If it reaches this point, and there are no errors, the overall state is installed.
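As a starting point for discussion, the staged rules above could be sketched roughly like this. It is one possible reading, not an agreed rule: the function name and the preprocessing_failed flag are hypothetical, and the handling of cases the stages do not cover (for example, one component installed and the next still pending) is an assumption.

```python
def overall_state(component_states, preprocessing_failed=False):
    """Derive the deployment-level state from the component states.

    component_states: list of "pending" | "installing" | "installed" | "failed".
    preprocessing_failed: True if the WOA failed before attempting any component.
    """
    # Pre-processing: the deployment fails before any component is attempted.
    if preprocessing_failed:
        return "failed"
    # Processing: a failed component makes the whole deployment failed.
    if any(state == "failed" for state in component_states):
        return "failed"
    # Post-processing with no errors: every component is installed.
    if component_states and all(state == "installed" for state in component_states):
        return "installed"
    # Processing: at least one component has been started or finished.
    # Assumption: "installed + pending" is reported as installing.
    if any(state in ("installing", "installed") for state in component_states):
        return "installing"
    # Assumption: nothing has been attempted yet and nothing has failed.
    return "pending"


# The four examples earlier in the thread:
example_1 = overall_state(["installed", "installing"])
example_2 = overall_state(["installed", "installed"])
example_3 = overall_state(["installed", "failed"])
example_4 = overall_state(["pending", "pending"], preprocessing_failed=True)
```

With these rules the four examples come out as installing, installed, failed and failed, matching the status->state values shown in them. A post-processing failure is not modelled here since the thread leaves that case open.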
This content will be moved to the Margo Interface Issue that is being produced. As we discussed, the interaction patterns from WOA to WOS will be done via a REST API instead of by posting a particular file.
This issue is now tied directly to the Margo Management Interface PR where the latest deployment status file can be found.
@margo/approvers - Let's consider this another Decision tracker item. This issue will be closed when the PR is merged.
Below, I have outlined a proposal for the Deployment Status Update File, which the Workload Orchestration Agent uses to inform the WOS of the status of each deployment.
Note: This proposal is one option for informing the WOS of the status of the deployments. An additional option would be to utilize OTEL, but IMO we should have a mechanism for simple status updates, with OTEL providing further detail depending on the level of adoption the WOS intends to implement.
The associated workflow / use case for this is detailed below:
Proposed Margo Deployment Status Update
Top-level Attributes
Metadata Attributes
Deployment Information Attributes