Discussion: Workload Orch. Agent communication pattern

The purpose of this discussion is to gather all information, discuss the content, and come to a decision regarding the communication pattern the Workload Orchestration Agent utilizes.

This specific API is needed to enable interoperable communications between the WOA and the WOS. I am envisioning the following functions enabled by the communication pattern:

WOA onboarding/enrollment to the WOS
WOA sending/posting the device capabilities file to the WOS during enrollment
Workload Orchestration agent pulling/receiving deployment specifications
Workload Orchestration agent sending/posting the deployment status
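To make the API option concrete, the four functions listed above could map onto a small REST-style surface. The sketch below is an in-memory stand-in for the WOS side only. The endpoint paths, the `InMemoryWOS` class, its method names, and the payload fields are all hypothetical and not taken from any Margo specification; HTTP transport, authentication, and persistence are left out.

```python
from dataclasses import dataclass, field

# Hypothetical route layout for the four WOA <-> WOS functions.
# Paths and verbs are illustrative assumptions only.
ENDPOINTS = {
    "enroll": ("POST", "/devices"),
    "capabilities": ("POST", "/devices/{device_id}/capabilities"),
    "desired_state": ("GET", "/devices/{device_id}/deployments"),
    "status": ("POST", "/devices/{device_id}/deployments/{deployment_id}/status"),
}


@dataclass
class InMemoryWOS:
    """Toy WOS that keeps enrolled agents, their specs, and reported status."""

    devices: dict = field(default_factory=dict)      # device_id -> metadata
    deployments: dict = field(default_factory=dict)  # device_id -> list of specs
    statuses: dict = field(default_factory=dict)     # (device_id, deployment_id) -> state

    def enroll(self) -> str:
        # Onboarding/enrollment: assign an ID to the new agent.
        device_id = f"device-{len(self.devices) + 1}"
        self.devices[device_id] = {"capabilities": None}
        self.deployments[device_id] = []
        return device_id

    def post_capabilities(self, device_id: str, capabilities: dict) -> None:
        # Device capabilities posted by the agent during enrollment.
        self.devices[device_id]["capabilities"] = capabilities

    def get_desired_state(self, device_id: str) -> list:
        # The agent pulls its deployment specifications.
        return list(self.deployments[device_id])

    def post_status(self, device_id: str, deployment_id: str, state: str) -> None:
        # The agent reports deployment status back to the WOS.
        self.statuses[(device_id, deployment_id)] = state
```

In a real system each method would sit behind the corresponding route in `ENDPOINTS`, with the WOA acting as the HTTP client.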

GitOps Approach Pros:

File-based interface
Existing products that utilize this approach, e.g., Flux CD and Argo CD. Widely used to maintain desired state within Kubernetes clusters
Robust auditability by involving the Git repository in the system. Rollbacks are easy to accomplish with the complete history. Verification can be done via CI/CD processes before the desired state is placed.

Cons:

Issues with maintaining Git repositories for many edges and their associated applications
Newer communication strategy compared to REST-based APIs
We will need to be more specific when stating the GitOps approach; it needs more definition within the specification.

Margo API Approach Pros:

Query-based interface
REST is a standard, proven technology in the industry
Scaling API infrastructure is well understood
More flexibility with a purpose built API, custom functionality and security. Tailored for specific queries and responses.

Cons:

Margo would be responsible for maintaining the API, schema, backwards compatibility, and other aspects
Implementation is required on both ends, WOS and WOA are both impacted

I don't think GitOps should be used for all communication. If we are going to use it, I think it only makes sense to handle communicating the desired state between the WOS and WOA. Any communication originating from the WOA should either use an API or OpenTelemetry.

A couple of questions:

WOA sending/posting the device capabilities file to the WOS during enrollment: Do we think this should be handled by communication between the WOA and WOS, or should this be handled by the DOA? It feels like it should be the responsibility of the DOA to ensure this information is available and communicated. I think it's information the WOS needs to know about, but it could technically get it from the DOS.
WOA onboarding/enrollment to the WOS: Same question here. Should this be the responsibility of the DOA/DOS onboarding process, or should there be a separate process for onboarding the WOA/WOS?
Workload Orchestration agent sending/posting the deployment status: Previously, we talked about using OpenTelemetry to communicate the deployment status. We talked about this being a passive approach because of the nature of how OpenTelemetry works. Do we feel a passive approach isn't good enough and we need the more immediate communication an API provides? If so, is the API alone good enough, or should we also include this information as part of what is required in the application observability specification to make sure we're sending this data?

@margo/technical-wg Let me know your thoughts, we will be discussing this today in our Workload Orchestration Agent call.

The primary advantage of employing a GitOps approach becomes apparent when human interaction is involved. From a developer’s standpoint, the ability to interact with the edge system via the command line is beneficial as it eliminates discontinuities. However, beyond this, I don’t see many additional benefits. Versioning could be one such benefit, but this can also be achieved outside of Margo (with git) or within Margo using specific features of WOS and WOA, which likely surpass what git can offer.

Git, as a tool, is certainly more convenient than calling an API, particularly when transmitting a bundle of files. However, implementers can introduce CLIs if a web UI is not preferred. We, at Margo, could also provide guidelines for such a CLI to ensure compatibility. If we transition from the command-line interface and aim to operate the edge system from a web UI, the user will likely not even realize that git is involved.

I’m not convinced by the argument for alignment with CD tools either. These tools are successful because their primary use case is not cross-organizational or cross-role: you don’t use Argo as an app user, nor do you deploy your code in a cluster from another company. But in our case, this is the primary scenario!

I don’t see any benefits when it comes to machine-to-machine communication (where we are currently considering git too). If only one sender transmits updates (start, stop, start again, etc.), this could easily be done with an API. If multiple senders are involved, conflict resolution becomes challenging. If traffic pass-through is an issue, for instance, due to closed ports, other API bindings based on MQTT or WebSocket could be considered.

In short, I see many issues with GitOps, not to mention scalability and large files (BLOBs). Furthermore, typical workflows require web UIs, and REST is the most relevant technology in this context. Therefore, a REST API will be necessary regardless.

Git, as a tool, is certainly more convenient than calling an API, particularly when transmitting a bundle of files. However, implementers can introduce CLIs if a web UI is not preferred. We, at Margo, could also provide guidelines for such a CLI to ensure compatibility. If we transition from the command-line interface and aim to operate the edge system from a web UI, the user will likely not even realize that git is involved.

@g0zilla Can you comment further on this? The goal with Margo is automating the orchestration so I'm not clear on where a CLI fits in from your perspective.

I’m not convinced by the argument for alignment with CD tools either. These tools are successful because their primary use case is not cross-organizational or cross-role: you don’t use Argo as an app user, nor do you deploy your code in a cluster from another company. But in our case, this is the primary scenario!

I'm not sure I understand. Can you elaborate further on the differences? I don't see this as any different from what you would do for a single company. The WOS maintains a specific git repository for the device. The device is pointed to this repository by the WOS during onboarding. The device uses it to get the desired state. This seems to be the same workflow for any other place currently using GitOps to manage the desired state for their Kubernetes environment.
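The workflow described in this reply is the usual GitOps pull loop: the device fetches the repository the WOS points it at, reads the desired state, and reconciles its running applications against it. Below is a minimal sketch of only the comparison step, assuming (hypothetically) that both desired and actual state can be reduced to a map of application ID to version. Fetching the repository and executing the resulting actions are omitted, and `reconcile` is an illustrative name, not part of any tool's API.

```python
def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """Return the actions that move `actual` toward `desired`.

    Both maps are app_id -> version. Output is sorted by app ID within
    each pass so the result is deterministic.
    """
    actions = []
    # Apps declared in the desired state: install missing ones, update changed ones.
    for app_id, version in sorted(desired.items()):
        if app_id not in actual:
            actions.append(("install", app_id))
        elif actual[app_id] != version:
            actions.append(("update", app_id))
    # Apps running on the device but no longer declared: remove them.
    for app_id in sorted(actual):
        if app_id not in desired:
            actions.append(("remove", app_id))
    return actions
```

Because the function is pure, the same comparison would work whether the desired state arrives via a Git pull or an API call; the transport question debated in this thread is separate from it.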

I don’t see any benefits when it comes to machine-to-machine communication (where we are currently considering git too). If only one sender transmits updates (start, stop, start again, etc.), this could easily be done with an API. If multiple senders are involved, conflict resolution becomes challenging. If traffic pass-through is an issue, for instance, due to closed ports, other API bindings based on MQTT or WebSocket could be considered.

Can you provide more information on where you see the possibility of multiple senders? As far as I understand the proposal, the WOS would be the only source of updates to the desired state, and GitOps would be the only way the WOS communicates down to the device. All other communication would flow from the device to the WOS.
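The multiple-senders concern from the quoted comment can be illustrated with one common way an API detects conflicting writes: optimistic concurrency, where every update must name the revision it was based on. This technique is swapped in for illustration and is not something the thread proposes; `DesiredStateStore`, `ConflictError`, and the integer revision counter are hypothetical.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a writer's base revision is no longer current."""


@dataclass
class DesiredStateStore:
    """Toy per-device desired-state store with a revision counter."""

    state: dict
    revision: int = 0

    def update(self, new_state: dict, base_revision: int) -> int:
        # Reject writes based on an outdated revision instead of silently
        # overwriting a change made by another sender.
        if base_revision != self.revision:
            raise ConflictError(
                f"expected revision {self.revision}, got {base_revision}"
            )
        self.state = dict(new_state)
        self.revision += 1
        return self.revision
```

With a single sender, as in the proposal described in the reply, the check never fires; it only matters once a second writer appears, and then someone still has to decide how to merge or retry.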

Git, as a tool, is certainly more convenient than calling an API, particularly when transmitting a bundle of files. However, implementers can introduce CLIs if a web UI is not preferred. We, at Margo, could also provide guidelines for such a CLI to ensure compatibility. If we transition from the command-line interface and aim to operate the edge system from a web UI, the user will likely not even realize that git is involved.

@g0zilla Can you comment further on this? The goal with Margo is automating the orchestration so I'm not clear on where a CLI fits in from your perspective.

A CLI would be crucial when you want to interact with the WOS or WOA without using the web UI. For example, if you need to onboard an app, you either configure everything in the GUI or use a tool that provides the respective functionality, e.g., margo onboard -f my-app-margo.yaml. I just highlighted a (potential) CLI because it would be a versatile tool supporting developer workflows as well as serving as a (human) user interface.
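A minimal sketch of the hypothetical `margo onboard -f my-app-margo.yaml` command from this comment, using Python's standard `argparse` module. The `margo` program name, the `onboard` subcommand, and the `-f/--file` flag only mirror the example above; no such CLI is defined by Margo, and the call to the WOS is left as a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical `margo` CLI; the command and flag mirror the example in the
    # comment and are not part of any Margo specification.
    parser = argparse.ArgumentParser(prog="margo")
    sub = parser.add_subparsers(dest="command", required=True)
    onboard = sub.add_parser("onboard", help="onboard an application")
    onboard.add_argument(
        "-f", "--file", required=True,
        help="path to the application description, e.g. my-app-margo.yaml",
    )
    return parser


def main(argv=None) -> str:
    args = build_parser().parse_args(argv)
    if args.command == "onboard":
        # Placeholder: a real tool would read the file and call the WOS API here.
        return f"would onboard application from {args.file}"
    return ""
```

The same operation a web UI exposes as a form becomes a single scriptable command, which is what makes a CLI useful in developer workflows.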

I’m not convinced by the argument for alignment with CD tools either. These tools are successful because their primary use case is not cross-organizational or cross-role: you don’t use Argo as an app user, nor do you deploy your code in a cluster from another company. But in our case, this is the primary scenario!

I'm not sure I understand. Can you elaborate further on the differences? I don't see this as any different from what you would do for a single company. The WOS maintains a specific git repository for the device. The device is pointed to this repository by the WOS during onboarding. The device uses it to get the desired state. This seems to be the same workflow for any other place currently using GitOps to manage the desired state for their Kubernetes environment.

The primary vision for Margo is to enable interoperability across various organizations and roles. These circumstances significantly shape Margo's requirements. We have learned that simply recommending Kubernetes cannot be the solution: it runs applications only in a specific infrastructure where the whole DevOps team belongs to the same organization, and if you want to start or stop an app, you have to be a DevOps engineer as well and cannot be just a service engineer on the factory floor. The implication is that we can indeed be inspired by certain approaches implemented by Argo & co., but we have to accept that our usage scenario is different. You are right that, if the communication is limited to WOS and WOA, crossing organizational boundaries may not be the primary concern, but usability remains crucial.

I think those tools do a great job if development and rollout are the main concerns, e.g., if a DevOps team works on the application as well as on the deployment specification and continuously deploys the application to the cluster. But within Margo, the workflow will probably be that the dev team works on the app in its own environment using whatever tools it likes, and when it is done, it releases a bundle of artefacts, a.k.a. a Margo App. This means the whole GitOps shebang (within Margo) is reduced to occasionally sending a new app specification as a single file to the edge devices. This sounds like cracking nuts with a sledgehammer. That would still be fine, but we also need a proper mechanism for lifecycle operations, including start, stop, and uninstall. And for this, I think, GitOps shouldn't be our first choice.
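The lifecycle operations named at the end of this comment (start, stop, uninstall) map naturally onto an explicit state machine that an API can validate per request. A minimal sketch, assuming a hypothetical set of states (`installed`, `running`, `stopped`, `uninstalled`) and transitions chosen for illustration only:

```python
# Hypothetical lifecycle model: (current_state, operation) -> next_state.
# States and allowed transitions are illustrative assumptions.
TRANSITIONS = {
    ("installed", "start"): "running",
    ("stopped", "start"): "running",
    ("running", "stop"): "stopped",
    ("installed", "uninstall"): "uninstalled",
    ("stopped", "uninstall"): "uninstalled",
}


def apply_operation(state: str, operation: str) -> str:
    """Return the next state, or raise ValueError for a disallowed operation."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} an app in state {state!r}") from None
```

An endpoint built on this could reject an invalid request immediately (for example, uninstalling a running app), whereas with GitOps the same operation would be expressed as an edit to the desired state, with the outcome only observable later.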

I don’t see any benefits when it comes to machine-to-machine communication (where we are currently considering git too). If only one sender transmits updates (start, stop, start again, etc.), this could easily be done with an API. If multiple senders are involved, conflict resolution becomes challenging. If traffic pass-through is an issue, for instance, due to closed ports, other API bindings based on MQTT or WebSocket could be considered.

Can you provide more information on where you see the possibility of multiple senders? As far as I understand the proposal, the WOS would be the only source of updates to the desired state, and GitOps would be the only way the WOS communicates down to the device. All other communication would flow from the device to the WOS.

Git is an excellent tool for syncing different states among various contributors. Using Git only to pull the latest commit is not a big deal; you could even use FTP for this. But when it comes to propagating states, we need to receive acknowledgements. Since git is the interface and operates on files, you would have to use files as well to propagate information back in a timely and reliable manner. For this, I don't see any other way except introducing a REST API for confirmations. But this would be a mix, which is difficult to justify, not to speak of debugging... If WOS and WOA are pushing to the same repo, there will be conflicts (unless we devise a scheme intended to avoid them).
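The acknowledgement problem described here can be made concrete: even if the desired state travels via Git, the WOA still has to tell the WOS which commit it actually applied and with what result. A minimal sketch, assuming a hypothetical JSON payload (`DeploymentAck`, `is_acknowledged`, and the field names are illustrative) sent over some separate channel such as a REST call, which is exactly the mix the comment questions; the transport itself is omitted.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DeploymentAck:
    """Hypothetical acknowledgement a WOA could send after applying a commit."""

    device_id: str
    deployment_id: str
    applied_commit: str  # Git commit the agent read the desired state from
    state: str           # e.g. "running" or "failed"

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def is_acknowledged(expected_commit: str, ack_json: str) -> bool:
    """WOS-side check: did the agent confirm the commit the WOS pushed?"""
    ack = json.loads(ack_json)
    return ack["applied_commit"] == expected_commit and ack["state"] == "running"
```

Note that the WOS now has to correlate two channels (the Git history and the incoming acknowledgements), which is the debugging burden the comment alludes to.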
