margo / specification

Margo Specification
https://specification.margo.org/

Discussion: Workload Orch. Agent communication pattern #12

Open ajcraig opened 1 week ago

ajcraig commented 1 week ago

The purpose of this discussion is to gather the relevant information, discuss it, and come to a decision regarding the communication pattern the Workload Orchestration Agent (WOA) uses.

This specific API is needed to enable interoperable communications between the WOA and the WOS. I am envisioning the following functions enabled by the communication pattern:

GitOps Approach Pros:

Cons:

Margo API Approach Pros:

Cons:

phil-abb commented 1 week ago

I don't think GitOps should be used for all communication. If we are going to use it, I think it only makes sense to handle communicating the desired state between the WOS and WOA. Any communication originating from the WOA should either use an API or OpenTelemetry.

Couple questions:

ajcraig commented 1 week ago

@margo/technical-wg Let me know your thoughts, we will be discussing this today in our Workload Orchestration Agent call.

g0zilla commented 1 week ago

The primary advantage of employing a GitOps approach becomes apparent when human interaction is involved. From a developer’s standpoint, the ability to interact with the edge system via the command line is beneficial as it eliminates discontinuities. However, beyond this, I don’t see many additional benefits. Versioning could be one such benefit, but this can also be achieved outside of Margo (with git) or within Margo using specific features of WOS and WOA, which likely surpass what git can offer.

Git, as a tool, is certainly more convenient than calling an API, particularly when transmitting a bundle of files. However, implementers can introduce CLIs if a web UI is not preferred. We, at Margo, could also provide guidelines for such a CLI to ensure compatibility. If we transition from the command-line interface and aim to operate the edge system from a web UI, the user will likely not even realize that git is involved.

I’m not convinced by the argument for alignment with CD tools either. These tools are successful because their primary use case is not cross-organizational or cross-role: you don’t use Argo as an app user, nor do you deploy your code in a cluster from another company. But in our case, this is the primary scenario!

I don’t see any benefits when it comes to machine-to-machine communication (where we are currently considering git too). If only one sender transmits updates (start, stop, start again, etc.), this could easily be done with an API. If multiple senders are involved, conflict resolution becomes challenging. If traffic pass-through is an issue, for instance, due to closed ports, other API bindings based on MQTT or WebSocket could be considered.

In short, I see many issues with GitOps, not to mention scalability and large files (BLOBs). Furthermore, typical workflows require web UIs, and REST is the most relevant technology in this context. Therefore, a REST API will be necessary regardless.

phil-abb commented 1 week ago

> Git, as a tool, is certainly more convenient than calling an API, particularly when transmitting a bundle of files. However, implementers can introduce CLIs if a web UI is not preferred. We, at Margo, could also provide guidelines for such a CLI to ensure compatibility. If we transition from the command-line interface and aim to operate the edge system from a web UI, the user will likely not even realize that git is involved.

@g0zilla Can you comment further on this? The goal with Margo is automating the orchestration so I'm not clear on where a CLI fits in from your perspective.

> I’m not convinced by the argument for alignment with CD tools either. These tools are successful because their primary use case is not cross-organizational or cross-role: you don’t use Argo as an app user, nor do you deploy your code in a cluster from another company. But in our case, this is the primary scenario!

I'm not sure I understand. Can you elaborate further on the differences? I don't see this as any different from what you would do for a single company. The WOS maintains a specific git repository for the device. The device is pointed to this repository by the WOS during onboarding. The device uses it to get the desired state. This seems to be the same workflow for any other place currently using GitOps to manage the desired state for their Kubernetes environment.

> I don’t see any benefits when it comes to machine-to-machine communication (where we are currently considering git too). If only one sender transmits updates (start, stop, start again, etc.), this could easily be done with an API. If multiple senders are involved, conflict resolution becomes challenging. If traffic pass-through is an issue, for instance, due to closed ports, other API bindings based on MQTT or WebSocket could be considered.

Can you provide more information on where you see the possibility of multiple senders? As far as I understand the proposal the WOS would be the only source for updates to the desired state and the only way the WOS communicates down to the device using GitOps. All the other communication would originate from the device to the WOS.
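To make the single-sender, pull-based flow discussed here concrete, the following is a minimal sketch of what a device-side reconciliation step could look like. It is not part of the Margo specification: the `Action` type, the `reconcile` function, and the representation of state as a mapping from application name to version are all assumptions chosen for illustration. How the desired state reaches the device (git pull, REST, or something else) is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical unit of work the agent derives from a state diff."""
    verb: str      # "install", "update", or "uninstall"
    app: str
    version: str


def reconcile(desired: dict[str, str], current: dict[str, str]) -> list[Action]:
    """Compare desired and current state (app name -> version).

    Returns the actions needed to move the device from `current`
    to `desired`. Sorting keeps the result deterministic.
    """
    actions: list[Action] = []
    for app, version in sorted(desired.items()):
        if app not in current:
            actions.append(Action("install", app, version))
        elif current[app] != version:
            actions.append(Action("update", app, version))
    for app, version in sorted(current.items()):
        if app not in desired:
            actions.append(Action("uninstall", app, version))
    return actions
```

Because the desired state has exactly one writer in this model, the device never has to merge anything; it only computes a diff against what it is currently running.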

g0zilla commented 1 week ago

> > Git, as a tool, is certainly more convenient than calling an API, particularly when transmitting a bundle of files. However, implementers can introduce CLIs if a web UI is not preferred. We, at Margo, could also provide guidelines for such a CLI to ensure compatibility. If we transition from the command-line interface and aim to operate the edge system from a web UI, the user will likely not even realize that git is involved.

> @g0zilla Can you comment further on this? The goal with Margo is automating the orchestration so I'm not clear on where a CLI fits in from your perspective.

A CLI would be crucial when you want to interact with the WOS or WOA without using the web UI. For example, to onboard an app you either configure everything in the GUI or use a tool that provides the same functionality, e.g., margo onboard -f my-app-margo.yaml. I highlighted a (potential) CLI only because it would be a versatile tool that supports developer workflows and also serves as a (human) user interface.
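As a rough illustration of the command shape mentioned above, here is a minimal argument-parsing sketch. No such `margo` CLI is defined by the specification; the `onboard` subcommand and `-f` flag are taken from the example in this comment, while the `--file` long form and the `build_parser` helper are assumptions added for illustration. The sketch only parses arguments and does not talk to a WOS or WOA.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical `margo` command-line tool."""
    parser = argparse.ArgumentParser(prog="margo")
    subcommands = parser.add_subparsers(dest="command", required=True)

    onboard = subcommands.add_parser(
        "onboard", help="onboard an application from a description file"
    )
    onboard.add_argument(
        "-f", "--file", required=True,
        help="path to the application description (e.g. my-app-margo.yaml)",
    )
    return parser


# Example: parse the invocation from the comment above.
args = build_parser().parse_args(["onboard", "-f", "my-app-margo.yaml"])
print(args.command, args.file)
```

Whatever such a tool did after parsing would still need a defined interface to call, which is the point being made here: a CLI sits on top of an API rather than replacing it.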

> > I’m not convinced by the argument for alignment with CD tools either. These tools are successful because their primary use case is not cross-organizational or cross-role: you don’t use Argo as an app user, nor do you deploy your code in a cluster from another company. But in our case, this is the primary scenario!

> I'm not sure I understand. Can you elaborate further on the differences? I don't see this as any different from what you would do for a single company. The WOS maintains a specific git repository for the device. The device is pointed to this repository by the WOS during onboarding. The device uses it to get the desired state. This seems to be the same workflow for any other place currently using GitOps to manage the desired state for their Kubernetes environment.

The primary vision for Margo is to enable interoperability across various organizations and roles. These circumstances significantly impact Margo's requirements. We have learned that simply recommending Kubernetes cannot be the solution, as it only runs applications in a specific infrastructure where the whole DevOps team belongs to the same organization, and if you want to start or stop an app you have to be a DevOps engineer as well and cannot be just a service engineer on the factory floor. The implication is that we can indeed be inspired by certain approaches implemented by Argo & co., but we have to accept that our usage scenario is different. You are right: if the communication is limited to WOS and WOA, crossing organizational boundaries may not be the primary concern, but usability remains crucial.

I think those tools do a great job when development and rollout are the main concerns, e.g. when a DevOps team works on the application as well as on the deployment specification and continuously deploys the application to the cluster. But within Margo, the workflow will probably be that the dev team works on the app in their own environment using whatever tools they like, and when they are done, they release a bundle of artefacts, a.k.a. a Margo App. This means the whole shebang with GitOps (within Margo) is reduced to occasionally sending a new app specification as a single file towards the edge devices. This sounds like cracking nuts with a sledgehammer. This would still be fine, but we also need a proper mechanism for lifecycle operations, including start, stop, and uninstall. And for this, I think, GitOps shouldn't be our first choice.
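To illustrate what "lifecycle operations" could mean as an explicit interface, here is a small sketch of a transition table. The state names (`stopped`, `running`, `removed`), the rule that an app must be stopped before it is uninstalled, and the `apply_operation` helper are assumptions made for this example only; the discussion names just the operations start, stop, and uninstall.

```python
# Hypothetical lifecycle model: (current state, operation) -> next state.
TRANSITIONS = {
    ("stopped", "start"): "running",
    ("running", "stop"): "stopped",
    ("stopped", "uninstall"): "removed",
}


def apply_operation(state: str, operation: str) -> str:
    """Return the next state, or raise ValueError if the operation is not allowed."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(
            f"operation {operation!r} is not allowed in state {state!r}"
        ) from None


# Example: a start/stop/uninstall sequence issued by a single sender.
state = "stopped"
for op in ("start", "stop", "uninstall"):
    state = apply_operation(state, op)
```

An imperative call like this gives the caller an immediate accept-or-reject answer, which is the kind of feedback that is harder to obtain when the only channel is a committed file.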

> > I don’t see any benefits when it comes to machine-to-machine communication (where we are currently considering git too). If only one sender transmits updates (start, stop, start again, etc.), this could easily be done with an API. If multiple senders are involved, conflict resolution becomes challenging. If traffic pass-through is an issue, for instance, due to closed ports, other API bindings based on MQTT or WebSocket could be considered.

> Can you provide more information on where you see the possibility of multiple senders? As far as I understand the proposal the WOS would be the only source for updates to the desired state and the only way the WOS communicates down to the device using GitOps. All the other communication would originate from the device to the WOS.

Git is an excellent tool for syncing different states among various contributors. Using Git only to pull the latest commit is not a big deal; you could even use FTP for this. But when it comes to propagating states, we need to receive acknowledgements. Since git is the interface and operates on files, you would have to use files as well to propagate information back in a timely and reliable manner. I don't see any other way to do this except by introducing a REST API for confirmations. But this would be a mix, which is difficult to justify. Not to speak of debugging... If WOS and WOA are pushing to the same repo, there will be conflicts (although we could devise a scheme in which these are avoided).
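The acknowledgement problem described above can be sketched independently of the transport. In the following minimal example the orchestrator side tracks, per device, which revision it last published and which revision the device last confirmed. The `DeviceSync` class and its method names are hypothetical, and the sketch takes no position on whether the confirmation arrives via REST, MQTT, or a file in a repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceSync:
    """Hypothetical per-device bookkeeping kept by the orchestrator."""
    desired: Optional[str] = None  # revision last published for the device
    acked: Optional[str] = None    # revision the device last confirmed

    def publish(self, revision: str) -> None:
        """Record a new desired-state revision for the device."""
        self.desired = revision

    def acknowledge(self, revision: str) -> bool:
        """Record a confirmation; acks for stale revisions are ignored."""
        if revision != self.desired:
            return False
        self.acked = revision
        return True

    @property
    def in_sync(self) -> bool:
        """True once the device has confirmed the latest published revision."""
        return self.desired is not None and self.desired == self.acked
```

Until `in_sync` is true the orchestrator cannot tell whether the device has applied, rejected, or never seen the latest revision, which is why a pull-only channel needs some return path.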