
Margo Specification
https://specification.margo.org/

Device Orchestration: Device Update Mechanism Scope discussion #49

Open: ajcraig opened this discussion 1 month ago

ajcraig commented 1 month ago

These discussions were previously underway in the Discord channel but are being moved here for further discussion.

This post is being created to finalize the scope and responsibility of this proposed focus group. Once the scope of the focus group is established, we will look to assign a lead to manage the focus group meetings and lead the content creation within the specification. Note: I plan to come back and update this main post based on the discussion and feedback below.

Proposed Focus Group: Device Update Mechanism

Proposed Scope:

ajcraig commented 1 month ago

@stormc reply This is about device update, i.e., updating a device's "firmware", right? If so, we as margo are in the "app domain" and shouldn't do device management ourselves, but we should interface with it: from the "margo" domain, we should be able to trigger or initiate certain device management functions, such as a firmware update. This requires a bi-directional communication channel between the app/margo domain and the device's management implementation. For firmware update, this means, for example:
(1) the margo domain can query the device management domain for the last status of firmware notification polling (to show it in dashboards, ...)
(2) the margo domain can trigger a check for new firmware
(3) the margo domain can, if new firmware is to be installed, defer the installation (to a maintenance window)
...
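A minimal Python sketch of the three interactions listed above, written as an abstract interface on the margo side. All names (`DeviceManagementBridge`, `FirmwarePollStatus`, `InMemoryBridge`) are hypothetical; margo has not defined such an API, and the in-memory class only stands in for a real device management agent.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FirmwarePollStatus:
    """Result of the last firmware-notification poll, e.g. for a dashboard."""
    last_polled: Optional[datetime]
    update_available: bool
    available_version: Optional[str]


class DeviceManagementBridge(ABC):
    """Hypothetical margo-side view of a device's management functions.

    margo does not implement firmware updates itself; a concrete subclass
    forwards each call to the device's existing management implementation.
    """

    @abstractmethod
    def get_firmware_poll_status(self) -> FirmwarePollStatus:
        """(1) Query the last status of firmware-notification polling."""

    @abstractmethod
    def trigger_firmware_check(self) -> None:
        """(2) Ask the device management domain to look for new firmware."""

    @abstractmethod
    def defer_firmware_install(self, not_before: datetime) -> None:
        """(3) Defer a pending installation, e.g. to a maintenance window."""


class InMemoryBridge(DeviceManagementBridge):
    """Stand-in implementation for illustration only (no real device)."""

    def __init__(self) -> None:
        self._status = FirmwarePollStatus(None, False, None)
        self.deferred_until: Optional[datetime] = None

    def get_firmware_poll_status(self) -> FirmwarePollStatus:
        return self._status

    def trigger_firmware_check(self) -> None:
        # Pretend the vendor agent found firmware version "2.0.0".
        self._status = FirmwarePollStatus(datetime.now(), True, "2.0.0")

    def defer_firmware_install(self, not_before: datetime) -> None:
        self.deferred_until = not_before
```

Keeping the interface abstract reflects the point made in the thread: margo would specify the calls, while each device vendor supplies the implementation behind them.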

ajcraig commented 1 month ago

@ajcraig reply Correct. I see this as any configuration change to a device, including firmware updates, network configurations, container runtime changes, BIOS updates, etc. The device orchestration service would enable the user to kick off the update by defining the new desired state, and then rely on private implementations to complete the deployment. Private implementations include:
(1) a Device Owner/Manufacturer notification service that informs the 3rd party orchestrator that an update is available for a device the End User owns
(2) a Device Owner/Manufacturer firmware/file repository that the device can pull from
(3) a Device Owner/Manufacturer "deployment service" residing on the device that applies the update
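To illustrate the desired-state idea, the sketch below compares the desired device state with the state the device reports and returns only the outstanding changes, which would then be handed to the vendor's private deployment service. The flat key/value state model, the keys, and the version strings are assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical flat state model; the keys mirror the examples in the
# discussion (firmware, network config, container runtime, BIOS).
DeviceState = Dict[str, Any]


def pending_changes(desired: DeviceState, reported: DeviceState) -> DeviceState:
    """Return the part of the desired state the device has not reached yet.

    The orchestrator only records *what* should change; applying each change
    is left to the Device Owner/Manufacturer's private deployment service.
    """
    return {
        key: value
        for key, value in desired.items()
        if reported.get(key) != value
    }


desired = {
    "firmware": "4.2.1",
    "network.ipv4": "10.0.0.12/24",
    "container_runtime": "containerd 1.7",
    "bios": "1.09",
}
reported = {
    "firmware": "4.1.0",
    "network.ipv4": "10.0.0.12/24",
    "container_runtime": "containerd 1.7",
    "bios": "1.08",
}

print(pending_changes(desired, reported))
# {'firmware': '4.2.1', 'bios': '1.09'}
```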

I also envision a trust establishment process that is required between the End User, 3rd party device orchestrator, and Device Owner/Manufacturer services. Below is a crude drawing depicting what I described above.

I think where we differ in opinion is whether the Device Orchestration "domain" should be in scope for Margo. I was under the impression we would have both Workload and Device orchestration services that could become Margo compliant.

[Image: crude drawing of the proposed device update flow]
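One possible building block for the trust establishment process mentioned above: the manufacturer's notification service signs each update notification, and the orchestrator rejects any notification whose tag does not verify. This sketch assumes a shared secret agreed on during trust establishment and uses HMAC-SHA256 from Python's standard library; the discussion does not specify a mechanism, and a real design might well use certificates instead.

```python
import hashlib
import hmac
import json

# Assumption: the Device Owner/Manufacturer and the third-party orchestrator
# agreed on this shared secret during trust establishment. A production design
# would more likely use certificates and asymmetric signatures.
SHARED_KEY = b"example-key-established-out-of-band"


def sign_notification(payload: dict, key: bytes = SHARED_KEY) -> dict:
    """Manufacturer side: attach an HMAC-SHA256 tag to an update notification."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "hmac": tag}


def verify_notification(message: dict, key: bytes = SHARED_KEY) -> bool:
    """Orchestrator side: accept the notification only if the tag matches."""
    body = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, message["hmac"])


msg = sign_notification({"device_id": "dev-001", "firmware": "4.2.1"})
print(verify_notification(msg))  # True
msg["payload"]["firmware"] = "9.9.9"  # tampered in transit
print(verify_notification(msg))  # False
```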

ajcraig commented 1 month ago

@stormc reply I think we're quite on the same page. I'd like to see an interface specified by margo that is implemented such that it calls out to existing (probably proprietary) device management functionality on the device. This is not to be coded by margo; it's already there, and we "just" need to "bridge" to it. So, rather than margo implementing, e.g., the application of a firmware update, I'd like to see it call out to an existing firmware update agent on the device. This interface (what you call the device orchestration service, DOS) is part of the Margo specification and defines all device management functionality we want to trigger or consume. The implementations of this interface vary and call out to the Rockwell or Siemens or ... implementation.

Just to illustrate this a bit, this is the "margo domain" with a user initiating an "Update FW" action:

    User: "Update FW" -> WOS -> DOS |IPC:send|

This is the "device management domain" that receives this call and performs the corresponding actions, with different implementations all reacting to the action called above:

    |IPC:receive| SIEMENS Implementation -> notify
    |IPC:receive| Rockwell Implementation -> notify
    |IPC:receive| ENOTIMPLEMENTED -> notify

That said, it's not just IPC but also notification.
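The device-management side of this flow can be sketched as a dispatcher: vendor bridges register handlers for the actions they support, anything unregistered is answered with ENOTIMPLEMENTED, and every outcome goes back to the margo domain as a notification. The class, the action names, and the status strings other than ENOTIMPLEMENTED are hypothetical, and a plain callback stands in for the actual IPC and notification channel.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Notification:
    action: str
    status: str  # hypothetical values: "OK", "FAILED", "ENOTIMPLEMENTED"
    detail: str = ""


Handler = Callable[[dict], Notification]


class DeviceManagementEndpoint:
    """Device-management side of the channel (IPC replaced by a callback).

    Each vendor registers handlers for the actions it supports; unknown
    actions are answered with ENOTIMPLEMENTED instead of being dropped.
    Every outcome is sent back as a notification to the margo domain.
    """

    def __init__(self, notify: Callable[[Notification], None]) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._notify = notify

    def register(self, action: str, handler: Handler) -> None:
        self._handlers[action] = handler

    def receive(self, action: str, params: dict) -> None:
        handler = self._handlers.get(action)
        if handler is None:
            result = Notification(action, "ENOTIMPLEMENTED")
        else:
            try:
                result = handler(params)
            except Exception as exc:  # report vendor-side errors, don't crash
                result = Notification(action, "FAILED", str(exc))
        self._notify(result)


# Usage: a vendor bridge that forwards "update_fw" to its own (stubbed) agent.
sent: List[Notification] = []
endpoint = DeviceManagementEndpoint(notify=sent.append)
endpoint.register(
    "update_fw",
    lambda p: Notification("update_fw", "OK", f"agent scheduled {p['version']}"),
)
endpoint.receive("update_fw", {"version": "4.2.1"})
endpoint.receive("update_bios", {})
print([(n.action, n.status) for n in sent])
# [('update_fw', 'OK'), ('update_bios', 'ENOTIMPLEMENTED')]
```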

ajcraig commented 1 month ago

@tomcounihan reply It's an interesting system architecture conversation here. My 2 cents: IMHO, the WOS should not have any API that knows about or interacts with FW lifecycle management. As a microservice, it should only deal with app lifecycle. I do think there is a 'missing' microservice: the orchestrator of orchestrators, which, using the FW example, would coordinate what needs to happen. In this instance, it might need to quiesce apps on the targeted devices (perhaps migrate them, if that makes sense). Once it is happy that the apps are taken care of, it goes on to the infrastructure manager (I think Device Manager is the same thing), which does its magic on the FW. That may be in-band or out-of-band, but it likely requires a reboot. After the reboot, it re-establishes the apps (which might also need to be aware that the app framework may have moved on a version, e.g., k8s 1.27 to 1.28). I guess I see it in layers:
    Orch^n (where n = number of layers)
    App Orch (WOS)
    App Framework Orch (think upgrading K8s/Docker)
    Infrastructure Orch (think distro upgrade, but also FW)
    Security Orch (keys, certs, etc.), albeit this may be a pillar sitting beside the above
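The sequencing that an orchestrator of orchestrators would enforce for a single device might look roughly like this. Each layer is reduced to a callable (an assumption made to keep the sketch self-contained), and the function name and parameters are hypothetical. The `finally` clause re-establishes the apps even if the firmware step fails, which is one design choice among several.

```python
from typing import Callable, List


def coordinate_firmware_update(
    device: str,
    quiesce_apps: Callable[[str], bool],
    apply_firmware: Callable[[str], None],
    reboot: Callable[[str], None],
    restore_apps: Callable[[str], None],
    log: List[str],
) -> bool:
    """Sketch of an 'orchestrator of orchestrators' for one device.

    The app layer is handled first; the firmware update only starts once the
    apps are quiesced (or migrated), and they are re-established afterwards.
    """
    if not quiesce_apps(device):  # app layer (WOS) declined or failed
        log.append(f"{device}: apps not quiesced, update postponed")
        return False
    log.append(f"{device}: apps quiesced")
    try:
        apply_firmware(device)  # infrastructure layer, in-band or out-of-band
        log.append(f"{device}: firmware applied")
        reboot(device)
        log.append(f"{device}: rebooted")
    finally:
        # Bring workloads back even if the update step raised an error;
        # the app framework may now be at a newer version.
        restore_apps(device)
        log.append(f"{device}: apps restored")
    return True


log: List[str] = []
ok = coordinate_firmware_update(
    "press-07",
    quiesce_apps=lambda d: True,
    apply_firmware=lambda d: None,
    reboot=lambda d: None,
    restore_apps=lambda d: None,
    log=log,
)
print(ok, log)
```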

ajcraig commented 1 month ago

@stormc reply Sure, there is some intertwining between device management (system's domain) and application management (margo's domain). As you pointed out, margo probably should be able to defer a pending firmware update (think: maintenance window). For this, we need to have a mutual information exchange and some kind of (prohibiting) control exercised over the system domain, i.e., wait until the app is done manufacturing this workpiece. From a customer's perspective the device is functioning if apps are running, so apps' wishes have to be respected by the device management, at least to some degree.
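A minimal sketch of the "prohibiting control" described above: apps hold an inhibitor while they must not be interrupted, and the device management side installs only when no inhibitor is held and the current time falls inside a maintenance window. The `UpdateGate` class, its methods, and the example window are hypothetical.

```python
from datetime import datetime, time
from typing import Set


class UpdateGate:
    """Hypothetical gate the device-management side consults before installing.

    Apps hold an inhibitor while they must not be interrupted (e.g. while a
    workpiece is being manufactured); installation is allowed only inside the
    maintenance window and when no inhibitor is held.
    """

    def __init__(self, window_start: time, window_end: time) -> None:
        self.window_start = window_start
        self.window_end = window_end
        self._inhibitors: Set[str] = set()

    def inhibit(self, app_id: str) -> None:
        self._inhibitors.add(app_id)

    def release(self, app_id: str) -> None:
        self._inhibitors.discard(app_id)

    def in_window(self, now: datetime) -> bool:
        t = now.time()
        if self.window_start <= self.window_end:
            return self.window_start <= t < self.window_end
        # Window wraps past midnight, e.g. 22:00-04:00.
        return t >= self.window_start or t < self.window_end

    def may_install(self, now: datetime) -> bool:
        return self.in_window(now) and not self._inhibitors


gate = UpdateGate(time(22, 0), time(4, 0))
gate.inhibit("milling-app")         # app is busy with a workpiece
night = datetime(2024, 5, 1, 23, 30)
print(gate.may_install(night))      # False: inhibited
gate.release("milling-app")
print(gate.may_install(night))      # True
```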

ajcraig commented 1 month ago

@pauldbrooks reply Avoiding technical input, but I want to draw a distinction between in-scope and mandatory. My understanding is that device orchestration should be in scope of Margo to the extent that a device vendor and tool vendor know what they need to do to work together. But device and application orchestration are different domains: as a tool vendor I may not see device management as my concern, and as a device vendor I may wish to keep device management proprietary while fully embracing open workload orchestration. This separation of domains should not get in the way of the potential to solve them both the same way; it feels (to my non-technical senses) like the workflows are the same and the differences are in the vendor-specific space rather than open interfaces.

pauldbrooks commented 1 month ago

I like your distinction between in-scope and mandatory more than mine of optional and mandatory :)

stormc commented 1 month ago

@pauldbrooks: Avoiding technical input, but I want to draw a distinction between in-scope and mandatory. My understanding is that device orchestration should be in scope of Margo to the extent that a device vendor and tool vendor know what they need to do to work together. But device and application orchestration are different domains - as a tool vendor I may not see device management as my concern and as a device vendor I may wish to keep device management proprietary while fully embracing open workload orchestration.

Yes. However, the application domain and the device domain do have certain interaction points where there needs to be a defined way to pass and receive information, i.e., an interface that margo defines and that has to be implemented on both sides accordingly. For example, when a proprietary device management backend triggers a firmware update, the device domain handling that update should ask, or at least inform, the application domain that it is going to install and reboot, since this interrupts the availability of the device as a whole for the duration of the update. The same is true vice versa.
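The ask-or-inform exchange could be carried by a small message type usable in both directions, with the receiving domain answering only when asked. The message fields, enum values, and the policy in `answer` are assumptions for illustration, not a margo-defined schema.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Mode(str, Enum):
    ASK = "ask"        # sender waits for the other domain's answer
    INFORM = "inform"  # sender only announces what it is going to do


class Reply(str, Enum):
    ACCEPT = "accept"
    DEFER = "defer"


@dataclass
class DisruptionNotice:
    """Hypothetical message either domain sends before interrupting the device."""
    origin: str             # "device" or "application"
    action: str             # e.g. "install_and_reboot"
    mode: Mode
    expected_downtime_s: int

    def to_json(self) -> str:
        data = asdict(self)
        data["mode"] = self.mode.value  # serialize the enum as its plain value
        return json.dumps(data)


def answer(notice: DisruptionNotice, receiver_busy: bool) -> Optional[Reply]:
    """Receiving domain's policy: defer while busy, but only if asked."""
    if notice.mode is Mode.INFORM:
        return None  # nothing to answer; just prepare for the interruption
    return Reply.DEFER if receiver_busy else Reply.ACCEPT


notice = DisruptionNotice("device", "install_and_reboot", Mode.ASK, 600)
print(notice.to_json())
print(answer(notice, receiver_busy=True).value)  # defer
```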

@pauldbrooks: This separation of domains should not get in the way of the potential to solve them both the same way; it feels (to my non-technical senses) like the workflows are the same and the differences are in the vendor-specific space rather than open interfaces.

To some extent, yes. If margo is about interoperability, it should standardize these interaction points so that existing implementations can hook into the standard. The workflows are similar but not identical, i.e., they have subtle but important differences. An update from a 10,000 ft perspective is simple, as long as you don't have robustness, availability, security, ... requirements :)