Avoiding external registry access by WOA

g0zilla commented 3 months ago

In our preliminary draft, the installation of apps is initiated by the WOA. Currently, the sequence is roughly as follows: first, the WOA pulls a new app state, including the margo.yaml; then it pulls the app images from the application vendor's registry to install the app.

sequenceDiagram
    participant registry as Application Vendor registry 
    actor Service Engineer
    participant WOS
    participant WOA
    Service Engineer ->> WOS: Install App
    WOS ->> WOS: Publishes desired state with encrypted registry credentials
    WOA ->> WOS: Sees state has changed
    activate WOA
    WOS -->> WOA: Pull state
    deactivate WOA
    note over WOA, registry:   External Access
    WOA ->> registry: Pull charts/images
    activate registry
    registry -->> WOA: download
    deactivate registry

I see a problem with accessing external third-party app registries from a WOA running in an OT network! Strict firewall rules typically prohibit such external access via the internet. Furthermore, remote pulls increase internet traffic. What if there are dozens or hundreds of edge devices?

I propose, as an alternative, that app retrieval be transparent to the WOA and be entirely the responsibility of the WOS. Consequently, the WOS takes care that the app is transported to the WOA. In my opinion, there are two ways to achieve this: (1) mirror the images and Helm charts in a local registry/repository, or (2) introduce a package format that includes all app artifacts (margo.yaml, Helm charts/docker-compose.yaml, and app images as tar archives). I prefer the second option, as I see less complexity in it. Here, the app package would be downloaded directly from the WOS, e.g. as a tar file via HTTP GET.
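To make option (2) more concrete, here is a minimal sketch of how a WOS could compile such an app package as a single gzip-compressed tar. The member layout (`margo.yaml` at the root, charts under `charts/`, image archives under `images/`) and the function name `build_app_package` are hypothetical conventions for illustration only; nothing like this is defined in the specification yet.

```python
from __future__ import annotations

import io
import tarfile
import time


def build_app_package(artifacts: dict[str, bytes]) -> bytes:
    """Bundle all app artifacts into one gzip-compressed tar archive.

    `artifacts` maps member names to file contents. The layout used below
    (margo.yaml, charts/, images/) is a hypothetical convention.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(artifacts.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Example: the WOS compiles a package from onboarded artifacts (placeholder contents).
package = build_app_package({
    "margo.yaml": b"# app description\n",
    "charts/hello-world-0.0.1.tgz": b"<packaged helm chart>",
    "images/hello-world.tar": b"<image archive, e.g. from docker save>",
})
```

The resulting bytes are what the WOS would serve to the WOA in response to the HTTP GET.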

sequenceDiagram
    actor Service Engineer
    Service Engineer->>WOS: Onboard edge app
    WOS ->> WOS: Compile app package
    WOS ->> WOS: Show app in app catalog
    Service Engineer ->> WOS: Initiate app installation from app catalog
    WOS ->> WOA: Propagate install command (e.g. by push)
    WOA ->> WOS: Request app package (e.g. per HTTP GET)
    activate WOS
    WOS ->> WOA: Download app package
    deactivate WOS
    WOA ->> WOA: Unpack app package
    WOA ->> WOA: Load images to local registry
    WOA ->> WOA: Start app
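On the WOA side, the "Unpack app package" step could look roughly like the following sketch. It assumes the package arrives as bytes from the WOS and that `margo.yaml` is the only mandatory member (a hypothetical rule). Because the archive comes from another machine, it rejects absolute paths, `..` traversal, and links before writing anything. Loading the image archives into the local registry is left out, since that depends on the container runtime.

```python
from __future__ import annotations

import io
import tarfile
from pathlib import Path

REQUIRED_MEMBERS = {"margo.yaml"}  # hypothetical: files every app package must contain


def unpack_app_package(package: bytes, dest: Path) -> list[str]:
    """Validate an app package received from the WOS and unpack it into dest.

    Returns the sorted names of the regular files that were written.
    """
    dest = dest.resolve()
    with tarfile.open(fileobj=io.BytesIO(package), mode="r:*") as tar:
        members = tar.getmembers()
        missing = REQUIRED_MEMBERS - {m.name for m in members}
        if missing:
            raise ValueError(f"package is missing {sorted(missing)}")

        # Check every member before extracting anything: only plain files and
        # directories are allowed, and every path must stay inside dest.
        for m in members:
            target = (dest / m.name).resolve()
            inside = target == dest or dest in target.parents
            if not inside or not (m.isfile() or m.isdir()):
                raise ValueError(f"unsafe member in package: {m.name}")

        for m in members:
            target = (dest / m.name).resolve()
            if m.isdir():
                target.mkdir(parents=True, exist_ok=True)
            else:
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(tar.extractfile(m).read())

    return sorted(m.name for m in members if m.isfile())
```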

The benefits of the approach are the following:

No external access is necessary for the WOA edge devices. The WOS could sit in a mixed IT/OT network, while the WOA remains in the more restricted OT network.
The approach supports air-gapped scenarios. The app still has to be onboarded to the WOS “somehow”, but from there it can easily be deployed to the WOA, since WOS and WOA can still communicate in an air-gapped setup.
Rolling out an app to a fleet of edge devices requires no capacity on the internet uplink.
Less dependency on the availability of app sources: once the app is onboarded on the WOS, it remains available even if the vendor's registry becomes unreachable.

Of course, there are also drawbacks:

The introduction of an app package concept changes the typical workflow of Helm chart deployment (the same is true for docker-compose). The location from which container images are loaded is no longer specified by the Helm chart owner. Therefore, the image key has to be refactored. Image hashes either have to be removed (not a problem because of the trust relationship between WOS and WOA) or refactored as well.
The app developer gives up control over the deployment of the app. Licensing has to be addressed at another level accordingly.
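The image-key refactoring from the first drawback boils down to rewriting each image reference so that it points at the local registry, optionally dropping the digest. A minimal sketch, assuming Docker-style reference syntax and the common heuristic that the first path component is a registry host only if it contains a `.` or `:` or is `localhost`. The function names and the `registry.local:5000` host are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: Optional[str]  # None if the reference has no explicit registry host
    path: str                # repository path, e.g. "vendor/app"
    tag: Optional[str]
    digest: Optional[str]    # e.g. "sha256:..."


def parse_image_ref(ref: str) -> ImageRef:
    """Split a Docker-style image reference into its parts (simplified)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    if ":" in ref.rsplit("/", 1)[-1]:  # a ':' in the last component is a tag
        ref, tag = ref.rsplit(":", 1)
    first, _, rest = ref.partition("/")
    # Heuristic: the first component is a registry host if it looks like one.
    if rest and ("." in first or ":" in first or first == "localhost"):
        return ImageRef(first, rest, tag, digest)
    return ImageRef(None, ref, tag, digest)


def rewrite_image_ref(ref: str, local_registry: str, keep_digest: bool = False) -> str:
    """Point an image reference at local_registry, dropping the digest by default.

    Docker Hub short names such as "nginx" are kept as-is (no "library/" prefix).
    Without a tag and digest, runtimes typically fall back to "latest".
    """
    parsed = parse_image_ref(ref)
    out = f"{local_registry}/{parsed.path}"
    if parsed.tag:
        out += f":{parsed.tag}"
    if keep_digest and parsed.digest:
        out += f"@{parsed.digest}"
    return out
```

For example, `rewrite_image_ref("ghcr.io/vendor/app:1.0", "registry.local:5000")` yields `"registry.local:5000/vendor/app:1.0"`. Keeping the digest only makes sense if the transfer into the local registry preserves it.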

I wouldn’t say that the initial approach is bad, as there are valid reasons to go this way. But I do see shortcomings that have to be addressed. I would be really interested in what you think about this new concept, especially how you see the pros and cons.

Best regards, Andreas

phil-abb commented 3 months ago

Making use of thick bundles does make some things easier and would be ideal for an air-gapped solution. @ajcraig raised a good point when we were discussing thick bundles: the WOS would have to incur both the ingress cost of receiving the files from the app vendor and the egress cost of the devices pulling them from the WOS. If the WOS is local, this is not an issue, but if the WOS is cloud-based, it could be a major cost concern for WOS vendors.
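The cost concern follows from how the traffic scales: a thick bundle enters the WOS once per app version but leaves it once per device. A back-of-the-envelope sketch; the bundle size and device count are illustrative assumptions, and no pricing is implied.

```python
from dataclasses import dataclass


@dataclass
class TransferEstimate:
    wos_ingress_mb: float  # vendor -> WOS, once per app version
    wos_egress_mb: float   # WOS -> devices, once per device


def estimate_thick_bundle_transfer(bundle_mb: float, devices: int) -> TransferEstimate:
    """Rough traffic volume for distributing one thick bundle through the WOS."""
    return TransferEstimate(wos_ingress_mb=bundle_mb, wos_egress_mb=bundle_mb * devices)


# Illustrative numbers only: a 500 MB bundle rolled out to 100 devices.
print(estimate_thick_bundle_transfer(500, 100))
# → TransferEstimate(wos_ingress_mb=500, wos_egress_mb=50000)
```

For a WOS running on-premises, only the ingress term crosses the internet uplink; the egress term stays on the local network.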

Allowing the end user to host the Helm charts and Docker images in an OCI registry they control also makes sense because, I agree, most end users will likely want some control over where these files come from.

I see two ways this can be accomplished:

The first is what you mentioned, where the Helm chart would need to be packaged in a way that allows the registry for the Docker images and dependent Helm charts to be specified at deployment time instead of being baked into the chart. This would mean that, as part of Margo, we have to require Helm charts and docker-compose files to be created in a specific way to enable this.

The other option is to allow devices to configure registry mirrors with rewriting. I've done this with K3s, so I didn't have to make any changes to the Helm chart itself and could still pull from my mirror. I would imagine other Kubernetes distributions support something similar, but I'm not sure about Docker, Podman, etc. The drawback, though, is that the end user has to set this up on all of their Margo-compliant devices, which may or may not be problematic depending on how many devices they are dealing with. Of course, we could make this part of the Margo management API specification, as something the device has to implement so that the WOS can provide the information the device needs to configure the mirror and rewrite rules. I just don't know how widely this is supported across the different container orchestration platforms.
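For reference, K3s reads mirror and rewrite rules from `/etc/rancher/k3s/registries.yaml`. The sketch below builds the structure of one such mirror entry as a Python dict, as a WOS might generate it before serializing it to YAML, and approximates the effect of a rewrite rule on a repository path. It assumes that rules use Go-style `$1` group references and that the first matching rule wins; the exact semantics should be checked against the K3s documentation. Hostnames, paths, and function names are hypothetical.

```python
from __future__ import annotations

import re


def k3s_registries_config(upstream: str, mirror_endpoint: str,
                          rewrites: dict[str, str]) -> dict:
    """Structure of a K3s registries.yaml mirror entry with rewrite rules."""
    return {
        "mirrors": {
            upstream: {
                "endpoint": [mirror_endpoint],
                "rewrite": dict(rewrites),
            }
        }
    }


def apply_rewrites(repository: str, rewrites: dict[str, str]) -> str:
    """Approximate the effect of the rewrite rules on a repository path.

    Assumption: the first matching rule wins. Go-style "$1" group references
    are converted to Python's "\\g<1>" before substituting.
    """
    for pattern, replacement in rewrites.items():
        if re.search(pattern, repository):
            py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
            return re.sub(pattern, py_replacement, repository, count=1)
    return repository


# Hypothetical rule: serve ghcr.io/vendor/* from the local mirror under mirror/vendor/*.
rules = {"^vendor/(.*)": "mirror/vendor/$1"}
config = k3s_registries_config("ghcr.io", "https://registry.local:5000", rules)
```

With this rule, a pull of `ghcr.io/vendor/app` would be served from `mirror/vendor/app` on the mirror, while the Helm chart keeps its original image reference.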

phil-abb commented 3 months ago

@g0zilla @arne-broering @ajcraig

I did a quick check on how dependencies on other Helm charts are handled, and this part isn't going to be an issue, because those charts get pulled down and included in the Helm chart package that gets created.

So, for example, I have this Chart.yaml file:

apiVersion: v2
name: goodbye-world
description: A Goodbye-World Helm chart for Kubernetes
type: application
version: 0.0.1
appVersion: 0.0.1
dependencies:
  - name: hello-world
    version: 0.0.1
    repository: "oci://ghcr.io/pdpresson/charts"

Whenever I package my goodbye-world Helm chart, I first have to run this command, which pulls down all the chart dependencies (the hello-world chart in the example above):

helm dep up

Then package the goodbye-world chart with:

helm package .

The packaged goodbye-world Helm chart contains the hello-world chart, so the device does not have to pull down the hello-world chart separately when installing it.
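To double-check that a dependency was vendored, one can list the archive produced by `helm package`, which is named `<name>-<version>.tgz` (so `goodbye-world-0.0.1.tgz` for the example above). `helm dep up` places the dependency archives in the chart's `charts/` directory, so they appear as `<chart>/charts/<dependency>-<version>.tgz` inside the package. A small sketch; the helper name `vendored_dependencies` is hypothetical.

```python
from __future__ import annotations

import tarfile


def vendored_dependencies(chart_tgz: str) -> list[str]:
    """List sub-chart archives bundled under <chart>/charts/ in a packaged chart."""
    with tarfile.open(chart_tgz, mode="r:gz") as tar:
        return sorted(
            name
            for name in tar.getnames()
            if name.count("/") == 2
            and name.split("/")[1] == "charts"
            and name.endswith(".tgz")
        )


# Usage (after `helm dep up` and `helm package .`):
# vendored_dependencies("goodbye-world-0.0.1.tgz")
```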

So this means we would need to require Margo-compliant application Helm charts to have a property in their values.yaml that allows setting the Docker image registry, and there must be a parameter in the margo.yaml that allows end users to change the registry if it's not the vendor's registry. We'd also need to introduce something that allows the end user to specify an alternate registry for the main Helm chart.
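A sketch of how a registry override from margo.yaml could end up in the chart's values, assuming a hypothetical `image.registry` key in values.yaml and that the override value has already been read from margo.yaml by some mechanism the specification would still have to define. The merge leaves the vendor's other defaults untouched, and `to_set_flags` turns the override into `--set` arguments for `helm install` or `helm upgrade`. All function names are hypothetical.

```python
from __future__ import annotations

import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in recursively (override wins)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def registry_override(registry: str | None) -> dict:
    """Hypothetical convention: the chart reads its image registry from image.registry."""
    return {} if registry is None else {"image": {"registry": registry}}


def to_set_flags(values: dict, prefix: str = "") -> list[str]:
    """Flatten nested values into 'helm install --set key=value' arguments.

    Values containing commas or other special characters are not escaped here.
    """
    flags: list[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flags.extend(to_set_flags(value, path))
        else:
            flags += ["--set", f"{path}={value}"]
    return flags


# Vendor defaults from values.yaml (illustrative) plus the end user's override.
defaults = {"image": {"registry": "ghcr.io", "repository": "vendor/app", "tag": "1.0"}}
values = deep_merge(defaults, registry_override("registry.local:5000"))
```

Here `to_set_flags(registry_override("registry.local:5000"))` gives `["--set", "image.registry=registry.local:5000"]`; when no override is configured, the vendor's registry is used unchanged.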

Of course, the issue Erik mentioned is still there where the SHA value is changed when the files are loaded into a different OCI registry.

stormc commented 2 months ago

Of course, the issue @eriknordmark mentioned is still there where the SHA value is changed when the files are loaded into a different OCI registry.

This is a good write-up of the (root of the) problem: “The Road to OCIv2 Images: What's Wrong with Tar?”

skopeo copy can already copy images while keeping their digests stable, and the other tools are following suit in the meantime. However, I don't know how much of this can be considered done by now...
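The effect described in that write-up is easy to reproduce on a small scale. An OCI layer digest is the SHA-256 of the compressed layer blob, so anything that changes the bytes of that blob, such as a tar header timestamp or the gzip header timestamp, changes the digest even when the file contents are identical. A simplified sketch that models a single layer blob (not a full image or manifest); the function name `layer_digest` is hypothetical.

```python
from __future__ import annotations

import gzip
import hashlib
import io
import tarfile


def layer_digest(files: dict[str, bytes], tar_mtime: int, gzip_mtime: int) -> str:
    """SHA-256 digest of a gzip-compressed tar built from `files`.

    Models a single OCI layer blob: the file contents are fixed, while the
    tar header timestamps and the gzip header timestamp are parameters.
    """
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = tar_mtime
            tar.addfile(info, io.BytesIO(files[name]))
    blob = gzip.compress(raw.getvalue(), mtime=gzip_mtime)
    return "sha256:" + hashlib.sha256(blob).hexdigest()


files = {"app/bin/hello": b"hello world\n"}
original = layer_digest(files, tar_mtime=0, gzip_mtime=0)
reexported = layer_digest(files, tar_mtime=1700000000, gzip_mtime=1700000000)
print(original == reexported)  # False: same contents, different blob bytes
```

Digest-stable copying therefore means transferring the original compressed blobs byte for byte rather than re-exporting and re-compressing them.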
