minvws / nl-kat-coordination

European Union Public License 1.2

Containerization support for more deployment setups for the boefjes #81

Closed Lisser closed 1 year ago

Lisser commented 1 year ago

Copied from closed-source issue.

@Donnype said:


Intro

We have been talking a lot about containerized boefjes, but in practice there is no proper support yet. This ticket records that we want to finally start properly supporting (some of) the thought-out use cases. To link the right people: @underdarknl @ammar92 @noamblitz @Lisser @dekkers please comment if this list is missing important use cases or contains ones that should not be there.

Context

Types of containers

We have discussed several types of containerization in the past that we could start implementing:

These have their own set of challenges:

Types of container registries

Moreover, each kind of containerization requires a repository of container images, both so that the Katalogus can show information about them and so that the workers can pull images from it:

The plan for supporting multiple registries was to deploy the Plugin Repository around each image and to use the same client everywhere:

Image Figure 1

This is easy when there is one kind of repository, and using our custom plugin repository is justified in the LXD case. However, when we are going to support OCI images as well, using a Plugin Repository becomes a lot of overhead since it is another component to deploy and we have to translate OCI images into our own artifacts.

Proposal

Priorities in containerization support

My proposal to tackle this is as follows. First and foremost, I believe that the priority of support should be driven by what KAT users need the most. Even more important to note is that the most intensive KAT users at this stage are the KAT developers, so their setup should remain easy to work with. I think it would be an awesome feature if we could have the following flows in KAT:

Local devs: build kat and start running a worker/katalogus locally with only the LocalRepository. Copy-paste a boefje folder into a new boefje folder and change the definition file and logic to create a new boefje. Refresh the katalogus and your boefje is available to run.

Community using Kubernetes: deploy KAT using your own Helm charts in a Kubernetes/cloud environment. Use either the Katalogus interface in Rocky or the env-file to add your Cloud Boefjes-Container Registry (BCR) and all relevant configuration. You immediately see all boefjes/normalizers available from this BCR. You can build and push a new boefje locally or in a CI/CD pipeline (where you set all configuration parameters such as name, consumes and produces as labels, for instance), perhaps using some tooling we provide. Once pushed to the BCR, it is available in the Katalogus immediately after a refresh. (This is the idea I tried to explain to Reinoud @noamblitz.)

If, as a user, I want to add some developer's boefjes that are hosted on Docker Hub, I would not want to deploy a whole new instance of a plugin repository to enable it. Schematically this means moving towards the following model:
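As a rough illustration of the "configuration parameters as labels" idea, the sketch below turns the label map of an image into a plugin definition the Katalogus could list. The label prefix `org.example.kat.`, the `PluginDefinition` class and the comma-separated value format are all assumptions made for this example; the issue does not define a naming scheme.

```python
from dataclasses import dataclass

# Hypothetical label namespace: the issue only suggests "labels", not their names.
LABEL_PREFIX = "org.example.kat."


@dataclass(frozen=True)
class PluginDefinition:
    image: str
    name: str
    consumes: tuple[str, ...]
    produces: tuple[str, ...]


def _split(value: str) -> tuple[str, ...]:
    # Labels are plain strings, so lists are assumed to be comma-separated.
    return tuple(item.strip() for item in value.split(",") if item.strip())


def plugin_from_labels(image: str, labels: dict[str, str]) -> PluginDefinition:
    """Build a plugin definition from the labels found on an image."""
    try:
        name = labels[LABEL_PREFIX + "name"]
    except KeyError as exc:
        raise ValueError(f"{image} has no {LABEL_PREFIX}name label") from exc
    return PluginDefinition(
        image=image,
        name=name,
        consumes=_split(labels.get(LABEL_PREFIX + "consumes", "")),
        produces=_split(labels.get(LABEL_PREFIX + "produces", "")),
    )
```

With something like this, an image without the expected labels is rejected up front rather than showing up as a half-configured boefje.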

Image Figure 2

This would allow us to leverage common libraries/SDKs and registry APIs without the intermediate step of a plugin API client, and reduce the number of services to deploy. I think always adding this extra step only makes sense when we decide to first cast all containerization tooling into our custom image format and standardize how we run containers as well. In practice I think it would be way too much work to support e.g. OCI registries and images by casting them into our own model when this can be done quite easily through popular SDKs. If we would still use these SDKs in the Plugin API wrapper as displayed in Figure 1, the wrapper becomes as thin as a sheet but still carries a lot of complexity: deploying a whole extra service and supporting complicated containerization translations.

Therefore, I think that Figure 2 would be the way to go in the foreseeable future, and in that case the following priority makes the most sense to me:

We need to pick which of these features we want to implement and decide when to implement them. The first three would mean full OCI support in a cloud environment using Kubernetes. The third one would also be the setup for a hybrid environment where only the boefjes/normalizers are run in a container, and would only need the last one to be implemented as well to support this. At the time of writing I am not sure how far we are from a functional LXD setup.

Important: implementing these features should not make kubernetes, LXD or FireCracker a requirement for local development in my opinion.

Overview of potentially different architectures

Assuming we want to start workers for just one type of containerization (input from @underdarknl), the schematic flow of control looks as follows:

Image

Whether or not we want to use a private PyPI registry or custom Firecracker VMs wrapped in a Plugin Repository is open to discussion.

Note on extensibility Katalogus

I have not discussed the implications this has for the current repository models in the Katalogus and how we would be able to share Katalogi/Plugin Repositories between instances. I think, however, that this should not be too complicated: a Katalogus instance could, for instance, expose all public repositories through an API, and another Katalogus instance could copy all repositories from this API into its own database.

Conclusion

These thoughts are the result of several (past) discussions within the team, conversations with some active users of KAT and a lot of contemplation about the architecture from my side. Let me know what you think, feedback is more than welcome!

Lisser commented 1 year ago

@ammar92 said:


Thanks for sharing this @Donnype! I do recognize many of the ideas and suggestions you made.

I would like to add some remarks about what we already have, particularly about LXD and the Plugin Repository. A few months ago I continued working on an LXD PoC by @errieman and extended this to an almost fully working LXD pipeline and Plugin Repository:

(see also https://github.com/minvws/nl-rt-tim-abang/issues/378, https://github.com/minvws/nl-rt-tim-abang/issues/436, https://github.com/minvws/nl-rt-tim-abang/issues/392, https://github.com/minvws/nl-rt-tim-abang/issues/391)

By the way, LXD doesn't strictly need a (custom) plugin repository; the plugin repository is merely an abstract registry that could hold images and code for a variety of image types (e.g. LXD, Docker, and even source code, Python packages or binaries).


I very much like your overview of the potentially different architectures. It seems like the correct way to go with the support for different runtimes as workers.

The only thing that still bothers me a lot is the current way we have the 'local' plugins. Sooner or later the current implementation will reach its limit due to e.g.:

Some isolation might be achieved using e.g.:

Of course, this is all temporary, since the end goal eventually is to containerize the plugins, both local and remote plugins.

Lisser commented 1 year ago

@Donnype said:


I realised that one assumption in my overview is that we want to support OCI images, but I think this will be the easiest way to facilitate a plethora of community boefjes due to the format's popularity. It would also make it possible to extend to other languages thanks to the standardised entrypoint specification (whereas using LXD for other languages right now would mean that we have to support different build tools, add entrypoint support, etc. ourselves).

@ammar92 I can indeed see the issues there, but I see the local setup mostly as a non-production way to get started with developing KAT. I think the first two issues can be fixed with venvs, either by using your TemporaryEnvironment setup and updating the virtual environment, and/or by running boefjes in a subprocess. For the third, I think we could create tooling that builds OCI images from the boefjes/normalizers, after which you can add your local Docker images as a registry?
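To make the "run boefjes in a subprocess" option concrete, here is a minimal sketch using only the standard library. The contract (task as JSON on stdin, raw output on stdout) and the `run_plugin` helper are assumptions for illustration, not how the boefjes are currently invoked. Passing the interpreter of a plugin-specific venv as `python` would give each boefje its own dependencies.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_plugin(
    script: Path,
    task: dict,
    python: str = sys.executable,
    timeout: float = 60.0,
) -> bytes:
    """Run a plugin script in a separate interpreter process.

    Assumed contract: the task is passed as JSON on stdin and the raw
    output is whatever the script writes to stdout.
    """
    completed = subprocess.run(
        [python, str(script)],
        input=json.dumps(task).encode(),
        capture_output=True,
        timeout=timeout,
        check=True,  # a non-zero exit code raises CalledProcessError
    )
    return completed.stdout
```

A crash or hang in the plugin then surfaces as an exception in the worker instead of taking the worker down with it.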

Lisser commented 1 year ago

@dekkers said:


I agree that we should start with how users want to deploy OpenKAT. As far as I know, there are three different ways in which people are running OpenKAT, or have told us they want to run it, that we also want to support:

With Nomad and Kubernetes you will want to run your boefjes in the cluster and OCI images/registries work fine for that. They both also support using microVMs with Kata containers (and Kata supports both QEMU and Firecracker). And with Kubernetes you could also use gVisor.

For Debian/Ubuntu we can just support Docker and/or Podman to run the boefjes containers (or maybe talk to containerd directly? Not sure about that, would need to investigate). If you want to use microVMs you can also configure containerd to use Kata, as far as I can see. This will all work with OCI images and registries too.

The fact that those are open standards in wide use gives a lot of advantages:

Another benefit as Donny mentioned is that if we define the boefjes/normalizer interface as OCI image/container (e.g. how the container is started, what input it gets and what it should output) it would also be possible to implement boefjes in another programming language and such a boefje would run on every KAT installation.
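One possible shape of such an interface, sketched under assumptions: the task is handed to the container as JSON in an environment variable and the runner only needs to know the image name. The variable name `KAT_TASK` and the `build_run_command` helper are hypothetical. The `run`, `--rm` and `--env` arguments are accepted by both the docker and podman CLIs.

```python
import json


def build_run_command(image: str, task: dict, runtime: str = "docker") -> list[str]:
    """Build the argv for running one boefje container for one task.

    KAT_TASK is a made-up variable name; the actual input/output contract
    is exactly what would still have to be specified.
    """
    return [
        runtime,
        "run",
        "--rm",  # remove the container once the boefje exits
        "--env",
        f"KAT_TASK={json.dumps(task, sort_keys=True)}",
        image,
    ]
```

Because nothing here depends on what is inside the image, a boefje written in any language would be started in the same way.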

Given that firecracker can be used via Kata I don't think we have a need for a Firecracker specific images / runner unless I am missing some big advantage that would result from using Firecracker directly.

For local development you can just build a new Docker image with docker build or whatever tool you want to use to build your image. So I don't really understand why we would need something like LocalRunner / virtualenvs. What am I missing here?

Then there is only one last thing that doesn't support OCI, and that is LXD. But to be honest I don't really understand the reasons why we would want to support LXD, because the more I look at it, the less I like it. First of all, LXD ignores all the open standards that exist for containers and only supports its own things.

But to my surprise it also doesn't seem to be completely open source. I was looking at what kind of API they have and how to do authorisation (so that the boefje running cannot do more than start/stop boefjes containers), but apparently if you want RBAC you need the proprietary Canonical RBAC service that is only available with an Ubuntu Advantage subscription: https://discuss.linuxcontainers.org/t/security-questions/8946/2 So apparently if you want to create a secure setup you need to get proprietary stuff from Canonical, so LXD seems to be more open core than open source.

I don't really see any advantage LXD brings over the other deployment options, and I don't know of any potential user of OpenKAT saying they want to use LXD, but it is very clear that it will be a lot of work to support it because they refuse to implement the open standards that exist.

So my proposal would be to only support OCI images and registries.


For the Katalogus, the boefjes and normalizers are then mostly OCI image names. It is possible to add KAT-specific metadata to images; for example, a normalizer could have a field that lists the mimetypes it supports. Docker and others implement a catalog endpoint, but not everyone does it the same way, so it is unfortunately not standardized: https://github.com/opencontainers/distribution-spec/pull/45#issuecomment-521425185 So an interface like that for fetching all boefjes won't always be available.

A simple solution for discovery might be to implement a simple JSON file/endpoint that just lists all boefje/normalizer containers. For all public boefjes we could put a generic list on GitHub, using PRs to update it and Actions to publish it on Pages. Anyone who wants to run private boefjes from a private container registry could either create such a JSON file somewhere or just configure every individual boefje in the Katalogus with the private container registry URL.
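A minimal sketch of what reading such a JSON list could look like. The document layout (top-level `boefjes` and `normalizers` arrays of image references) and the `IndexEntry`/`parse_index` names are assumptions for illustration only; no format has been agreed on.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    kind: str   # "boefje" or "normalizer"
    image: str  # full OCI image reference, including the registry host


def parse_index(document: str) -> list[IndexEntry]:
    """Parse a (hypothetical) plugin index into a flat list of entries."""
    data = json.loads(document)
    entries = []
    for kind in ("boefje", "normalizer"):
        # Missing keys are treated as "no plugins of this kind".
        for image in data.get(kind + "s", []):
            entries.append(IndexEntry(kind=kind, image=image))
    return entries
```

A public list and a private one could then be combined by parsing both documents and concatenating the results.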