wasmCloud / wasmcloud-otp

wasmCloud host runtime that leverages Elixir/OTP and Rust to provide simple, secure, distributed application development using the actor model
Apache License 2.0

[RFC] Create wasmCloud Kubernetes Operator #212

Closed alsuren closed 1 year ago

alsuren commented 3 years ago

This RFC proposes that we create a wasmCloud Kubernetes Operator, which is responsible for presenting Kubernetes resources to the Lattice Controller and writing status updates back to those resources.

Rationale

The wasmCloud Kubernetes Operator will allow users to declare their intent around the shape of their wasmCloud applications via Kubernetes. This will enable GitOps-style workflows for continuous delivery of customer value, with reliable roll-backs.

Design Detail

This RFC builds upon https://github.com/wasmCloud/wasmcloud-otp/issues/177, which describes the Lattice Controller. The Lattice Controller enables idempotent, declarative deployment of Applications to a wasmCloud lattice. The wasmCloud Kubernetes Operator is intended as a thin shim between the declarative world of Kubernetes Resources and the declarative world of the Lattice Controller.

The wasmCloud Kubernetes Operator should read OAM Application descriptions from Kubernetes Resources, via the standard Kubernetes APIs. When it notices a change, it should post the current desired state of the Application to the Lattice Controller via NATS.
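A minimal sketch of this forwarding step, assuming the resource has already been parsed into a dict. The subject name `wasmcloud.lattice.apps.put`, the `Forwarder` class, the message shape, and the `publish` callable are all hypothetical stand-ins; this RFC does not define the Lattice Controller's NATS API. Changes are detected by hashing the spec, which is only one possible approach.

```python
import hashlib
import json

# Hypothetical NATS subject; the RFC does not define the real one.
APPLY_SUBJECT = "wasmcloud.lattice.apps.put"


class Forwarder:
    """Publishes an application's desired state whenever its spec changes."""

    def __init__(self, publish):
        # publish(subject: str, payload: bytes) stands in for a NATS client call.
        self._publish = publish
        self._seen = {}  # (namespace, name) -> digest of the last forwarded spec

    def observe(self, resource):
        """Forward the resource if its spec differs from the last one sent.

        Returns True when a message was published.
        """
        meta = resource["metadata"]
        key = (meta.get("namespace", "default"), meta["name"])
        spec = resource.get("spec", {})
        # Canonical JSON (sorted keys) so equal specs always hash equally.
        digest = hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()
        if self._seen.get(key) == digest:
            return False  # nothing changed, so stay quiet
        payload = json.dumps({"name": meta["name"], "spec": spec})
        self._publish(APPLY_SUBJECT, payload.encode())
        self._seen[key] = digest
        return True
```

The digest map is only an in-memory cache. Because the Lattice Controller is described as idempotent, losing the cache on restart would just cause the current state to be posted again.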

The wasmCloud Kubernetes Operator will also listen over NATS for status updates from the Lattice Controller. These will be written back to the status field of the Kubernetes Resource.
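The write-back direction could be sketched as follows. The status message fields (`name`, `phase`, `message`) and the function name are assumptions made for illustration, since the RFC does not specify what the Lattice Controller sends. The output is a JSON merge patch body that touches only the `status` field.

```python
import json


def status_patch_from_message(data: bytes):
    """Turn a (hypothetical) status message into a merge patch for the resource.

    Returns (application name, JSON patch body).
    """
    update = json.loads(data)
    patch = {
        "status": {
            "phase": update["phase"],
            "message": update.get("message", ""),
        }
    }
    return update["name"], json.dumps(patch)
```

Patching only `status` leaves the user's declared spec untouched.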

Interoperability

OAM Applications conforming to the latest spec are expected to be identified by the GVK core.oam.dev v1beta1 Application. For the initial version of the wasmCloud Kubernetes Operator, we will avoid accidental interaction with other OAM-compatible Kubernetes Operators (like KubeVela) by using our own namespace. We will use core.wasmcloud.com for Group, and WasmCloudApplication for Kind. For simplicity, we will use the same v1beta1 version, since this reflects the version of the underlying OAM spec.
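As an illustration of the naming above, here is a sketch of a resource with the proposed GVK and a check that ignores plain OAM Applications. The `is_ours` helper and the contents of `spec` are illustrative assumptions; only the Group, Version, and Kind come from this RFC.

```python
GROUP = "core.wasmcloud.com"
VERSION = "v1beta1"
KIND = "WasmCloudApplication"

# Illustrative resource; the spec body is a placeholder, not a defined schema.
example = {
    "apiVersion": f"{GROUP}/{VERSION}",
    "kind": KIND,
    "metadata": {"name": "echo"},
    "spec": {"components": []},
}


def is_ours(resource):
    """True only for resources with the wasmCloud-specific GVK."""
    return (
        resource.get("apiVersion") == f"{GROUP}/{VERSION}"
        and resource.get("kind") == KIND
    )
```

A `core.oam.dev/v1beta1` `Application` fails this check, so it would be left for KubeVela or another OAM operator to handle.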

The wasmCloud Kubernetes Operator will not be responsible for scheduling NATS/wasmCloud pods, so users will need to schedule these themselves. An example of how to do this should be included in the same repo as the wasmCloud Kubernetes Operator source code.

In the initial version of the wasmCloud Kubernetes Operator, it will not be possible to deploy a hybrid wasmCloud+non-wasmCloud application in a single Application resource. If this functionality is desired, then an extension should be written to KubeVela or similar to extract the relevant parts of the Application resource into a WasmCloudApplication for the wasmCloud Kubernetes Operator to read. This is the approach recommended by the #kubevela channel on CNCF Slack.

Implementation Considerations

It should be possible to build the wasmCloud Kubernetes Operator in a simple, stateless fashion. It will need a connection to NATS (to talk to the wasmCloud Lattice Controller over the Control Plane).

It will be receiving WasmCloudApplication Resources as JSON over HTTP, and will pass them on to the Lattice Controller as JSON over NATS. It can be built using any framework/language that supports these formats and protocols.

Building upon the work of https://github.com/wasmCloud/wasmcloud-k8s-operator/pull/2 in Go seems like a reasonable approach.

[edit 14 sept: switched from wasmcloud.com to core.wasmcloud.com to make our lives easier with kubebuilder]

autodidaddict commented 3 years ago

I am definitely in favor of not relying on any particular version of OAM for this. The underlying lattice-controller could change its manifest format from an OAM YAML file to a JSON file during a breaking change release, and we don't want the custom resource files submitted to this operator to be tightly coupled to that underlying format.

In other words, consumers of this operator should be blissfully unaware of the existence of the lattice controller.

alsuren commented 3 years ago

> I am definitely in favor of not relying on any particular version of OAM for this.

I think that the choice of OAM as a general framework was a good one. My hope is that it does what we want, and that we can support multiple versions backwards-compatibly if needed, and upstream any extensions that we need to make to it.

I think that it will be valuable to reduce the impedance mismatch between wash ctl apply my-local-wasmcloud-application.yaml and kubectl apply -f my-kubernetes-wasmcloud-application.yaml.

One way to do this would be to specify the allowed OAM version[s] in the lattice-controller, and have wash and the operator do only very minimal format conversions (JSON<->YAML) wherever possible.

If the lattice controller makes a breaking change that causes it to not support a previously-supported OAM version, then the operator will need to make an equivalent breaking change, and drop that version from its list of supported versions in its WasmCloudApplication CRD.

The alternative would be to keep backwards compatibility by baking translation logic into the operator. If we did this, we would need to keep it in sync with translation logic in wash. I would rather not go down this route.
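The version-dropping approach (as opposed to the translation alternative) might look roughly like this. The function name and the version strings in the usage note are hypothetical and purely illustrative.

```python
def served_versions(operator_versions, controller_versions):
    """Versions the CRD should keep serving: those the controller still accepts.

    Order follows the operator's own list.
    """
    accepted = set(controller_versions)
    return [v for v in operator_versions if v in accepted]
```

For example, `served_versions(["v1beta1", "v1beta2"], ["v1beta2"])` returns `["v1beta2"]`, meaning the operator would drop `v1beta1` from its CRD in the same breaking release.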

> we don't want the custom resource files submitted to this operator to be tightly coupled to that underlying format.

A switch from YAML to JSON wouldn't need to cause a breaking change for Kubernetes users. A switch away from OAM would need a breaking change in the operator (see above).

brooksmtownsend commented 2 years ago

Hey @alsuren, this issue has been open for a while. I'd love to hear whether you have any additional thoughts here, especially with the addition of our wadm project, which implements what's described in #177.

We likely aren't prioritizing creating controllers specifically for Kubernetes, and would love to collaborate on a deployment manifest that runs wadm (which has a container) alongside wasmCloud and NATS inside a Kubernetes cluster for those who are looking to get started.

brooksmtownsend commented 1 year ago

This work will be done with wadm and the Helm chart rather than a concrete Kubernetes operator. If folks are interested in taking a stab at implementing this operator, we'll always be happy to give advice and support 😄