kubeflow / notebooks

Kubeflow Notebooks lets you run web-based development environments on your Kubernetes cluster by running them inside Pods.
Apache License 2.0
14 stars 7 forks source link

Notebooks 2.0 // Controller // Implement VirtualService reconciliation #37

Open thesuperzapper opened 2 weeks ago

thesuperzapper commented 2 weeks ago

related https://github.com/kubeflow/kubeflow/issues/7156

Now that https://github.com/kubeflow/notebooks/pull/34 is merged, there are only a small number of remaining tasks for the "Kubeflow Workspaces Controller to be finished.

One task to finish the workspace controller loop, so it also reconciles the Istio VirtualServices required to make the Workspace Pods accessible on the Istio mesh

Implementation

Here is the TODO in the code:

https://github.com/kubeflow/notebooks/blob/bc4e4454bc1c921549cb9df5865e52a61fe07946/workspaces/controller/internal/controller/workspace_controller.go#L345-L348

VirtualService Definition

The structure of the HTTP paths exposed by each Workspace should be: /workspace/<NAMESPACE_NAME>/<WORKSPACE_NAME>/<PORT_ID>/

Where PORT_ID is the id of a port defined in the currently selected spec.podTemplate.options.imageConfig, which is defined in the WorkspaceKind under spec.podTemplate.options.imageConfig.values[].spec.ports

Note, this path structure is already reflected in the httpPathPrefix function for the spec.podTemplate.extraEnv[0].value go templates (which should probably be factored into a helper):

https://github.com/kubeflow/notebooks/blob/bc4e4454bc1c921549cb9df5865e52a61fe07946/workspaces/controller/internal/controller/workspace_controller.go#L594-L604

Example VirtualService

Here is a templated example of the VirtualService we should reconcile:

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  ## for reference, see the existing `generateNamePrefix()` usage for `Service` and `StatefulSet` generation
  name: "ws-<WORKSPACE_NAME>-<RANDOM_SUFFIX>"
  namespace: "<PROFILE_NAME>"

## NOTE: see the full VirtualService spec here:
##       https://istio.io/latest/docs/reference/config/networking/virtual-service/
spec:

  gateways:
    ## read this from an environment variable called `ISTIO_GATEWAY`, set on the controller
    ## DEFAULT: "kubeflow/kubeflow-gateway"
    - "<ISTIO_GATEWAY>"

  hosts:
    ## read this from an environment variable called `ISTIO_HOST`, set on the controller
    ## DEFAULT: "*"
    - "<ISTIO_HOST>"

  http:

    ## there should be an element for EACH of the `ports` in the Workspace currently selected `imageConfig`
    - match:
        - uri:
            prefix: "/workspace/<PROFILE_NAME>/<WORKSPACE_NAME>/<PORT_ID>/"

      ## if `httpProxy.removePathPrefix` is true, include this block. otherwise, remove it
      rewrite:
        uri: "/"

      headers:

        ## implement any `httpProxy.requestHeaders` manipulation here
        ## DOCS: https://istio.io/latest/docs/reference/config/networking/virtual-service/#Headers-HeaderOperations
        request:

          ## NOTE: the default is {}, this is just an example
          ## NOTE: this turns into "/workspace/{PROFILE_NAME}/{WORKSPACE_NAME}/{PORT_ID}/" when rendered
          ## NOTE: implement go templates the same as `httpPathPrefix` used in the `spec.podTemplate.extraEnv[].value`
          ## WARNING: remember to add a new Webhook validation for this field (and `add`) to ensure it's a valid Go template
          set:
            X-RStudio-Root-Path: |-
              {{ httpPathPrefix "PORT_ID" }}

          add: {}

          remove: []

      route:
        - destination:
            ## read CLUSTER_DOMAIN from an environment variable called `CLUSTER_DOMAIN`, set on the controller
            ## DEFAULT: "cluster.local"
            host: "<SERVICE_NAME>.<PROFILE_NAME>.svc.<CLUSTER_DOMAIN>"
            port:
              number: <PORT_NUMBER>

CRDs

Besides making the Pod available, the VirtualService will also implement the spec.podTemplate.ports[].httpProxy part of the WorkspaceKind spec:

spec:
  podTemplate:

    ## port configs (MUTABLE)
    ##
    ports:

      ## configs to apply to a specific port
      ##  - ports are identified by their `ports[].id` from the `imageConfig` spec
      ##
      - portId: "jupyterlab"

        ## httpProxy configs for the port
        ##  - note, only "HTTP" protocol ports are supported
        ##
        httpProxy:

          ## if the HTTP path prefix should be removed
          ##  - if true, the '/workspace/{PROFILE_NAME}/{WORKSPACE_NAME}/{PORT_ID}' path prefix
          ##    will be stripped from incoming requests
          ##  - that is, the application sees the request as if it was made to '/...'
          ##  - this only works if the application serves RELATIVE URLs for its assets
          ##
          removePathPrefix: false

          ## header manipulation rules for incoming HTTP requests
          ##  - sets the `spec.http[].headers.request` of the Istio VirtualService
          ##    https://istio.io/latest/docs/reference/config/networking/virtual-service/#Headers-HeaderOperations
          ##  - the following go template functions are available:
          ##      - `httpPathPrefix(portId string)`: returns the HTTP path prefix of the specified port (regardless of `removePathPrefix`)
          ##
          requestHeaders: {}
            #set: { "X-RStudio-Root-Path": "{{ httpPathPrefix 'rstudio' }}" } # example for RStudio
            #add: {}
            #remove: []

Changes from current CRDs

After designing and discussing this implementation, we figured that we need to make some small additions to the existing Notebooks 2.0 CRDs:

thesuperzapper commented 2 weeks ago

@jiridanek @Adembc, @kimwnasptd, or @ederign you might be interested in picking this up.

thesuperzapper commented 2 weeks ago

@jiridanek as discussed in the Notebook WG meeting, I have added some better examples of what the VirtualService should look like.

I have also realized that we will need to change a few things about the WorkspaceKind CRD spec, namely that the spec.podTemplate.httpProxy map needs to be replaced with spec.podTemplate.ports list, so different proxy configs can be applied to each port. (See above issue for more info).

mcruzdev commented 1 week ago

Hi @thesuperzapper could you assign it to me?

jiridanek commented 1 week ago

I'll be on vacation for the next week, so, all the best to @mcruzdev!

mcruzdev commented 1 week ago

Thank you @jiridanek, good vacaction for you :)