opendatahub-io-contrib / jupyterhub-odh

Example JupyterHub deployment using OpenShift OAuth authenticator.

add trusted-ca configmap support for Kubernetes and Openshift 4.x #137

Open shalberd opened 2 years ago

shalberd commented 2 years ago

Is your feature request related to a problem? Please describe. Companies often host internal repositories, whether for Docker images, machine learning packages, or pipeline extensions (Python, R). Connecting to such servers currently fails when they use SSL certificates that were generated by a custom, internal PKI, because those certificates are not publicly trusted. OpenShift provides ways to define trusted CAs at the cluster level and to auto-inject them into a configmap at the namespace level.

https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki

Describe the solution you'd like An auto-injected configmap containing the trusted CA bundle should be added to the namespace. The configmap / PEM certs should then be mounted into the spawned JupyterLab container, be it Elyra or others. Proposed location: /opt/app-root/etc/jupyter/custom/certs

configmap example from another project

https://github.com/trevorbox/reloader-operator/blob/f07d1858825cc8515f45c2cf03b84c23e994aa7e/helm/app/templates/configmap-trusted-cabundle.yaml

and mounting it into a location in the spawned container that e.g. Python then uses for requests.get trust later

https://github.com/trevorbox/reloader-operator/blob/f07d1858825cc8515f45c2cf03b84c23e994aa7e/helm/app/templates/app-nginx-echo-headers.yaml#L50

Describe alternatives you've considered Support for trusting internal PKI CAs is needed regardless of which clients (python, curl, others) access the URLs of servers hosting resources.

Additional context See e.g. Elyra Ticket conceptual notes

https://github.com/elyra-ai/elyra/issues/2797

there, the question came up in the context of Airflow pipeline components

and also air-gapped elyra https://github.com/elyra-ai/elyra/issues/2812

The non-publicly-trusted CA-bundle-issue is also touched on here https://github.com/opendatahub-io/odh-manifests/issues/575#issuecomment-1167486544

shalberd commented 2 years ago

@LaVLaS I have been in contact with @romeokienzler regarding this. Is there any chance we can get this into v 1.3.0 of ODH?

Please see my comment in https://github.com/elyra-ai/elyra/issues/2797#issuecomment-1170824741 and the places I think such a trusted-ca configmap could be mounted at in the filesystem of the container. No need for a PVC, as a configmap holds the trusted-ca info.

Depending on your contribution guidelines, I might also be able to help by branching and making a PR myself. The maintainers of this project could then review it.

This needs to be present regardless of the move, whenever that might be, to KF Notebook controller ... as it is enterprise-level readiness and critical in disconnected / airgapped environments such as biotech, banking, insurance etc.

romeokienzler commented 2 years ago

+1 ;)

LaVLaS commented 2 years ago

The best option for right now is to extend the apply_pod_profile functionality by overriding the c.OpenShiftSpawner.modify_pod_hook. See the odh-jupyterhub [documentation](https://github.com/opendatahub-io/jupyterhub-singleuser-profiles/blob/master/docs/howtouse.md)

The jupyterhub_config.py property in jupyterhub-cfg configMap will allow you to provide inline python code that will be appended to the builtin .jupyter/jupyterhub_config.py

Here is an example:

spawner = c.OpenShiftSpawner

def custom_apply_pod_profile(spawner, pod):
  """
  Example function for overriding JupyterHub server functionality to modify the user notebook Pod spec

  Should only be called via a function referenced by spawner.modify_pod_hook
  See https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html
  """
  # Apply profile from singleuser-profiles. REQUIRED since we want to extend the current pod spec
  # configs supported by the JH server
  apply_pod_profile(spawner, pod)

  # Pod container zero is the 'notebook' container with the env var we need to check.
  # env can be None when the container defines no environment variables.
  nb_container_env = pod.spec.containers[0].env or []

  # Example EnvVar name that is included in the notebook spawn
  extra_mount_point_var_name = 'EXTRA_MOUNT_POINT_NAME'

  for item in nb_container_env:
    if item.name == extra_mount_point_var_name:
      # <CODE TO ADD ADDITIONAL MOUNT POINTS TO THE NOTEBOOK POD>
      # Modify the pod object according to the V1PodSpec
      # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md
      pass  # placeholder so the otherwise empty block is valid Python

  return pod

spawner.modify_pod_hook = custom_apply_pod_profile

Since we do not have an environment to reproduce the issue and test functionality, I would recommend testing this in your environment to confirm it works. Then we can work with you to provide an additional jupyterhub overlay in odh-manifests that can load this custom config.

shalberd commented 2 years ago

Very nice, and thank you for the pointers. Conceptually sound. I will test this next week on our environment and keep you apprised here.

OK, I was able to add the jupyterhub_config.py part to jupyterhub-cfg, temporarily setting immutable: true. It was correctly inserted at /opt/app-root/custom/jupyterhub-cfg.py

Any print statements were just for my personal debugging.

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: jupyterhub
  name: jupyterhub-cfg
data:
  jupyterhub_config.py: |-
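    # Assumption: harmless if the built-in config already imports the kubernetes client
    from kubernetes import client

    spawner = c.OpenShiftSpawner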

    def custom_apply_pod_profile(spawner, pod):
      """
      Example function for overriding JupyterHub server functionality to modify the user notebook Pod spec

      Should only be called via a function referenced by spawner.modify_pod_hook
      See https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html
      """
      # Apply profile from singleuser-profiles. REQUIRED since we want to extend the current pod spec
      # configs supported by the JH server
      apply_pod_profile(spawner, pod)

      print("custom apply pod profile ...") 
      # make pod volume definition from optional CA configmap trusted-cabundle.
      trustedCAVolume = client.V1Volume(
        name="trusted-cas-volume",
        config_map=client.V1ConfigMapVolumeSource(
          name="trusted-cabundle",
          optional=True,
          items=[client.V1KeyToPath(key="ca-bundle.crt", path="trustedcas.pem")],
        )
      )

      print("existing container volume mounts ")
      print (str(pod.spec.containers[0].volume_mounts)[1:-1])   
      newVolumesList = [trustedCAVolume] 

      if pod.spec.volumes is None:
        print("pod def has no volumes yet")
        pod.spec.volumes = newVolumesList
      else:
        print("extending pod def volumes with configmap volume")
        pod.spec.volumes.extend(newVolumesList)

      print("extending container volume mounts for ca cert configmap")
      newVolumeMount = client.V1VolumeMount(mount_path="/opt/app-root/etc/jupyter/custom/cacerts", name="trusted-cas-volume", read_only=True)
      newVolumeMountList = [newVolumeMount]

      # Inject extraVolumeMount 
      if pod.spec.containers[0].volume_mounts is None:
        print("notebook container def has no volumes mounted yet")
        pod.spec.containers[0].volume_mounts = newVolumeMountList
      else:
        print("extending existing container def volume mounts section with configmap volume mount reference")
        pod.spec.containers[0].volume_mounts.extend(newVolumeMountList)

      print("new container volume mounts ")
      print (str(pod.spec.containers[0].volume_mounts)[1:-1])   

      return pod

    spawner.modify_pod_hook = custom_apply_pod_profile 

  jupyterhub_admins: "admin"
  gpu_mode: ""
  singleuser_pvc_size: 2Gi
  notebook_destination: "$(notebook_destination)"
  culler_timeout: "31536000" # 1 year in seconds

@ptitzler

additional spawned pod trustedCAs volume with value from configmap

The result is as wanted: the CA certs from the OpenShift configmap with the label config.openshift.io/inject-trusted-cabundle: 'true' are mounted at /opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem and usable in the spawned container. As you can see above, I am using V1ConfigMapVolumeSource, which can only reference a configmap by name, not by label selectors, but that is OK; the custom spawner code works as intended.
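The hook above repeats the same "assign if None, otherwise extend" step for both pod.spec.volumes and volume_mounts. A minimal sketch of how that step could be factored out follows; the helper name append_or_init is hypothetical, and SimpleNamespace objects stand in for the Kubernetes client models so the snippet runs without a cluster.

```python
from types import SimpleNamespace


def append_or_init(obj, attr, new_items):
    """Extend the list stored at obj.<attr>, creating it when it is None or missing.

    Hypothetical helper, not part of kubespawner or the kubernetes client.
    """
    current = getattr(obj, attr, None)
    if current is None:
        setattr(obj, attr, list(new_items))
    else:
        current.extend(new_items)
    return getattr(obj, attr)


# Stand-in for pod.spec; a real hook would pass the V1PodSpec and V1Container objects.
spec = SimpleNamespace(volumes=None)
append_or_init(spec, "volumes", ["trusted-cas-volume"])  # list is created
append_or_init(spec, "volumes", ["another-volume"])      # list is extended
print(spec.volumes)  # ['trusted-cas-volume', 'another-volume']
```

Inside the hook this would presumably be called as append_or_init(pod.spec, "volumes", [trustedCAVolume]) and append_or_init(pod.spec.containers[0], "volume_mounts", [newVolumeMount]).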

The referenced configmap I created is defined like this.

kind: ConfigMap
apiVersion: v1
metadata:
  name: trusted-cabundle
  labels:
    config.openshift.io/inject-trusted-cabundle: 'true'

After creation, a ca-bundle.crt key is added under data, containing all the custom trusted CAs / additional CAs from the central OpenShift config. OpenShift system admins need to make sure that root CAs precede intermediate CAs, but that should be default practice; at least it was on our cluster.
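To check from inside a spawned container that the injected bundle actually arrived, something like the following could be used. It is only a sketch using the standard library: it counts PEM certificate blocks and does not validate the certificates or their order. The function names are made up for illustration.

```python
import re
from pathlib import Path
from typing import List

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_bundle(text: str) -> List[str]:
    """Return the individual certificate blocks found in a PEM bundle."""
    return PEM_CERT_RE.findall(text)


def count_certs(path) -> int:
    """Count certificates in the file at `path`; 0 when the file is absent."""
    bundle = Path(path)
    if not bundle.is_file():
        return 0
    return len(split_pem_bundle(bundle.read_text()))


if __name__ == "__main__":
    # Mount path taken from the hook earlier in this thread.
    print(count_certs("/opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem"))
```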

@LaVLaS Should the configmap not be present for one reason or another, the optional: true parameter makes sure the pod definition still works. Rendered in the volumes section of the spawned pod, it looks like this:

    - name: trusted-cas-volume
      configMap:
        name: trusted-cabundle
        items:
          - key: ca-bundle.crt
            path: trustedcas.pem
        defaultMode: 420
        optional: true

Thus, I would suggest adding this configmap to overlays.

I tested the case where the configmap is missing: the target directory in the spawned container simply does not contain the file trustedcas.pem, and no errors come up in either the main jupyterhub pod logs or the spawned container logs.

(app-root) sh-4.4$ pwd
/opt/app-root/etc/jupyter/custom/cacerts
(app-root) sh-4.4$ ls -all
total 0
drwxrwsrwx. 3 root    1000640000 59 Aug  8 09:32 .
drwxrwxr-x. 1 builder root       21 Aug  8 09:33 ..
drwxr-sr-x. 2 root    1000640000  6 Aug  8 09:32 ..2022_08_08_09_32_53.148661971
lrwxrwxrwx. 1 root    1000640000 31 Aug  8 09:32 ..data -> ..2022_08_08_09_32_53.148661971

In the case the configmap trusted-cabundle is present, the spawned container filesystem looks like this (example: a spawned Elyra image container). I assume that, in Jupyter, /opt/app-root/etc/jupyter/custom/ or, more generally, /etc/jupyter/custom is always present. The PEM file sits in a new cacerts directory, so any existing content under /opt/app-root/etc/jupyter/custom/ is preserved, like custom.css in the case of the Elyra Jupyter image.

(app-root) sh-4.4$ pwd
/opt/app-root/etc/jupyter/custom
(app-root) sh-4.4$ ls -all
total 4
drwxrwxr-x. 1 builder root       21 Aug  8 08:41 .
drwxrwxr-x. 1 builder root       20 Jun 23 17:15 ..
drwxrwsrwx. 3 root    1000640000 81 Aug  8 08:41 cacerts
-rw-rw-r--. 1 builder root       37 Dec  9  2021 custom.css
(app-root) sh-4.4$ cd cacerts/
(app-root) sh-4.4$ ls -all
total 0
drwxrwsrwx. 3 root    1000640000 81 Aug  8 08:41 .
drwxrwxr-x. 1 builder root       21 Aug  8 08:41 ..
drwxr-sr-x. 2 root    1000640000 28 Aug  8 08:41 ..2022_08_08_08_41_44.847279428
lrwxrwxrwx. 1 root    1000640000 31 Aug  8 08:41 ..data -> ..2022_08_08_08_41_44.847279428
lrwxrwxrwx. 1 root    1000640000 21 Aug  8 08:41 trustedcas.pem -> ..data/trustedcas.pem

shalberd commented 2 years ago

@LaVLaS @ptitzler @vpavlin Any news on this? Basically, I need assistance with an optional CA bundle path ENV variable, a new configmap trusted-cabundle and custom jupyterhub-cfg configmap code in an overlay of odh-manifests. Regarding a ca_cert_bundle_path entry / CA_CERT_BUNDLE_PATH env variable provided via configmap https://github.com/opendatahub-io/odh-manifests/blob/master/jupyterhub/jupyterhub/base/jupyterhub-configmap.yaml#L8, what is your opinion on whether such an extra ENV is even needed, given that /opt/app-root/etc/jupyter/custom/cacerts seems to be universally applicable, at least for Jupyterlab?

See also my comment at https://github.com/elyra-ai/elyra/issues/2797#issuecomment-1192290490 It worked beautifully in our environment, which is typical of custom enterprise private cloud OpenShift / DevOps environments with their own PKI.

ptitzler commented 2 years ago

> Regarding a ca_cert_bundle_path entry / CA_CERT_BUNDLE_PATH env variable provided via configmap https://github.com/opendatahub-io/odh-manifests/blob/master/jupyterhub/jupyterhub/base/jupyterhub-configmap.yaml#L8, what is your opinion on whether such an extra ENV is even needed, given that /opt/app-root/etc/jupyter/custom/cacerts seems to be universally applicable, at least for Jupyterlab.

A hard-coded default won't work because Elyra (the code making the HTTP requests) needs some indication of whether to use it when requests.get(url, verify='/path/to/somewhere') is invoked. If the value of /path/to/somewhere is not valid for any reason, the requests will fail. An environment variable would provide such a hint, either explicitly (with a custom-named env variable, Elyra would pass its value as a parameter when calling the requests library methods) or implicitly (requests evaluates the env variable REQUESTS_CA_BUNDLE).
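The explicit variant could look roughly like the sketch below. This is not Elyra's implementation; the helper name resolve_verify is hypothetical, and CA_CERT_BUNDLE_PATH is the variable name floated earlier in this thread. The helper returns a value suitable for the verify parameter of requests: the bundle path when the variable points to a readable, non-empty file, otherwise True (the library's default trust store).

```python
import os


def resolve_verify(env_var="CA_CERT_BUNDLE_PATH", environ=None):
    """Return a value for requests' `verify` parameter.

    Hypothetical helper: yields the bundle path only when the env variable is
    set and points to an existing, non-empty file; otherwise falls back to
    True so that the default CA bundle is used.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(env_var, "").strip()
    if path and os.path.isfile(path) and os.path.getsize(path) > 0:
        return path
    return True


# Usage with the requests library (not imported here to keep the sketch stdlib-only):
#   requests.get(url, verify=resolve_verify())
```

Falling back to True rather than failing matches the optional: true behaviour of the volume: if the configmap is absent, the mount directory is empty and default verification still applies.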

shalberd commented 2 years ago

At first glance, REQUESTS_CA_BUNDLE as an env variable seems good. But it defines the trusted CAs globally for every request, which might be problematic: if some place in the code that was not covered, or was forgotten about, calls a regular https site with regular publicly trusted certificates, validation then fails.
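For clients that are not tied to requests, one way around that trade-off is to add the internal CAs on top of the default trust store instead of replacing it. The following is a sketch with Python's standard ssl module, not something Elyra or ODH does; build_ssl_context is a made-up name and the path is the mount location used earlier in this thread.

```python
import os
import ssl


def build_ssl_context(extra_ca_file=None):
    """Create a verifying SSL context that trusts the default CAs plus an
    optional extra PEM bundle (hypothetical helper)."""
    # Start from the platform's default trust store (publicly trusted CAs).
    context = ssl.create_default_context()
    # Add the internal CAs only when the bundle file exists and is non-empty.
    if (
        extra_ca_file
        and os.path.isfile(extra_ca_file)
        and os.path.getsize(extra_ca_file) > 0
    ):
        context.load_verify_locations(cafile=extra_ca_file)
    return context


ctx = build_ssl_context("/opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem")
# Example use: urllib.request.urlopen(url, context=ctx)
```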

So I think explicitly having a custom env variable here, CA_CERT_BUNDLE_PATH or maybe better TRUSTED_CA_BUNDLE_PATH, is best, as you suggest at https://github.com/elyra-ai/elyra/pull/2912/commits/1efda706336d03343c607b8de9de3b71dfab2a99#diff-3d0156c36fecf796956ced71c6f80a2fbddfdea51566878b335097ae4156c295.

shalberd commented 2 years ago

@LaVLaS @romeokienzler @vpavlin

As suggested, I have added an overlay to odh-manifests in a fork. Please review the PR. I modeled it on the storage_class parameter and did something similar for trusted_ca_bundle_path. That makes an ENV available in spawned containers, pointing to a sample location specified in the KfDef overlay params.

https://github.com/opendatahub-io/odh-manifests/pull/669