shalberd opened 2 years ago
@LaVLaS I have been in contact with @romeokienzler regarding this. Is there any chance we can get this into v 1.3.0 of ODH?
Please see my comment in https://github.com/elyra-ai/elyra/issues/2797#issuecomment-1170824741 and the places where I think such a trusted-ca configmap could be mounted in the container filesystem. No need for a PVC, as a configmap holds the trusted-ca info.
Depending on your contribution guidelines, I might also be able to help by branching and making a PR myself. The maintainers of this project could then review.
This needs to be present regardless of the move, whenever that might be, to the KF Notebook Controller, as it is an enterprise-level readiness feature and is critical in disconnected / air-gapped environments such as biotech, banking, insurance, etc.
+1 ;)
The best option for right now is to extend the apply_pod_profile functionality by overriding c.OpenShiftSpawner.modify_pod_hook. See the odh-jupyterhub [documentation](https://github.com/opendatahub-io/jupyterhub-singleuser-profiles/blob/master/docs/howtouse.md).
The jupyterhub_config.py property in the jupyterhub-cfg ConfigMap will allow you to provide inline Python code that will be appended to the built-in .jupyter/jupyterhub_config.py.
Here is an example:
spawner = c.OpenShiftSpawner

def custom_apply_pod_profile(spawner, pod):
    """
    Example function for overriding JupyterHub server functionality to modify the user notebook Pod spec
    Should only be called via a function referenced by spawner.modify_pod_hook
    See https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html
    """
    # Apply profile from singleuser-profiles. REQUIRED since we want to extend the current pod spec
    # configs supported by the JH server
    apply_pod_profile(spawner, pod)

    # Pod container zero is the 'notebook' container with the env var we need to check
    nb_container_env = pod.spec.containers[0].env

    # Example EnvVar name that is included in the notebook spawn
    extra_mount_point_var_name = 'EXTRA_MOUNT_POINT_NAME'

    for item in nb_container_env:
        if item.name == extra_mount_point_var_name:
            # <CODE TO ADD ADDITIONAL MOUNT POINTS TO THE NOTEBOOK POD>
            # Modify the pod object according to the V1PodSpec
            # https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md
            pass

    return pod

spawner.modify_pod_hook = custom_apply_pod_profile
Since we do not have an environment to reproduce the issue and test functionality, I would recommend testing this in your environment to confirm it works. We can then work with you to provide an additional jupyterhub overlay in odh-manifests that loads this custom config.
Very nice, and thank you for the pointers. Conceptually sound. I will test this next week on our environment and keep you apprised here.
OK, I was able to add the jupyterhub_config.py part to the jupyterhub-cfg ConfigMap (temporarily setting immutable: true). It correctly got inserted at /opt/app-root/custom/jupyterhub-cfg.py.
Any print statements were just for my personal debugging.
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: jupyterhub
  name: jupyterhub-cfg
data:
  jupyterhub_config.py: |-
    spawner = c.OpenShiftSpawner

    def custom_apply_pod_profile(spawner, pod):
        """
        Example function for overriding JupyterHub server functionality to modify the user notebook Pod spec
        Should only be called via a function referenced by spawner.modify_pod_hook
        See https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html
        """
        # Apply profile from singleuser-profiles. REQUIRED since we want to extend the current pod spec
        # configs supported by the JH server
        apply_pod_profile(spawner, pod)
        print("custom apply pod profile ...")

        # make pod volume definition from optional CA configmap trusted-cabundle.
        trustedCAVolume = client.V1Volume(
            name="trusted-cas-volume",
            config_map=client.V1ConfigMapVolumeSource(
                name="trusted-cabundle",
                optional=True,
                items=[client.V1KeyToPath(key="ca-bundle.crt", path="trustedcas.pem")],
            )
        )

        print("existing container volume mounts ")
        print(str(pod.spec.containers[0].volume_mounts)[1:-1])

        newVolumesList = [trustedCAVolume]
        if pod.spec.volumes is None:
            print("pod def has no volumes yet")
            pod.spec.volumes = newVolumesList
        else:
            print("extending pod def volumes with configmap volume")
            pod.spec.volumes.extend(newVolumesList)

        print("extending container volume mounts for ca cert configmap")
        newVolumeMount = client.V1VolumeMount(mount_path="/opt/app-root/etc/jupyter/custom/cacerts", name="trusted-cas-volume", read_only=True)
        newVolumeMountList = [newVolumeMount]

        # Inject extraVolumeMount
        if pod.spec.containers[0].volume_mounts is None:
            print("notebook container def has no volumes mounted yet")
            pod.spec.containers[0].volume_mounts = newVolumeMountList
        else:
            print("extending existing container def volume mounts section with configmap volume mount reference")
            pod.spec.containers[0].volume_mounts.extend(newVolumeMountList)

        print("new container volume mounts ")
        print(str(pod.spec.containers[0].volume_mounts)[1:-1])

        return pod

    spawner.modify_pod_hook = custom_apply_pod_profile
  jupyterhub_admins: "admin"
  gpu_mode: ""
  singleuser_pvc_size: 2Gi
  notebook_destination: "$(notebook_destination)"
  culler_timeout: "31536000" # 1 year in seconds
@ptitzler
Result, as wanted: the CA certs from the OpenShift ConfigMap with label config.openshift.io/inject-trusted-cabundle: 'true' are mounted at /opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem and usable in the spawned container. As you can see above, I am using V1ConfigMapVolumeSource, which can only reference a ConfigMap by name, not by label selectors, but that is fine; the custom spawner code works as intended.
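For illustration, a minimal sketch of how code running inside the spawned notebook can use the mounted bundle (the URL is a placeholder; the path matches the volume mount above):

```python
import requests

# Path where the spawner hook above mounts the trusted-cabundle ConfigMap
CA_BUNDLE = "/opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem"

# Hypothetical internal endpoint whose certificate was issued by the company PKI
response = requests.get("https://registry.example.internal/v2/_catalog", verify=CA_BUNDLE)
print(response.status_code)
```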
The referenced configmap I created is defined like this.
kind: ConfigMap
apiVersion: v1
metadata:
  name: trusted-cabundle
  labels:
    config.openshift.io/inject-trusted-cabundle: 'true'
After creation, a ca-bundle.crt key is added under data, containing all the custom / additional trusted CAs from the central OpenShift configuration. OpenShift system admins need to make sure that root CAs precede intermediate CAs, but that should be standard practice; at least it was on our cluster.
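As a quick check (a sketch, assuming oc is logged in to the namespace holding the ConfigMap), the injection can be verified by counting the PEM blocks the network operator wrote into ca-bundle.crt:

```sh
# Print the ConfigMap and count the injected certificates in ca-bundle.crt
oc get configmap trusted-cabundle -o yaml | grep -c 'BEGIN CERTIFICATE'
```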
@LaVLaS Should the configmap not be present for one reason or another, the optional: true parameter makes sure the pod definition still works. Rendered in the spawned container's volumes section, it looks like this:
- name: trusted-cas-volume
  configMap:
    name: trusted-cabundle
    items:
      - key: ca-bundle.crt
        path: trustedcas.pem
    defaultMode: 420
    optional: true
Thus, I would suggest adding this configmap to overlays.
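A minimal sketch of what such an overlay could look like (the path and file names are assumptions for illustration, not the actual odh-manifests layout):

```yaml
# jupyterhub/overlays/trusted-ca/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - trusted-cabundle-configmap.yaml   # the labeled ConfigMap shown above
```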
I tested the case where the configmap is missing: the target directory in the spawned container simply contains no trustedcas.pem content file, and no errors show up in the main JupyterHub pod logs or in the spawned container logs.
(app-root) sh-4.4$ pwd
/opt/app-root/etc/jupyter/custom/cacerts
(app-root) sh-4.4$ ls -all
total 0
drwxrwsrwx. 3 root 1000640000 59 Aug 8 09:32 .
drwxrwxr-x. 1 builder root 21 Aug 8 09:33 ..
drwxr-sr-x. 2 root 1000640000 6 Aug 8 09:32 ..2022_08_08_09_32_53.148661971
lrwxrwxrwx. 1 root 1000640000 31 Aug 8 09:32 ..data -> ..2022_08_08_09_32_53.148661971
In the case the configmap trusted-cabundle is present, the spawned container filesystem looks like this (example: a spawned Elyra image container). I assume that in Jupyter images /opt/app-root/etc/jupyter/custom/ or, more generally, /etc/jupyter/custom is always present. The PEM file ends up in a new cacerts directory, so any existing content under /opt/app-root/etc/jupyter/custom/ is preserved, such as custom.css in the case of the Elyra Jupyter image.
(app-root) sh-4.4$ pwd
/opt/app-root/etc/jupyter/custom
(app-root) sh-4.4$ ls -all
total 4
drwxrwxr-x. 1 builder root 21 Aug 8 08:41 .
drwxrwxr-x. 1 builder root 20 Jun 23 17:15 ..
drwxrwsrwx. 3 root 1000640000 81 Aug 8 08:41 cacerts
-rw-rw-r--. 1 builder root 37 Dec 9 2021 custom.css
(app-root) sh-4.4$ cd cacerts/
(app-root) sh-4.4$ ls -all
total 0
drwxrwsrwx. 3 root 1000640000 81 Aug 8 08:41 .
drwxrwxr-x. 1 builder root 21 Aug 8 08:41 ..
drwxr-sr-x. 2 root 1000640000 28 Aug 8 08:41 ..2022_08_08_08_41_44.847279428
lrwxrwxrwx. 1 root 1000640000 31 Aug 8 08:41 ..data -> ..2022_08_08_08_41_44.847279428
lrwxrwxrwx. 1 root 1000640000 21 Aug 8 08:41 trustedcas.pem -> ..data/trustedcas.pem
@LaVLaS @ptitzler @vpavlin Any news on this? Basically, I need assistance with an optional CA bundle path env variable, a new trusted-cabundle configmap, and custom jupyterhub-cfg configmap code in an overlay of odh-manifests. Regarding a ca_cert_bundle_path entry / CA_CERT_BUNDLE_PATH env variable provided via configmap (https://github.com/opendatahub-io/odh-manifests/blob/master/jupyterhub/jupyterhub/base/jupyterhub-configmap.yaml#L8): what is your opinion on whether such an extra env variable is even needed, given that /opt/app-root/etc/jupyter/custom/cacerts seems to be universally applicable, at least for JupyterLab?
See also my comment at https://github.com/elyra-ai/elyra/issues/2797#issuecomment-1192290490. This worked beautifully in our environment, which is typical of custom enterprise private-cloud OpenShift / DevOps environments with their own PKI.
> Regarding a ca_cert_bundle_path entry / CA_CERT_BUNDLE_PATH env variable provided via configmap https://github.com/opendatahub-io/odh-manifests/blob/master/jupyterhub/jupyterhub/base/jupyterhub-configmap.yaml#L8, what is your opinion on whether such an extra ENV is even needed, given that /opt/app-root/etc/jupyter/custom/cacerts seems to be universally applicable, at least for Jupyterlab.
A hard-coded default won't work because Elyra (the code making the HTTP requests) needs some sort of indication whether to use it or not when requests.get(url, verify='path/to/somewhere') is invoked. If the value of /path/to/somewhere is not valid for any reason, the requests will fail. An environment variable would provide such a hint, whether explicitly (Elyra would pass its value as a parameter when calling the requests library methods, if it is a custom-named env variable) or implicitly (requests evaluates the env variable REQUESTS_CA_BUNDLE).
At first glance, REQUESTS_CA_BUNDLE seems like a good choice for the env variable. But it defines the trusted CAs for every request globally, which might be problematic when some code path that was not considered, or was forgotten about, calls a regular HTTPS site with publicly trusted certificates; validation then fails.
So I think explicitly having a custom env variable here, CA_CERT_BUNDLE_PATH, or maybe better TRUSTED_CA_BUNDLE_PATH, is best, as you suggest at https://github.com/elyra-ai/elyra/pull/2912/commits/1efda706336d03343c607b8de9de3b71dfab2a99#diff-3d0156c36fecf796956ced71c6f80a2fbddfdea51566878b335097ae4156c295.
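To illustrate the explicit variant, a minimal sketch of the client side (TRUSTED_CA_BUNDLE_PATH is the proposed name, not an existing setting; the URL is a placeholder):

```python
import os
import requests

# Proposed env variable; when unset, fall back to normal public-CA verification
ca_bundle = os.environ.get("TRUSTED_CA_BUNDLE_PATH")

response = requests.get(
    "https://airflow.example.internal/api/v1/dags",  # placeholder URL
    verify=ca_bundle if ca_bundle else True,
)
```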
@LaVLaS @romeokienzler @vpavlin
As suggested, I have added an overlay to odh-manifests in a fork. Please review the PR. I modeled it on the storage_class parameter and did something similar for trusted_ca_bundle_path. That makes an env variable available in spawned containers, pointing to a sample location specified in the KfDef overlay params.
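For reference, a hedged sketch of how the parameter could be wired up in a KfDef, following the storage_class pattern (the overlay name and default value are illustrative, not taken from the actual PR):

```yaml
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
spec:
  applications:
    - name: jupyterhub
      kustomizeConfig:
        overlays:
          - trusted-ca-bundle          # assumed overlay name
        parameters:
          - name: trusted_ca_bundle_path
            value: /opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem
        repoRef:
          name: manifests
          path: jupyterhub/jupyterhub
```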
Is your feature request related to a problem? Please describe. Currently, when companies use repositories, regardless of whether for Docker images, machine learning packages, or pipeline extensions (Python, R), connecting to such servers is not possible when they use SSL certificates that were issued by a custom, internal PKI, because those certificates are not publicly trusted. There are ways in OpenShift to define trusted CAs at the cluster level and to auto-inject them into a configmap at the namespace level.
https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki
Describe the solution you'd like An auto-injected configmap containing the trusted CA bundle should be added to the namespace. The bundle / PEM certs should then be mounted into a spawned JupyterLab container, be it Elyra or another image. Proposed location: /opt/app-root/etc/jupyter/custom/certs
configmap example from another project
https://github.com/trevorbox/reloader-operator/blob/f07d1858825cc8515f45c2cf03b84c23e994aa7e/helm/app/templates/configmap-trusted-cabundle.yaml
and mounting it into a location in the spawned container that e.g. Python then uses for requests.get trust later
https://github.com/trevorbox/reloader-operator/blob/f07d1858825cc8515f45c2cf03b84c23e994aa7e/helm/app/templates/app-nginx-echo-headers.yaml#L50
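For completeness, the generic pattern the linked example uses, reduced to the relevant parts (names and image are placeholders; the mount path follows the proposal above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-notebook        # placeholder
spec:
  containers:
    - name: notebook            # placeholder
      image: example/notebook:latest
      volumeMounts:
        - name: trusted-cabundle
          mountPath: /opt/app-root/etc/jupyter/custom/certs
          readOnly: true
  volumes:
    - name: trusted-cabundle
      configMap:
        name: trusted-cabundle
```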
Describe alternatives you've considered PKI-internal CA trust support is needed regardless of which clients (Python, curl, others) are accessing URLs of servers containing resources, as illustrated by the sketch below.
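A sketch of how other clients would consume such a mounted bundle (the mount path and file name follow the proposal above and are assumptions; server URLs are placeholders):

```sh
# curl: trust the internal PKI explicitly
curl --cacert /opt/app-root/etc/jupyter/custom/certs/trustedcas.pem https://registry.example.internal/v2/

# pip: trust the internal package index
pip install --cert /opt/app-root/etc/jupyter/custom/certs/trustedcas.pem \
    --index-url https://pypi.example.internal/simple example-package
```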
Additional context See e.g. the conceptual notes in the Elyra ticket
https://github.com/elyra-ai/elyra/issues/2797
there, the question came up in the context of Airflow pipeline components
and also air-gapped elyra https://github.com/elyra-ai/elyra/issues/2812
The non-publicly-trusted CA-bundle-issue is also touched on here https://github.com/opendatahub-io/odh-manifests/issues/575#issuecomment-1167486544