jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

JEG 3.2.3 on K3s v1.28.5 unable to start kernels. Same actions with helm chart 3.2.2 pyspark kubernetes work. #1379

Open paf91 opened 2 months ago

paf91 commented 2 months ago

Description

Whenever I start a Kubernetes-based kernel on version 3.2.3, I get this error:

Error Starting Kernel HTTP 500: Internal Server Error (Error from Gateway: [Error occurred creating role binding for namespace 'guest-fab3e59b-edbb-4e1d-912e-087b1798425b': module 'kubernetes.client' has no attribute 'V1Subject'] Error occurred creating role binding for namespace 'guest-fab3e59b-edbb-4e1d-912e-087b1798425b': module 'kubernetes.client' has no attribute 'V1Subject'. Ensure gateway url is valid and the Gateway instance is running.)

Reproduce

values.yaml:

service:
  type: "LoadBalancer"
  # Master public IP on which to expose EG.
  k8sMasterPublicIP: '<redacted, private ip like 10.x.x.x>'
  ports:
    - name: "http"
      port: 8888
      targetPort: 8888
    - name: "http-response"
      port: 8877
      targetPort: 8877
ingress:
  enabled: false
kernel:
  shareGatewayNamespace: false
  allowedKernels:
    - r_kubernetes
    - python_kubernetes
    - python_tf_kubernetes
    - python_tf_gpu_kubernetes
    - scala_kubernetes
    - spark_r_kubernetes
    - spark_python_kubernetes
    - spark_scala_kubernetes
    - spark_python_operator
    - python3
  defaultKernelName: python_kubernetes
kip:
  enabled: true
  serviceAccountName: 'kernel-image-puller-sa'
  criSocket: /run/containerd/containerd.sock

helm upgrade --install enterprise-gateway https://github.com/jupyter-server/enterprise_gateway/releases/download/v3.2.3/jupyter_enterprise_gateway_helm-3.2.3.tar.gz --namespace enterprise-gateway -f ~/jupyter/gateway/values-balancer.yaml

kubectl get pods -n enterprise-gateway: [screenshot]

Try to run:

`curl -X POST -i 'http://<redacted_private_ip>:8888/api/kernels' --data '{ "name": "spark_python_kubernetes", "env": { "KERNEL_USERNAME": "jovyan" }}'`

Response:

{"reason": "Error occurred creating role binding for namespace 'jovyan-99878812-28c5-49ad-8cbc-cb81713e7ba3': module 'kubernetes.client' has no attribute 'V1Subject'", "message": ""}
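For anyone reproducing this without curl, the request above can be sketched with the Python standard library. This is only an illustration: `build_start_kernel_request` is a hypothetical helper, `gateway.example` is a placeholder host, and the `Content-Type` header is an assumption (the curl command above does not set it).

```python
import json
from urllib.request import Request


def build_start_kernel_request(gateway_url, kernel_name, username):
    """Build (but do not send) the POST /api/kernels request from the curl example."""
    body = {"name": kernel_name, "env": {"KERNEL_USERNAME": username}}
    return Request(
        url=f"{gateway_url.rstrip('/')}/api/kernels",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the curl command above does not send this header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host; substitute the gateway's real address.
req = build_start_kernel_request(
    "http://gateway.example:8888", "spark_python_kubernetes", "jovyan"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; on an HTTP 500 like the one above, `urlopen` raises `urllib.error.HTTPError`, and the JSON body with the `reason` field can be read from that exception.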

Enterprise gateway logs:

kubectl logs -n enterprise-gateway enterprise-gateway-cfbb54797-7dph8

[D 2024-04-02 23:17:01.296 EnterpriseGatewayApp] RemoteMappingKernelManager.start_kernel: spark_python_kubernetes, kernel_username: jovyan
[D 2024-04-02 23:17:01.298 EnterpriseGatewayApp] Instantiating kernel 'Spark - Python (Kubernetes Mode)' with process proxy: enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy
[D 2024-04-02 23:17:01.299 EnterpriseGatewayApp] Starting kernel (async): ['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '<redacted>', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.response-address', '<redacted>:8877', '--RemoteProcessProxy.public-key', '<redacted>', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2024-04-02 23:17:01.299 EnterpriseGatewayApp] Launching kernel: 'Spark - Python (Kubernetes Mode)' with command: ['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '<redacted>', '--RemoteProcessProxy.port-range', '0..0', '--RemoteProcessProxy.response-address', '<redacted>:8877', '--RemoteProcessProxy.public-key', '<redacted>', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[I 2024-04-02 23:17:01.336 EnterpriseGatewayApp] Created kernel namespace: jovyan-99878812-28c5-49ad-8cbc-cb81713e7ba3
[W 2024-04-02 23:17:01.367 EnterpriseGatewayApp] Deleted kernel namespace: jovyan-99878812-28c5-49ad-8cbc-cb81713e7ba3
[E 2024-04-02 23:17:01.367 EnterpriseGatewayApp] Error occurred creating role binding for namespace 'jovyan-99878812-28c5-49ad-8cbc-cb81713e7ba3': module 'kubernetes.client' has no attribute 'V1Subject'
[E 240402 23:17:01 web:2271] 500 POST /api/kernels (<redacted>) 83.95ms

Expected behavior

Kernel starts

Context

JEG 3.2.3 on K3s v1.28.5

welcome[bot] commented 2 months ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

paf91 commented 2 months ago

Okay, so the culprit is the updated kubernetes pip package: EG 3.2.2 ships version 26.1.0 of the kubernetes Python client, while EG 3.2.3 ships 29.0.0.

>>> import kubernetes; print(kubernetes.__version__)
29.0.0

I hope I'll find time to try to fix this.

>>> name='spark'
>>> namespace='default'
>>> from kubernetes import client
>>> client.V1Subject(
...             api_group="", kind="ServiceAccount", name=service_account_name, namespace=namespace
...         )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'kubernetes.client' has no attribute 'V1Subject'. Did you mean: 'RbacV1Subject'?
>>>

So this code can't execute:

https://github.com/jupyter-server/enterprise_gateway/blame/d01e84a2457d44d14bd6bd3335307b9d0e3b483d/enterprise_gateway/services/processproxies/k8s.py#L352
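One possible workaround, sketched under the assumption that only the class name changed: the traceback suggests `RbacV1Subject` is the 29.0.0 name for what used to be `V1Subject`. The helper `resolve_subject_class` below is hypothetical (it is not part of Enterprise Gateway or the kubernetes package) and simply picks whichever name the installed client exposes. Stand-in objects are used so the sketch runs without the kubernetes package.

```python
from types import SimpleNamespace


def resolve_subject_class(client_module):
    """Return the RBAC subject class exposed by the given client module.

    Hypothetical helper: prefers the newer name (RbacV1Subject, as suggested
    by the traceback above) and falls back to the older V1Subject.
    """
    for name in ("RbacV1Subject", "V1Subject"):
        cls = getattr(client_module, name, None)
        if cls is not None:
            return cls
    raise AttributeError("client module exposes neither RbacV1Subject nor V1Subject")


# Stand-ins for the two client generations (dict plays the role of the class).
old_client = SimpleNamespace(V1Subject=dict)
new_client = SimpleNamespace(RbacV1Subject=dict)

for fake_client in (old_client, new_client):
    subject_cls = resolve_subject_class(fake_client)
    subject = subject_cls(kind="ServiceAccount", name="spark", namespace="default")
    print(subject["kind"], subject["name"], subject["namespace"])
```

With the real package this would be `resolve_subject_class(kubernetes.client)`, assuming the renamed class accepts the same keyword arguments as before; that has not been verified in this thread.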

lresende commented 2 months ago

Could this be related to the new Kubernetes client version, where these classes have changed?

paf91 commented 2 months ago

@lresende see my reply above

lresende commented 2 months ago

So I would say we should cap the kubernetes client version for now.
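As a rough illustration of such a cap: in this thread 26.1.0 is reported working and 29.0.0 failing, so a requirement like `kubernetes<29` is one plausible bound. That bound is an assumption, since versions between 26.1.0 and 29.0.0 were not tested here. The check below is a hypothetical sketch, not Enterprise Gateway code.

```python
# Assumption: 29 is the first major version known (from this thread) to lack V1Subject.
FIRST_KNOWN_BAD_MAJOR = 29


def parse_major(version: str) -> int:
    """Extract the major component from a version string such as '26.1.0'."""
    return int(version.split(".")[0])


def within_cap(version: str) -> bool:
    """True if the version falls below the first known-bad major release."""
    return parse_major(version) < FIRST_KNOWN_BAD_MAJOR


# The two versions reported in this thread.
for reported in ("26.1.0", "29.0.0"):
    status = "within cap" if within_cap(reported) else "above cap"
    print(reported, status)
```

In a running image, the string to check would come from `kubernetes.__version__`, as in the interpreter session earlier in the thread.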

merqri commented 4 weeks ago

@paf91 Greetings, I think the easiest and fastest way is to build a new image, as I mentioned in https://github.com/jupyter-server/enterprise_gateway/issues/1382#issuecomment-2144831486.