jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

[JEG vanilla on K8S]: Service token file does not exists error in jupyterhub #785

Closed georghildebrand closed 4 years ago

georghildebrand commented 4 years ago


Thank you all for this amazing work. I took some time today to try it out on a test Kubernetes cluster (one that does not support Helm).

I basically went through the k8s docs and created a deployment (all fine so far). The Enterprise Gateway pod is now running. In my JupyterHub notebook server I created the kernel.json and the scripts folder. The new kernel shows up in my notebook server. When I start it, I get the following error:

[D 2020-02-28 17:11:08.206 SingleUserLabApp log:174] 304 GET /user/ghildebrand/nbextensions/jupyter_dashboards/notebook/dashboard-view/view-menu.html?v=20200228123557 (ghildebrand@::ffff: 2.37ms
Traceback (most recent call last):
  File "/opt/conda/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py", line 105, in <module>
    launch_kubernetes_kernel(kernel_id, response_addr, spark_context_init_mode)
  File "/opt/conda/share/jupyter/kernels/python_kubernetes/scripts/launch_kubernetes.py", line 32, in launch_kubernetes_kernel
  File "/opt/conda/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 96, in load_incluster_config
  File "/opt/conda/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 47, in load_and_set
  File "/opt/conda/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 64, in _load_config
    raise ConfigException("Service token file does not exists.")
kubernetes.config.config_exception.ConfigException: Service token file does not exists.

I thought the notebook server does not need operator permissions or anything like that? I am surely mixing something up. Any hint is welcome.


lresende commented 4 years ago

There have been some changes in this area of Notebook that might only now be propagating to Hub and causing this, but I will have to test further to see whether it really is a side effect of that. I will update here with our findings.

georghildebrand commented 4 years ago

@lresende thanks for having a look.

This is my JEG deployment.yaml and some notes:

# This file defines the Kubernetes objects necessary for Enterprise Gateway to run within Kubernetes.
apiVersion: v1
kind: Namespace
metadata:
  name: enterprise-gateway
  labels:
    app: enterprise-gateway
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: enterprise-gateway-sa
  namespace: enterprise-gateway
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: enterprise-gateway-controller
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "services", "configmaps", "secrets", "persistentvolumes", "persistentvolumeclaims"]
    verbs: ["get", "watch", "list", "create", "delete"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["rolebindings"]
    verbs: ["get", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  # Referenced by EG_KERNEL_CLUSTER_ROLE below
  name: kernel-controller
  labels:
    app: enterprise-gateway
    component: kernel
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: enterprise-gateway-controller
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
subjects:
  - kind: ServiceAccount
    name: enterprise-gateway-sa
    namespace: enterprise-gateway
roleRef:
  kind: ClusterRole
  name: enterprise-gateway-controller
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
  name: enterprise-gateway
  namespace: enterprise-gateway
spec:
  ports:
  - name: gateway-port
    port: 8888
    targetPort: 8888
  selector:
    gateway-selector: enterprise-gateway
  sessionAffinity: ClientIP
  type: NodePort
# Uncomment in order to use <k8s-master>:8888
#  externalIPs:
#  - k8s-master-public-ip
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: enterprise-gateway
  namespace: enterprise-gateway
  labels:
    gateway-selector: enterprise-gateway
    app: enterprise-gateway
    component: enterprise-gateway
spec:
# Uncomment/Update to deploy multiple replicas of EG
#  replicas: 1
  selector:
    matchLabels:
      gateway-selector: enterprise-gateway
  template:
    metadata:
      labels:
        gateway-selector: enterprise-gateway
        app: enterprise-gateway
        component: enterprise-gateway
    spec:
      # Created above.
      serviceAccountName: enterprise-gateway-sa
      containers:
      - env:
        - name: EG_PORT
          value: "8888"

          # Created above.
        - name: EG_NAMESPACE
          value: "enterprise-gateway"

          # Created above.  Used if no KERNEL_NAMESPACE is provided by client.
        - name: EG_KERNEL_CLUSTER_ROLE
          value: "kernel-controller"

          # All kernels reside in the EG namespace if True, otherwise KERNEL_NAMESPACE
          # must be provided or one will be created for each kernel.
        - name: EG_SHARED_NAMESPACE
          value: "False"

          # NOTE: This requires appropriate volume mounts to make notebook dir accessible
        - name: EG_MIRROR_WORKING_DIRS
          value: "False"

          # Current idle timeout is 1 hour.
        - name: EG_CULL_IDLE_TIMEOUT
          value: "3600"

        - name: EG_LOG_LEVEL
          value: "DEBUG"

          value: "60"

        - name: EG_KERNEL_WHITELIST
          value: "['r_kubernetes','python_kubernetes','python_tf_kubernetes','python_tf_gpu_kubernetes','scala_kubernetes','spark_r_kubernetes','spark_python_kubernetes','spark_scala_kubernetes']"

        # Ensure the following VERSION tag is updated to the version of Enterprise Gateway you wish to run
        image: elyra/enterprise-gateway:dev
        # Use IfNotPresent policy so that dev-based systems don't automatically
        # update. This provides more control.  Since formal tags will be release-specific
        # this policy should be sufficient for them as well.
        imagePullPolicy: IfNotPresent
        name: enterprise-gateway
        resources:
          limits:
            cpu: "2000m"
            memory: "4Gi"
          requests:
            cpu: "2000m"
            memory: "4Gi"
        ports:
        - containerPort: 8888
          name: gateway-port
          protocol: TCP
## Uncomment to enable NFS-mounted kernelspecs
#        volumeMounts:
#        - name: kernelspecs
#          mountPath: "/usr/local/share/jupyter/kernels"
#      volumes:
#      - name: kernelspecs
#        nfs:
#          server: <internal-ip-of-nfs-server>
#          path: "/usr/local/share/jupyter/kernels"
# apiVersion: apps/v1
# kind: DaemonSet
# metadata:
#   name: kernel-image-puller
#   namespace: enterprise-gateway
# spec:
#   selector:
#     matchLabels:
#       name: kernel-image-puller 
#   template:
#     metadata:
#       labels:
#         name: kernel-image-puller 
#         app: enterprise-gateway
#         component: kernel-image-puller
#     spec:
#       containers:
#       - name: kernel-image-puller 
#         image: elyra/kernel-image-puller:dev
#         env:
#           - name: KIP_GATEWAY_HOST
#             value: "http://enterprise-gateway.enterprise-gateway:8888"
#           - name: KIP_INTERVAL
#             value: "300"
#           - name: KIP_PULL_POLICY
#             value: "IfNotPresent"
#         volumeMounts:
#           - name: dockersock
#             mountPath: "/var/run/docker.sock"
#       volumes:
#       - name: dockersock
#         hostPath:
#           path: /var/run/docker.sock
georghildebrand commented 4 years ago

This issue can be closed. I realized that I had to use different env vars for connecting to the kernel. However, I don't know why it was trying to use tokens ...
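
The thread does not say which variables were changed, so the following is only a hedged diagnostic sketch, not the fix that was applied. It assumes the relevant setting is `JUPYTER_GATEWAY_URL`, the environment variable the Notebook server reads for its gateway URL; the helper names `gateway_url` and `describe_kernel_launch` are hypothetical. The traceback above shows `launch_kubernetes.py` running inside the single-user server, which is what happens when kernelspecs are launched locally instead of being forwarded to the gateway.

```python
import os

# Environment variable read by the Notebook server's gateway client.
# Assumption: this is the setting in question. The gateway URL can also be
# passed on the command line (--gateway-url), which this check does not see.
GATEWAY_URL_ENV = "JUPYTER_GATEWAY_URL"


def gateway_url(environ=None):
    """Return the gateway URL from the environment, or None if it is unset."""
    env = os.environ if environ is None else environ
    url = env.get(GATEWAY_URL_ENV, "").strip()
    return url or None


def describe_kernel_launch(environ=None):
    """Say where kernels would be started, judging only by the environment."""
    url = gateway_url(environ)
    if url is None:
        return "local: kernelspecs run inside the notebook server"
    return f"remote: kernel requests are forwarded to {url}"


if __name__ == "__main__":
    print(describe_kernel_launch())
```

Running this inside the single-user server gives a quick hint whether kernel requests are expected to leave the notebook pod at all.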

kevin-bates commented 4 years ago

Thanks for working through this @georghildebrand. I wanted to respond to the EG_RESPONSE_IP question.

EG_RESPONSE_IP only applies to the interactions between EG and the launched kernel pod. Notebook doesn't come into play here. This environment variable is set prior to starting EG in cases where there is some kind of firewall between EG and the cluster it's launching kernels against, or where EG's local IP is not appropriate when used from the cluster on which the kernel lands. It is rarely used.

This value is used when constructing the EG_RESPONSE_ADDRESS environment variable. The EG_RESPONSE_IP is prepended to a port that EG listens on immediately following the kernel's launch. If EG_RESPONSE_IP is None, the EG server's local IP is used. The EG_RESPONSE_ADDRESS is conveyed to the launched kernel via the environment for containerized kernel launches, or as an argument to the kernel launcher for non-containerized launches. It's this response address to which the launched kernel sends its ZMQ port information, etc. EG then "connects" its kernel manager to these returned ports and steps out of the way, letting EG serve as a proxy between Notebook and the remote kernel.
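
The way the response address is assembled can be sketched roughly as follows. This is an illustration of the description above, not Enterprise Gateway's actual code: the function names `local_ip` and `build_response_address` are hypothetical, and the local-IP lookup is a simple stand-in for whatever EG does internally.

```python
import socket


def local_ip():
    """Best-effort lookup of this host's IP; falls back to loopback."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        return "127.0.0.1"


def build_response_address(port, response_ip=None):
    """Prepend the response IP to the port the server listens on.

    If no response IP is configured, the server's local IP is used,
    mirroring the fallback described in the comment above.
    """
    ip = response_ip if response_ip else local_ip()
    return f"{ip}:{port}"


if __name__ == "__main__":
    # With an explicit response IP (for example, one reachable through a firewall).
    print(build_response_address(8877, "10.0.0.5"))
    # Without one, the local IP is used.
    print(build_response_address(8877))
```

The port `8877` and the IP `10.0.0.5` are placeholder values chosen for the example.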

I hope that helps.

georghildebrand commented 4 years ago

@kevin-bates thanks for the clarification! Much appreciated.

lucabem commented 4 years ago

Hi @georghildebrand! How did you solve it? I sometimes get the same issue.

georghildebrand commented 4 years ago

@lucabem I mainly used the env vars mentioned above. I think that, since the lib uses the k8s client, it tries token-based auth or similar if these are not set correctly. Sadly I'm on mobile only; otherwise I would post the manifest that worked.
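
For anyone hitting the same traceback, a small check like the one below can show what the Kubernetes Python client's in-cluster loader would be missing in the pod where the launch script runs. It is a sketch under assumptions: the paths and variable names are the ones the client's `incluster_config` module looks for, while the function name `incluster_problems` is hypothetical and the check only tests for the files' existence.

```python
import os

# Locations and variables the Kubernetes Python client's in-cluster loader uses.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"
CA_CERT_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
HOST_ENV = "KUBERNETES_SERVICE_HOST"
PORT_ENV = "KUBERNETES_SERVICE_PORT"


def incluster_problems(environ=None, token_path=TOKEN_PATH, ca_cert_path=CA_CERT_PATH):
    """Return a list of reasons why in-cluster configuration would fail.

    An empty list means the basic prerequisites are present; it does not
    prove that the service account has the permissions it needs.
    """
    env = os.environ if environ is None else environ
    problems = []
    for name in (HOST_ENV, PORT_ENV):
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    if not os.path.isfile(token_path):
        problems.append(f"service account token file missing: {token_path}")
    if not os.path.isfile(ca_cert_path):
        problems.append(f"service account CA certificate missing: {ca_cert_path}")
    return problems


if __name__ == "__main__":
    for problem in incluster_problems() or ["no obvious problems found"]:
        print(problem)
```

If the token file is reported missing inside the notebook pod, the launch script is most likely running in a pod that has no service account token mounted, which matches the "Service token file does not exists" error above.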