jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Error Starting IPython kernel for Spark in Kubernetes mode #722

Open vee2jp opened 5 years ago

vee2jp commented 5 years ago

Description

When launching a Spark - Python (Kubernetes Mode) notebook, enterprise-gateway fails with Exception in thread "main" java.lang.IllegalArgumentException: basedir must be absolute: ?/.ivy2/local. The full log is below.

Screenshots / Logs

[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Instantiating kernel 'Spark - Python (Kubernetes Mode)' with process proxy: enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy
[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Response socket launched on 'xx.yy.xx.yy:port' using 5.0s timeout
[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Starting kernel: ['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '956248df-391b-4bdd-89a6-ead1b0732661', '--RemoteProcessProxy.response-address', 'xx.yy.xx.yy:port', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[D 2019-08-26 18:12:41.857 EnterpriseGatewayApp] Launching kernel: Spark - Python (Kubernetes Mode) with command: ['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '956248df-391b-4bdd-89a6-ead1b0732661', '--RemoteProcessProxy.response-address', 'xx.yy.xx.yy:port', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']
[W 2019-08-26 18:12:41.858 EnterpriseGatewayApp] Shared namespace has been configured. All kernels will reside in EG namespace: enterprise-gateway
[D 2019-08-26 18:12:41.858 EnterpriseGatewayApp] BaseProcessProxy.launch_process() env: {'LC_ALL': 'en_US.UTF-8', 'KUBERNETES_PORT_53_UDP': 'udp://xx.yy.xx.yy:53', 'LANG': 'en_US.UTF-8', 'EG_SHARED_NAMESPACE': 'True', 'HOSTNAME': 'enterprise-gateway-64f9dc585d-c7xkm', 'EG_ENABLE_TUNNELING': 'False', 'KUBERNETES_PORT_53_UDP_PORT': '53', 'KG_PORT_RETRIES': '0', 'NB_UID': '1000', 'EG_LOG_LEVEL': 'DEBUG', 'KUBERNETES_PORT_53_TCP': 'tcp://xx.yy.xx.yy:53', 'KUBERNETES_PORT_53_TCP_PORT': '53', 'JAVA_HOME': '/usr/lib/jvm/java-8-openjdk-amd64', 'CONDA_DIR': '/opt/conda', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_PORT': '8888', 'CONDA_VERSION': '4.7.10', 'SPARK_VER': '2.4.1', 'KUBERNETES_SERVICE_PORT_DNS': '53', 'KUBERNETES_PORT_53_TCP_ADDR': 'xx.yy.xx.yy', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'KUBERNETES_PORT_443_TCP_ADDR': 'xx.yy.xx.yy', 'EG_CULL_IDLE_TIMEOUT': '36000', 'ENTERPRISE_GATEWAY_SERVICE_HOST': 'xx.yy.xx.yy', 'KUBERNETES_PORT': 'tcp://xx.yy.xx.yy:443', 'KUBERNETES_PORT_53_UDP_ADDR': 'xx.yy.xx.yy', 'PWD': '/usr/local/bin', 'HOME': '/home/jovyan', 'KUBERNETES_SERVICE_PORT_DNS_TCP': '53', 'KERNEL_UID': '1000350000', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_PROTO': 'tcp', 'EG_MIRROR_WORKING_DIRS': 'True', 'KUBERNETES_PORT_53_UDP_PROTO': 'udp', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'DEBIAN_FRONTEND': 'noninteractive', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'EG_KERNEL_LAUNCH_TIMEOUT': '60', 'EG_SSH_PORT': '2122', 'ENTERPRISE_GATEWAY_SERVICE_PORT_HTTP': '8888', 'EG_CULL_INTERVAL': '60', 'SPARK_HOME': '/opt/spark', 'NB_USER': 'jovyan', 'EG_KERNEL_WHITELIST': "['r_kubernetes','python_kubernetes','python_tf_kubernetes','python_tf_gpu_kubernetes','scala_kubernetes','spark_r_kubernetes','spark_python_kubernetes','spark_scala_kubernetes']", 'KUBERNETES_PORT_443_TCP': 'tcp://xx.yy.xx.yy:443', 'EG_CULL_CONNECTED': 'False', 'EG_PORT_RETRIES': '0', 'KERNEL_GID': '1000350000', 'KG_PORT': '8888', 'ENTERPRISE_GATEWAY_SERVICE_PORT': '8888', 'SHELL': '/bin/bash', 'ENTERPRISE_GATEWAY_PORT': 'tcp://xx.yy.xx.yy:8888', 'ENTERPRISE_GATEWAY_PORT_8888_TCP': 'tcp://xx.yy.xx.yy:8888', 'EG_PORT': '8888', 'ENTERPRISE_GATEWAY_PORT_8888_TCP_ADDR': 'xx.yy.xx.yy', 'SHLVL': '0', 'LANGUAGE': 'en_US.UTF-8', 'EG_KERNEL_CLUSTER_ROLE': 'kernel-controller', 'KUBERNETES_SERVICE_PORT': '443', 'EG_NAMESPACE': 'enterprise-gateway', 'NB_GID': '100', 'KG_IP': '0.0.0.0', 'PATH': '/opt/conda/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'EG_IP': '0.0.0.0', 'KUBERNETES_SERVICE_HOST': 'xx.yy.xx.yy', 'MINICONDA_VERSION': '4.6.14', 'KUBERNETES_PORT_53_TCP_PROTO': 'tcp', 'KERNEL_USERNAME': 'jovyan', 'KERNEL_LAUNCH_TIMEOUT': '40', 'KERNEL_WORKING_DIR': '/home/jovyan/work', 'SPARK_OPTS': '--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} --deploy-mode cluster --name ${KERNEL_USERNAME}-${KERNEL_ID} --conf spark.kubernetes.namespace=${KERNEL_NAMESPACE} --conf spark.kubernetes.driver.label.app=enterprise-gateway --conf spark.kubernetes.driver.label.kernel_id=${KERNEL_ID} --conf spark.kubernetes.driver.label.component=kernel --conf spark.kubernetes.executor.label.app=enterprise-gateway --conf spark.kubernetes.executor.label.kernel_id=${KERNEL_ID} --conf spark.kubernetes.executor.label.component=kernel --conf spark.kubernetes.driver.container.image=${KERNEL_IMAGE} --conf spark.kubernetes.executor.container.image=${KERNEL_EXECUTOR_IMAGE} --conf spark.kubernetes.authenticate.driver.serviceAccountName=${KERNEL_SERVICE_ACCOUNT_NAME} --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.pyspark.pythonVersion=3 ${KERNEL_EXTRA_SPARK_OPTS}', 'LAUNCH_OPTS': '', 'KERNEL_GATEWAY': '1', 'KERNEL_POD_NAME': 'jovyan-956248df-391b-4bdd-89a6-ead1b0732661', 'KERNEL_SERVICE_ACCOUNT_NAME': 'default', 'KERNEL_NAMESPACE': 'enterprise-gateway', 'KERNEL_IMAGE': 'docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev', 'KERNEL_EXECUTOR_IMAGE': 'docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev', 'EG_MIN_PORT_RANGE_SIZE': '1000', 'EG_MAX_PORT_RANGE_RETRIES': '5', 'KERNEL_ID': '956248df-391b-4bdd-89a6-ead1b0732661', 'KERNEL_LANGUAGE': 'python', 'EG_IMPERSONATION_ENABLED': 'False'}
[I 2019-08-26 18:12:41.864 EnterpriseGatewayApp] KubernetesProcessProxy: kernel launched. Kernel image: docker-registry.default.svc:5000/enterprise-gateway/kernel-spark-py:dev, KernelID: 956248df-391b-4bdd-89a6-ead1b0732661, cmd: '['/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.sh', '--RemoteProcessProxy.kernel-id', '956248df-391b-4bdd-89a6-ead1b0732661', '--RemoteProcessProxy.response-address', 'xx.yy.xx.yy:port', '--RemoteProcessProxy.spark-context-initialization-mode', 'lazy']'

Starting IPython kernel for Spark in Kubernetes mode on behalf of user jovyan

Environment

kevin-bates commented 5 years ago

Thanks for bringing this to our attention. Unfortunately, I'm unable to reproduce this issue right now, but I have a couple of questions/suggestions to try:

  1. Is this a "vanilla" Kubernetes environment, or something more like OpenShift or AWS, etc.?
  2. Is there a reason you want to use the shared namespace (so that all kernel pods reside in the enterprise gateway namespace)? The reason I ask is that I'm seeing some Spark-related issues when sharing the namespace, although those are related to accessing the context information, past the point where any kind of .ivy2 directory would be accessed.
    Can you try setting EG_SHARED_NAMESPACE=False (the default)? This is more of a data point, but also a possible solution, since we don't recommend sharing the EG namespace.
  3. You might try following the advice in this SO post and add something like --conf spark.jars.ivy=/tmp to the SPARK_OPTS in your kernel.json file (see the sketch after this list).
  4. Have you altered the kernel-spark-py image for your usage? If so, can you try with an elyra/kernel-spark-py image from Docker Hub?
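For item 3, here is a minimal sketch of where that flag would land in the kernelspec. This assumes the stock spark_python_kubernetes/kernel.json layout, with most of the existing SPARK_OPTS flags elided for brevity:

  {
    "display_name": "Spark - Python (Kubernetes Mode)",
    "language": "python",
    "env": {
      "SPARK_HOME": "/opt/spark",
      "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} --deploy-mode cluster --conf spark.jars.ivy=/tmp ${KERNEL_EXTRA_SPARK_OPTS}"
    }
  }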

I've attached to the kernel pod, and there is no ".ivy*" directory anywhere in the container. So, for whatever reason, my environment is not encountering this issue.

kevin-bates commented 4 years ago

Closing due to lack of response - please reopen with additional information if necessary.

lresende commented 1 year ago

So, an easy way to reproduce this is to set the user in the Docker image to 185 (anonymous). Then, my understanding is that there were two issues:

My thoughts on fixing this were to:

What really worked was to add the following config to the SPARK_OPTS:

--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp"

I will be working on a patch to configure that in the sample kernelspecs.
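For illustration, a sketch of how that setting could be folded into a sample kernelspec's env stanza. The placement is hypothetical pending the actual patch, and the other SPARK_OPTS flags are elided; note the inner quotes must be escaped inside the JSON string:

  {
    "env": {
      "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} --deploy-mode cluster --conf spark.driver.extraJavaOptions=\"-Divy.cache.dir=/tmp -Divy.home=/tmp\" ${KERNEL_EXTRA_SPARK_OPTS}"
    }
  }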

kevin-bates commented 1 year ago

Thanks @lresende!

set the user in the docker image as 185 (anonymous)

How exactly is this done? Is this something in the launch script or the image itself? I will need to reproduce the issue.

Does this apply to all spark-based kernels regardless of platform (k8s, Hadoop/YARN, ssh)?

kevin-bates commented 1 year ago

Thanks @lresende. I can reproduce this using the following Dockerfile:

FROM elyra/kernel-spark-py:3.1.0
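# Note: UID 185 is the default non-root user in the Apache Spark images; it likely
# has no passwd entry here, so Java's user.home resolves to "?", producing the
# "basedir must be absolute: ?/.ivy2/local" error.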
USER 185

However, I'm not sure whether HOME is what resolves to the ? in the ?/.ivy2/local reference, or whether it's WORKDIR, since that will be the current working directory. In the Spark images we have HOME=/home/jovyan but WORKDIR=$SPARK_HOME/work-dir, where SPARK_HOME is /opt/spark.

If I adjust the permissions in either of those locations, I still get the issue, so perhaps updating SPARK_OPTS is the way to go.

Thanks for looking into this!