Closed GeoffDuniam closed 7 years ago
You can extend the provided image like so:
FROM jupyter/all-spark-notebook
# Set env vars for pydoop
ENV HADOOP_HOME /usr/local/hadoop-2.7.3
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_CONF_HOME /usr/local/hadoop-2.7.3/etc/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop-2.7.3/etc/hadoop
USER root
# Add proper open-jdk-8 not just the jre, needed for pydoop
RUN echo 'deb http://cdn-fastly.deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list && \
apt-get -y update && \
apt-get install --no-install-recommends -t jessie-backports -y openjdk-8-jdk && \
rm /etc/apt/sources.list.d/jessie-backports.list && \
apt-get clean && \
rm -rf /var/lib/apt/lists/ && \
# Add hadoop binaries
wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz && \
tar -xvf hadoop-2.7.3.tar.gz -C /usr/local && \
chown -R $NB_USER:users /usr/local/hadoop-2.7.3 && \
rm -f hadoop-2.7.3.tar.gz && \
# Install os dependencies required for pydoop, pyhive
apt-get update && \
apt-get install --no-install-recommends -y build-essential python-dev libsasl2-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# Remove the example hadoop configs and replace
# with those for our cluster.
# Alternatively this could be mounted as a volume
rm -f /usr/local/hadoop-2.7.3/etc/hadoop/*
# Download this from ambari / cloudera manager and copy here
COPY example-hadoop-conf/ /usr/local/hadoop-2.7.3/etc/hadoop/
# Spark-Submit doesn't work unless I set the following
RUN echo "spark.driver.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.master=yarn" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.hadoop.yarn.timeline-service.enabled=false" >> /usr/local/spark/conf/spark-defaults.conf && \
chown -R $NB_USER:users /usr/local/spark/conf/spark-defaults.conf && \
# Create an alternative HADOOP_CONF_HOME so we can mount as a volume and repoint
# using ENV var if needed
mkdir -p /etc/hadoop/conf/ && \
chown $NB_USER:users /etc/hadoop/conf/
USER $NB_USER
# Install useful jupyter extensions and python libraries like :
# - Dashboards
# - PyDoop
# - PyHive
RUN pip install jupyter_dashboards faker && \
jupyter dashboards quick-setup --sys-prefix && \
pip2 install pyhive pydoop thrift sasl thrift_sasl faker
USER root
# Ensure we overwrite the kernel config so that toree connects to cluster
RUN jupyter toree install --sys-prefix --spark_opts="--master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1 --driver-java-options -Dhdp.version=2.5.3.0-37 --conf spark.hadoop.yarn.timeline-service.enabled=false"
USER $NB_USER
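As the comments in the Dockerfile hint, the cluster's Hadoop configuration can also be mounted at runtime instead of being baked into the image, repointing HADOOP_CONF_DIR at the /etc/hadoop/conf directory created above. A minimal sketch of such a run, where the image name and the host path are placeholders:

docker run --rm -it --net=host \
  -v /path/to/cluster-hadoop-conf:/etc/hadoop/conf:ro \
  -e HADOOP_CONF_DIR=/etc/hadoop/conf \
  -e HADOOP_CONF_HOME=/etc/hadoop/conf \
  my-all-spark-yarn-notebook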
Thanks @britishbadger. I've added your recipe to the https://github.com/jupyter/docker-stacks/wiki/Docker-recipes page and attributed it to you.
Hi all,
Sorry about the delay getting back. First, thanks to @britishbadger for the mods to the image; we have it running fine as a Docker image. As per the all-spark-notebook page (https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook) we need to run the container with the --net=host --pid=host -e TINI_SUBREAPER options set. Running the container
docker run --rm -it --net=host -p 8888:8888 jupyterhub/all-sparkyarn-notebook
works fine: we can connect Python to the YARN cluster and create the Spark and Hive contexts without a problem, and pyspark also connects to the cluster.
Running this image through JupyterHub, a quick question: how do we pass the --net=host --pid=host -e TINI_SUBREAPER options to DockerSpawner? I've attached the jupyterhub_config.py file we're using.
Thanks for any insights
Cheers
Geoff
@GeoffDuniam To enable host-mode networking, you'll need to do something like this in your jupyterhub_config:
c.DockerSpawner.extra_host_config = {"network_mode": "host"}
c.DockerSpawner.use_internal_ip = True
c.DockerSpawner.network_name = "host"
In addition, you will need to create a custom spawner that inherits from DockerSpawner but does its own port assignment. Otherwise, with host-mode networking, every server will try to use the same port and only a single Docker container will be able to connect to the hub (see the sketch below).
See my comments in #187 for more
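A rough sketch of what such a spawner might look like in jupyterhub_config.py. This is illustrative only: it assumes a DockerSpawner version where start() can be awaited, picks ports from an arbitrary range, and does not check that the chosen port is actually free.

# jupyterhub_config.py -- illustrative sketch, combined with the network
# settings shown above
from random import randint

from dockerspawner import DockerSpawner

class HostNetworkSpawner(DockerSpawner):
    """Give each single-user server its own port so that containers using
    host-mode networking do not all collide on the same port."""

    async def start(self, *args, **kwargs):
        # Naive port assignment; a real implementation should verify the
        # port is free and avoid clashes between concurrently spawned users.
        self.port = randint(9000, 9999)
        return await super().start(*args, **kwargs)

c.JupyterHub.spawner_class = HostNetworkSpawner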
I built the image provided by @britishbadger and encountered the following error:
Executing the command: jupyter notebook
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/traitlets/traitlets.py", line 528, in get
    value = obj._trait_values[self.name]
KeyError: 'runtime_dir'
.....
PermissionError: [Errno 13] Permission denied: '/home/jovyan/.local/share'
Somehow the directory causing the error has the ownership and permissions shown below:
drwx--S--- 3 root users 4096 May 29 13:09 .local
I had to add two more lines (a chown and a chmod) at the end of the original Dockerfile, after the jupyter toree install command, to be able to run the image.
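Something along these lines, inserted after the jupyter toree install step while still running as root (the exact path and mode are a guess based on the error above):

# hypothetical fix for the /home/jovyan/.local permissions error above
RUN chown -R $NB_USER:users /home/jovyan/.local
RUN chmod -R u+rwX /home/jovyan/.local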
I tried to use the above Dockerfile to run Jupyter against a Cloudera Hadoop cluster (YARN client mode). I am able to start the Jupyter notebook in the Docker container and it is listening on port 8888. However, when I create a new notebook with the Apache Toree kernel and try running Scala code using sc, it fails to connect to the cluster. The error I see is:
Waiting for a Spark session to start...
Name: org.apache.spark.SparkException
Message: Yarn application has already ended! It might have been killed or unable to launch application master.
StackTrace: at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at org.apache.toree.kernel.api.Kernel$$anonfun$1.apply(Kernel.scala:428)
at org.apache.toree.kernel.api.Kernel$$anonfun$1.apply(Kernel.scala:428)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
Following is the command I use to run the Docker container:
docker run -p 8888:8888 -dt <image_name>
I tried to execute the following Scala code:
sc.parallelize( 1 to 100)
Cluster configuration: it's a three-node Cloudera Hadoop cluster. The Docker container is running on another node outside the cluster, and the cluster nodes are reachable from the container's host.
Docker version 18.03.1-ce, build 9ee9f40
Any help will be highly appreciated. Thanks.
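For comparison, the run command suggested earlier in this thread uses host networking so that the YARN application master can reach the Spark driver inside the container. A sketch of that, with <image_name> as above (whether it resolves this particular failure is untested):

docker run --rm -it --net=host --pid=host -e TINI_SUBREAPER=true <image_name>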
Building the Dockerfile above gives me this error.
W: GPG error: http://cdn-fastly.deb.debian.org/debian jessie-backports InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 8B48AD6246925553 NO_PUBKEY 7638D0442B90D010
E: The repository 'http://cdn-fastly.deb.debian.org/debian jessie-backports InRelease' is not signed.
The command '/bin/sh -c echo 'deb http://cdn-fastly.deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list && apt-get -y update && apt-get install --no-install-recommends -t jessie-backports -y openjdk-8-jdk && rm /etc/apt/sources.list.d/jessie-backports.list && apt-get clean && rm -rf /var/lib/apt/lists/ && wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz && tar -xvf hadoop-2.7.3.tar.gz -C /usr/local && chown -R $NB_USER:users /usr/local/hadoop-2.7.3 && rm -f hadoop-2.7.3.tar.gz && apt-get update && apt-get install --no-install-recommends -y build-essential python-dev libsasl2-dev && apt-get clean && rm -rf /var/lib/apt/lists/* && rm -f /usr/local/hadoop-2.7.3/etc/hadoop/*' returned a non-zero code: 100
I fixed this by adding the following right below the USER root line:
# Fix GPG key issue
RUN apt-get update
RUN apt-get install -y gnupg
RUN gpg --keyserver pgp.mit.edu --recv-keys \
7638D0442B90D010 8B48AD6246925553
RUN gpg --armor --export 7638D0442B90D010 | apt-key add -
RUN gpg --armor --export 8B48AD6246925553 | apt-key add -
However, I'm now having a second issue with unmet dependencies for openjdk-8-jdk:
[...]
Step 12/19 : RUN echo 'deb http://cdn-fastly.deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list && apt-get -y update && apt-get install --no-install-recommends -t jessie-backports -y openjdk-8-jdk && rm /etc/apt/sources.list.d/jessie-backports.list && apt-get clean && rm -rf /var/lib/apt/lists/ && wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz && tar -xvf hadoop-2.7.3.tar.gz -C /usr/local && chown -R $NB_USER:users /usr/local/hadoop-2.7.3 && rm -f hadoop-2.7.3.tar.gz && apt-get update && apt-get install --no-install-recommends -y build-essential python-dev libsasl2-dev && apt-get clean && rm -rf /var/lib/apt/lists/* && rm -f /usr/local/hadoop-2.7.3/etc/hadoop/*
---> Running in c097ebca4d43
Get:1 http://cdn-fastly.deb.debian.org/debian jessie-backports InRelease [166 kB]
Hit:2 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:3 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
Hit:6 http://repos.mesosphere.com/ubuntu xenial InRelease
Get:7 http://cdn-fastly.deb.debian.org/debian jessie-backports/main amd64 Packages [1,172 kB]
Fetched 1,338 kB in 1s (2,671 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
openjdk-8-jdk : Depends: openjdk-8-jre (= 8u171-b11-1~bpo8+1) but it is not going to be installed
Depends: openjdk-8-jdk-headless (= 8u171-b11-1~bpo8+1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
The command '/bin/sh -c echo 'deb http://cdn-fastly.deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list && apt-get -y update && apt-get install --no-install-recommends -t jessie-backports -y openjdk-8-jdk && rm /etc/apt/sources.list.d/jessie-backports.list && apt-get clean && rm -rf /var/lib/apt/lists/ && wget http://mirrors.ukfast.co.uk/sites/ftp.apache.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz && tar -xvf hadoop-2.7.3.tar.gz -C /usr/local && chown -R $NB_USER:users /usr/local/hadoop-2.7.3 && rm -f hadoop-2.7.3.tar.gz && apt-get update && apt-get install --no-install-recommends -y build-essential python-dev libsasl2-dev && apt-get clean && rm -rf /var/lib/apt/lists/* && rm -f /usr/local/hadoop-2.7.3/etc/hadoop/*' returned a non-zero code: 100
UPDATE: I managed to build and run the following Dockerfile but couldn't connect to my YARN cluster.
FROM jupyter/all-spark-notebook:ef9ef707038d
# Set env vars for pydoop
ENV HADOOP_HOME /usr/local/hadoop-2.7.3
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_CONF_HOME /usr/local/hadoop-2.7.3/etc/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop-2.7.3/etc/hadoop
USER root
# Add proper open-jdk-8 not just the jre, needed for pydoop
RUN echo 'deb http://cdn-fastly.deb.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/jessie-backports.list && \
apt-get -y update && \
apt-get install --no-install-recommends -t jessie-backports -y openjdk-8-jdk && \
rm /etc/apt/sources.list.d/jessie-backports.list && \
apt-get clean && \
rm -rf /var/lib/apt/lists/ && \
# Add hadoop binaries
wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz && \
tar -xvf hadoop-2.7.3.tar.gz -C /usr/local && \
chown -R $NB_USER:users /usr/local/hadoop-2.7.3 && \
rm -f hadoop-2.7.3.tar.gz && \
# Install os dependencies required for pydoop, pyhive
apt-get update && \
apt-get install --no-install-recommends -y build-essential python-dev libsasl2-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# Remove the example hadoop configs and replace
# with those for our cluster.
# Alternatively this could be mounted as a volume
rm -f /usr/local/hadoop-2.7.3/etc/hadoop/*
#NOTE RUN mkdir example-hadoop-conf
# Download this from ambari / cloudera manager and copy here
COPY example-hadoop-conf/ /usr/local/hadoop-2.7.3/etc/hadoop/
# Spark-Submit doesn't work unless I set the following
RUN echo "spark.driver.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.master=yarn" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.hadoop.yarn.timeline-service.enabled=false" >> /usr/local/spark/conf/spark-defaults.conf && \
chown -R $NB_USER:users /usr/local/spark/conf/spark-defaults.conf && \
# Create an alternative HADOOP_CONF_HOME so we can mount as a volume and repoint
# using ENV var if needed
mkdir -p /etc/hadoop/conf/ && \
chown $NB_USER:users /etc/hadoop/conf/
USER $NB_USER
RUN pip install --upgrade pip
RUN pip2 install --upgrade pip
# Install useful jupyter extensions and python libraries like :
# - Dashboards
# - PyDoop
# - PyHive
RUN pip install jupyter_dashboards faker
RUN jupyter dashboards quick-setup --sys-prefix
RUN pip2 install pyhive
#RUN pip2 install pydoop
RUN pip2 install thrift
RUN pip2 install sasl
RUN pip2 install thrift_sasl
RUN pip2 install faker
USER root
# Ensure we overwrite the kernel config so that toree connects to cluster
RUN jupyter toree install --sys-prefix --spark_opts="--master yarn --deploy-mode client --driver-memory 512m --executor-memory 512m --executor-cores 1 --driver-java-options -Dhdp.version=2.5.3.0-37 --conf spark.hadoop.yarn.timeline-service.enabled=false"
USER $NB_USER
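A quick way to sanity-check whether the container built from the Dockerfile above can reach the cluster at all with the copied configuration (illustrative only; <container> is the running container's name or id from docker ps):

docker exec -it <container> bash -c '$HADOOP_HOME/bin/hdfs dfs -ls / && $HADOOP_HOME/bin/yarn node -list'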
Having the same error:
Name: org.apache.spark.SparkException
Message: Yarn application has already ended! It might have been killed or unable to launch application master.
StackTrace: at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:89)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:63)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:933)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:924)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:924)
at org.apache.toree.kernel.api.Kernel$$anonfun$1.apply(Kernel.scala:428)
at org.apache.toree.kernel.api.Kernel$$anonfun$1.apply(Kernel.scala:428)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
With the following Dockerfile:
FROM jupyter/all-spark-notebook
# Set env vars for pydoop
ENV HADOOP_HOME /usr/local/hadoop-2.6.0
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_CONF_HOME /usr/local/hadoop-2.6.0/etc/hadoop
ENV HADOOP_CONF_DIR /usr/local/hadoop-2.6.0/etc/hadoop
# Create a Python 2.x environment using conda including at least the ipython kernel
# and the kernda utility. Add any additional packages you want available for use
# in a Python 2 notebook to the first line here (e.g., pandas, matplotlib, etc.)
RUN conda create --quiet --yes -p $CONDA_DIR/envs/python2 python=2.7 ipython ipykernel kernda && \
conda clean -tipsy
USER root
# Create a global kernelspec in the image and modify it so that it properly activates
# the python2 conda environment.
RUN $CONDA_DIR/envs/python2/bin/python -m ipykernel install && \
$CONDA_DIR/envs/python2/bin/kernda -o -y /usr/local/share/jupyter/kernels/python2/kernel.json
USER $NB_USER
RUN unset http_proxy
RUN unset https_proxy
RUN unset HTTP_PROXY
RUN unset HTTPS_PROXY
RUN export http_proxy=''
RUN export https_proxy=''
RUN export HTTP_PROXY=''
RUN export HTTPS_PROXY=''
USER root
RUN apt-get -y update --allow-unauthenticated -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true && \
apt-get install --no-install-recommends -y openjdk-8-jdk && \
apt-get clean && \
rm -rf /var/lib/apt/lists/ && \
# Add hadoop binaries
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz && \
tar -xvf hadoop-2.6.0.tar.gz -C /usr/local && \
chown -R $NB_USER:users /usr/local/hadoop-2.6.0 && \
rm -f hadoop-2.6.0.tar.gz && \
# Install os dependencies required for pydoop, pyhive
apt-get update --allow-unauthenticated -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true && \
apt-get install --no-install-recommends -y build-essential python-dev libsasl2-dev &&\
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# Remove the example hadoop configs and replace
# with those for our cluster.
# Alternatively this could be mounted as a volume
rm -f /usr/local/hadoop-2.6.0/etc/hadoop/*
# Download this from ambari / cloudera manager and copy here
COPY CONFS/conf/* /usr/local/hadoop-2.6.0/etc/hadoop/
# Spark-Submit doesn't work unless I set the following
RUN echo "spark.driver.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.3.0-37" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.master=yarn" >> /usr/local/spark/conf/spark-defaults.conf && \
echo "spark.hadoop.yarn.timeline-service.enabled=false" >> /usr/local/spark/conf/spark-defaults.conf && \
chown -R $NB_USER:users /usr/local/spark/conf/spark-defaults.conf && \
# Create an alternative HADOOP_CONF_HOME so we can mount as a volume and repoint
# using ENV var if needed
mkdir -p /etc/hadoop/conf/ && \
chown $NB_USER:users /etc/hadoop/conf/
USER $NB_USER
# Install useful jupyter extensions and python libraries like :
# - Dashboards
# - PyDoop
# - PyHive
RUN pip install jupyter_dashboards faker && \
jupyter dashboards quick-setup --sys-prefix
#pip2.7 install pyhive pydoop thrift sasl thrift_sasl faker
USER root
# Ensure we overwrite the kernel config so that toree connects to cluster
RUN jupyter toree install --sys-prefix --spark_opts="--master yarn --deploy-mode cluster --conf spark.hadoop.yarn.timeline-service.enabled=false"
RUN chown jovyan -R /home/jovyan/.local
USER $NB_USER
The YARN error log is:
Log Type: stdout
Log Upload Time: Mon Nov 05 17:54:44 -0200 2018
Log Length: 115456
Showing 4096 bytes of 115456 total. Click here for the full log.
driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:41 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:42 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:43 ERROR ApplicationMaster:70 - Failed to connect to driver at 250808ee7dfa:46109, retrying ...
2018-11-05 17:54:43 ERROR ApplicationMaster:91 - Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:672)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:532)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:347)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:869)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
2018-11-05 17:54:43 INFO ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
2018-11-05 17:54:43 INFO ApplicationMaster:54 - Deleting staging directory hdfs://mpmapas-ns/user/mpmapas/.sparkStaging/application_1539869144089_2045
2018-11-05 17:54:43 INFO ShutdownHookManager:54 - Shutdown hook called
The docker run command is:
docker run -e GRANT_SUDO=yes --user root -p 8888:8888 --add-host=### --add-host=### --add-host=### -e NB_USER=mpmapas -e NB_UID=1005 -e NB_GID=1007 mprj/jupyter-allspark start.sh jupyter notebook --NotebookApp.token=''
Does anyone have an approach to fix this? I am not able to connect to my existing Spark cluster. Is this an issue with ingress to my Kubernetes cluster?
Any help is appreciated.
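For what it's worth, the log above shows the application master trying to reach the driver at the container id 250808ee7dfa, which the cluster cannot resolve. One approach that is sometimes tried in this situation is to pin the driver's advertised address and ports in spark-defaults.conf and publish those ports from the container (or use --net=host as discussed earlier in the thread). This is a sketch only, with a placeholder address, not a verified fix for this setup:

# hypothetical additions to /usr/local/spark/conf/spark-defaults.conf
spark.driver.host         <routable-address-of-the-docker-host>
spark.driver.bindAddress  0.0.0.0
spark.driver.port         40000
spark.blockManager.port   40001
# the fixed ports then need to be published, e.g. -p 40000:40000 -p 40001:40001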
Hi, and thanks for all your work on the Docker images for Jupyterhub.
We have JupyterHub installed on a gateway node of our Spark/Hadoop cluster (Cloudera) and we'd like to utilise your all-spark-notebook container - but we're running YARN, not Mesos. Is it possible to configure the container to work with YARN, or are we going to be restricted to running on Mesos?
Thanks for your time
Cheers
Geoff