Closed: jameskyle closed this issue 10 years ago.
I don't see anything listening on localhost:9000, and I didn't see any services started by bootstrap.sh that would open it.
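A quick way to check this from inside the container is to grep the netstat listing for the NameNode's default fs port. A minimal sketch, assuming `netstat -na` output in the usual format (the helper name `port_9000_listening` is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a netstat listing shows something
# LISTENing on port 9000 (the HDFS fs.defaultFS port in this image).
port_9000_listening() {
  echo "$1" | grep -Eq ':9000\b.*LISTEN'
}

# Example (run inside the container):
# port_9000_listening "$(netstat -na | grep LISTEN)" && echo "9000 is up"
```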
Hi,
Would you be so kind as to send me the output of `docker run -i -t -h sandbox sequenceiq/spark /etc/bootstrap.sh -bash`, please?
It should look something like this:
```
~ $ docker run -i -t -h sandbox sequenceiq/spark /etc/bootstrap.sh -bash
/
Starting sshd: [ OK ]
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-sandbox.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-sandbox.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-sandbox.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-sandbox.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-sandbox.out
bash-4.1#
```
The output should include the line `Starting namenodes on [localhost]`.
Please also send me the output of the `jps` command and the `netstat -na | grep LISTEN` command, so we can see which services have not started in your Docker container.
```
bash-4.1# jps
115 NameNode
541 ResourceManager
380 SecondaryNameNode
633 NodeManager
1376 Jps
226 DataNode
bash-4.1# netstat -na | grep LISTEN
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN
tcp 0 0 :::8031 :::* LISTEN
tcp 0 0 :::8032 :::* LISTEN
tcp 0 0 :::8033 :::* LISTEN
tcp 0 0 :::8040 :::* LISTEN
tcp 0 0 :::8042 :::* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 :::8088 :::* LISTEN
tcp 0 0 :::13562 :::* LISTEN
tcp 0 0 :::56157 :::* LISTEN
tcp 0 0 :::8030 :::* LISTEN
```
If something differs, then we are on the right track to figuring out what went wrong. If the output of both commands matches the above, then please try again to execute the stock example (from inside the Docker container, not from your host machine):
```
bash-4.1# ./bin/spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
```
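The `jps` comparison above can be sketched as a small shell helper. This is a hypothetical sketch — the function name is made up, and the daemon list is taken from the healthy `jps` output shown above, not from any official tooling:

```shell
#!/bin/bash
# Hypothetical helper: given `jps` output, print the names of the
# expected Hadoop daemons (per the healthy run above) that are missing.
missing_daemons() {
  local jps_out="$1"
  local d
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so NameNode does not falsely match
    # inside SecondaryNameNode.
    echo "$jps_out" | grep -qw "$d" || echo "$d"
  done
}

# Example (inside the container): missing_daemons "$(jps)"
```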
Thanks,
Attila
Looks like the NameNode (and most of the other daemons) isn't starting.
```
$ docker run -i -t -h sandbox sequenceiq/spark /etc/bootstrap.sh -bash
/
Starting sshd: [ OK ]
Starting namenodes on [localhost]
Starting secondary namenodes [0.0.0.0]
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-sandbox.out
bash-4.1#
```
`jps` seems to confirm that.
```
bash-4.1# jps
648 Jps
367 ResourceManager
```
Netstat:
```
bash-4.1# netstat -na | grep LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 :::8088 :::* LISTEN
tcp 0 0 :::8030 :::* LISTEN
tcp 0 0 :::8031 :::* LISTEN
tcp 0 0 :::8032 :::* LISTEN
tcp 0 0 :::8033 :::* LISTEN
```
Hi,
We have updated the base image for Spark. Could you try it again after executing `docker pull sequenceiq/spark`, please?
If it still does not work, then please check that `docker run -i -t tianon/centos:6.5 /bin/bash -c "useradd testuser; su testuser"` executes without error, just to be sure that you are not using one of the broken kernels: https://github.com/dotcloud/docker/issues/7123
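The kernel check can also be done offline from `uname -r`. A rough sketch, assuming (per this thread) that 3.x kernels from 3.15 onward are hit by the PAM/audit bug while 3.10 is not; the function name is made up, and the linked docker issue remains the real authority:

```shell
#!/bin/bash
# Hypothetical helper: classify a `uname -r` string against the kernel
# versions discussed in this thread (3.10 works, 3.15 and later 3.x break).
kernel_affected() {
  local ver=${1%%-*}        # "3.15.5-200.fc20.x86_64" -> "3.15.5"
  local major=${ver%%.*}    # "3"
  local rest=${ver#*.}      # "15.5"
  local minor=${rest%%.*}   # "15"
  if [ "$major" -eq 3 ] && [ "$minor" -ge 15 ]; then
    echo affected
  else
    echo ok
  fi
}

# Example: kernel_affected "$(uname -r)"
```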
Attila
I do get an error on that CentOS image:
```
$ docker run --rm -i -t tianon/centos:6.5 /bin/bash -c "useradd testuser; su testuser"
Unable to find image 'tianon/centos:6.5' locally
Pulling repository tianon/centos
89b52f216c6c: Download complete
su: incorrect password
```
My kernel:
```
$ uname -r
3.15.5-200.fc20.x86_64
```
This seems to be a few patch versions ahead of the bug report, but still the same problematic minor version.
Closed as duplicate of https://github.com/dotcloud/docker/issues/7123
Just to truly close this out, I confirmed it's a kernel issue by running the same box on a cluster with an older 3.10 kernel.
Thanks, James, for letting us know and for wrapping it up.
Hi James,
We have created a workaround for the kernel issue and applied it to our docker-spark image as well. Could you pull the latest image with `docker pull sequenceiq/docker-spark` and try it, please? It should work with the 3.15.5-200.fc20.x86_64 kernel as well. The description of the fix is available here: https://github.com/sequenceiq/docker-pam
Thanks, Attila
I am getting this issue with sequenceiq/spark:1.3.0.
My kernel version is Linux al-XPS-13-mint 3.19.3-031903-generic; can anyone confirm whether this is a regression, or whether there is a new issue with a similar profile?
Running the suggested example `./bin/spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1` throws the following Java exception:

Running the subsequent command produces the following: