Closed: jameskyle closed this issue 10 years ago.
I don't see anything listening on localhost:9000, and I didn't see any services started by bootstrap.sh that would open it.
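A quick way to check this from inside the container is to grep the netstat listing for the NameNode's default fs port. A minimal sketch, assuming `netstat -na` output in the usual format (the helper name `port_9000_listening` is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a netstat listing shows something
# LISTENing on port 9000 (the HDFS fs.defaultFS port in this image).
port_9000_listening() {
  echo "$1" | grep -Eq ':9000\b.*LISTEN'
}

# Example (run inside the container):
# port_9000_listening "$(netstat -na | grep LISTEN)" && echo "9000 is up"
```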
Hi,
Would you be so kind as to send me the output of `docker run -i -t -h sandbox sequenceiq/spark /etc/bootstrap.sh -bash`, please?
It should look something like this:
```
~ $ docker run -i -t -h sandbox sequenceiq/spark /etc/bootstrap.sh -bash
/
Starting sshd: [ OK ]
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-sandbox.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-sandbox.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-sandbox.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-sandbox.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-sandbox.out
bash-4.1#
```
The output should include the line `Starting namenodes on [localhost]`.
Please also send me the output of the `jps` command and the `netstat -na | grep LISTEN` command, so we can see which services have not started in your Docker container.
```
bash-4.1# jps
115 NameNode
541 ResourceManager
380 SecondaryNameNode
633 NodeManager
1376 Jps
226 DataNode
bash-4.1# netstat -na | grep LISTEN
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN
tcp 0 0 :::8031 :::* LISTEN
tcp 0 0 :::8032 :::* LISTEN
tcp 0 0 :::8033 :::* LISTEN
tcp 0 0 :::8040 :::* LISTEN
tcp 0 0 :::8042 :::* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 :::8088 :::* LISTEN
tcp 0 0 :::13562 :::* LISTEN
tcp 0 0 :::56157 :::* LISTEN
tcp 0 0 :::8030 :::* LISTEN
```
If something differs, then we are on the right track to figuring out what went wrong. If the output of both commands matches the above, then please try again to execute the stock example (from inside the Docker container, not from your host machine):
```
bash-4.1# ./bin/spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
```
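The `jps` comparison above can be sketched as a small shell helper. This is a hypothetical sketch — the function name is made up, and the daemon list is taken from the healthy `jps` output shown above, not from any official tooling:

```shell
#!/bin/bash
# Hypothetical helper: given `jps` output, print the names of the
# expected Hadoop daemons (per the healthy run above) that are missing.
missing_daemons() {
  local jps_out="$1"
  local d
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so NameNode does not falsely match
    # inside SecondaryNameNode.
    echo "$jps_out" | grep -qw "$d" || echo "$d"
  done
}

# Example (inside the container): missing_daemons "$(jps)"
```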
Thanks,
Attila
Looks like the NameNode (and most of the other daemons) isn't starting.
```
$ docker run -i -t -h sandbox sequenceiq/spark /etc/bootstrap.sh -bash
/
Starting sshd: [ OK ]
Starting namenodes on [localhost]
Starting secondary namenodes [0.0.0.0]
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-sandbox.out
bash-4.1#
```
`jps` seems to confirm that.
```
bash-4.1# jps
648 Jps
367 ResourceManager
```
Netstat:
```
bash-4.1# netstat -na | grep LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 :::8088 :::* LISTEN
tcp 0 0 :::8030 :::* LISTEN
tcp 0 0 :::8031 :::* LISTEN
tcp 0 0 :::8032 :::* LISTEN
tcp 0 0 :::8033 :::* LISTEN
```
Hi,
We have updated the base image for Spark. Could you try it again after executing `docker pull sequenceiq/spark`, please?
If it still does not work, then please check that `docker run -i -t tianon/centos:6.5 /bin/bash -c "useradd testuser; su testuser"` executes without error, just to be sure that you are not using one of the broken kernels: https://github.com/dotcloud/docker/issues/7123
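The kernel check can also be done offline from `uname -r`. A rough sketch, assuming (per this thread) that 3.x kernels from 3.15 onward are hit by the PAM/audit bug while 3.10 is not; the function name is made up, and the linked docker issue remains the real authority:

```shell
#!/bin/bash
# Hypothetical helper: classify a `uname -r` string against the kernel
# versions discussed in this thread (3.10 works, 3.15 and later 3.x break).
kernel_affected() {
  local ver=${1%%-*}        # "3.15.5-200.fc20.x86_64" -> "3.15.5"
  local major=${ver%%.*}    # "3"
  local rest=${ver#*.}      # "15.5"
  local minor=${rest%%.*}   # "15"
  if [ "$major" -eq 3 ] && [ "$minor" -ge 15 ]; then
    echo affected
  else
    echo ok
  fi
}

# Example: kernel_affected "$(uname -r)"
```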
Attila
I do get an error on that CentOS image:
```
$ docker run --rm -i -t tianon/centos:6.5 /bin/bash -c "useradd testuser; su testuser"
Unable to find image 'tianon/centos:6.5' locally
Pulling repository tianon/centos
89b52f216c6c: Download complete
su: incorrect password
```
My kernel:
```
$ uname -r
3.15.5-200.fc20.x86_64
```
This seems to be a few patch versions ahead of the bug report, but still the same problematic minor version.
Closed as duplicate of https://github.com/dotcloud/docker/issues/7123
Just to truly close this out, I confirmed it's a kernel issue by running the same box on a cluster with an older 3.10 kernel.
Thanks, James, for letting us know and for wrapping it up.
Hi James,
We have created a workaround for the kernel issue and applied it to our docker-spark image as well. Could you pull the latest image with `docker pull sequenceiq/docker-spark` and try it, please? It should work with the 3.15.5-200.fc20.x86_64 kernel as well. The description of the fix is available here: https://github.com/sequenceiq/docker-pam
Thanks, Attila
I am getting this issue with sequenceiq/spark:1.3.0.
My kernel version is Linux al-XPS-13-mint 3.19.3-031903-generic; can anyone confirm whether this is a regression, or whether there is a new issue with a similar profile?
Running the suggested example `./bin/spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1` throws the following Java exception:

Running the subsequent command produces the following: