Closed: dm-tran closed this issue 7 years ago
I can launch and use Spark 1.6.3 clusters just fine with Flintrock. Are you having issues? What issues are you having, exactly?
Also, reading through the discussion on https://github.com/apache/spark/pull/13543, it looks like Flintrock is doing the correct thing by using SPARK_MASTER_HOST, which was supported and used before Spark 2.0. That PR just clarified that _HOST is preferred and _IP is deprecated.
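For clusters that need to work with both Spark 1.x and 2.x, one approach is to set both variables in conf/spark-env.sh. This is a hypothetical sketch, not Flintrock's actual template; 172.4.81.85 stands in for the master's private IP from the launch log below.

```shell
# conf/spark-env.sh (sketch, not Flintrock's actual template)
SPARK_MASTER_IP=172.4.81.85    # read by the Spark 1.6.x sbin scripts
SPARK_MASTER_HOST=172.4.81.85  # preferred name since Spark 2.0 (SPARK-15806)
```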
Thanks for your answer @nchammas.
The issue I have with Spark 1.6.3 is that slaves are not recognized nor displayed in Spark UI:
./flintrock --config spark16.yaml launch cluster-test
Requesting 2 spot instances at a max price of $0.1...
0 of 2 instances granted. Waiting...
All 2 instances granted.
[172.4.95.149] SSH online.
[172.4.81.85] SSH online.
[172.4.95.149] Configuring ephemeral storage...
[172.4.81.85] Configuring ephemeral storage...
[172.4.95.149] Installing Java 1.8...
[172.4.81.85] Installing Java 1.8...
[172.4.95.149] Installing HDFS...
[172.4.81.85] Installing HDFS...
[172.4.95.149] Installing Spark...
[172.4.81.85] Installing Spark...
[172.4.81.85] Configuring HDFS master...
[172.4.81.85] Configuring Spark master...
HDFS online.
Spark Health Report:
* Master: ALIVE
* Workers: 0
* Cores: 0
* Memory: 0.0 GB
launch finished in 0:03:21.
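Flintrock's health report above comes from the standalone master's status page. You can check worker registration yourself by fetching the master web UI's JSON endpoint (port 8080 by default) and counting workers. A sketch, with the payload inlined here; in practice you would fetch it with `curl -s http://<master-ip>:8080/json/`:

```shell
# Parse the standalone master's JSON status and report worker count.
# The sample payload below mirrors a healthy-master/no-workers state
# like the health report above.
sample='{"status":"ALIVE","workers":[],"cores":0,"memory":0}'
echo "$sample" | python3 -c 'import json, sys
d = json.load(sys.stdin)
print("Master:", d["status"])
print("Workers:", len(d["workers"]))'
```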
It turns out that this is due to a configuration problem of our VPC. By default, machines in our VPC can only communicate with each other using IPs.
In start-slaves.sh in Spark 1.6, the default value of SPARK_MASTER_IP is `hostname`:

if [ "$SPARK_MASTER_IP" = "" ]; then
  SPARK_MASTER_IP="`hostname`"
fi
That's why slaves were not recognized.
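The failure mode can be reproduced in isolation. The snippet below re-runs the Spark 1.6 fallback logic and shows the master URL workers would try to connect to; the explicit IP at the end is the hypothetical fix for a VPC where hostnames don't resolve between machines.

```shell
#!/bin/sh
# Reproduce the Spark 1.6 fallback from sbin/start-slaves.sh:
# with SPARK_MASTER_IP unset, it falls back to `hostname`.
unset SPARK_MASTER_IP
if [ "$SPARK_MASTER_IP" = "" ]; then
  SPARK_MASTER_IP="`hostname`"
fi
echo "Workers would connect to: spark://$SPARK_MASTER_IP:7077"

# In a VPC where machines can only reach each other by IP, that URL is
# unreachable. Setting SPARK_MASTER_IP explicitly avoids the fallback
# (172.4.81.85 is the master's IP from the launch log above).
SPARK_MASTER_IP=172.4.81.85
echo "With the IP set explicitly: spark://$SPARK_MASTER_IP:7077"
```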
Sorry for the noise; I am closing the issue.
No worries. Glad you figured things out.
FWIW, SPARK_MASTER_HOST can also be set to an IP address, according to https://github.com/apache/spark/pull/13543.
This PR makes the following changes:

- Set SPARK_MASTER_IP, as it is needed for Spark 1.6.x.

I tested this PR using the command flintrock launch with Spark 1.6.3.

The scripts "sbin/start-master.sh" and "sbin/start-slaves.sh" in 1.6.x use SPARK_MASTER_IP. This was changed in Spark 2.0.0 by the following PR: https://github.com/apache/spark/pull/13543/files. The associated JIRA is https://issues.apache.org/jira/browse/SPARK-15806.