xiandong79 closed this issue 6 years ago
@ncherel
Spark version 2.1.1, using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151).
(I can also launch Spark version 2.2.0.)
It works in `local` mode:
# For Scala and Java, use run-example:
./bin/run-example SparkPi
./bin/spark-submit examples/src/main/python/pi.py
But it does not work in `standalone` mode:
./bin/spark-submit --master spark://35.162.130.151:7077 examples/src/main/python/pi.py 100
17/11/28 15:45:36 INFO SecurityManager: Changing view acls to: ec2-user
17/11/28 15:45:36 INFO SecurityManager: Changing modify acls to: ec2-user
17/11/28 15:45:36 INFO SecurityManager: Changing view acls groups to:
17/11/28 15:45:36 INFO SecurityManager: Changing modify acls groups to:
17/11/28 15:45:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ec2-user); groups with view permissions: Set(); users with modify permissions: Set(ec2-user); groups with modify permissions: Set()
17/11/28 15:45:36 INFO Utils: Successfully started service 'sparkDriver' on port 32937.
17/11/28 15:45:36 INFO SparkEnv: Registering MapOutputTracker
17/11/28 15:45:36 INFO SparkEnv: Registering BlockManagerMaster
17/11/28 15:45:36 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/28 15:45:36 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/28 15:45:36 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b05bab88-e040-4614-9665-bbeeea5f5c94
17/11/28 15:45:36 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/11/28 15:45:37 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/28 15:45:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/28 15:45:37 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.31.6.28:4040
17/11/28 15:45:37 INFO SparkContext: Added file file:/home/ec2-user/spark/examples/src/main/python/pi.py at spark://172.31.6.28:32937/files/pi.py with timestamp 1511883937418
17/11/28 15:45:37 INFO Utils: Copying /home/ec2-user/spark/examples/src/main/python/pi.py to /tmp/spark-eb3b751f-7e90-49d2-b1be-ab5fa1fd4eb1/userFiles-e6575cb1-3e15-4f5b-b3e0-3d707eea6e9a/pi.py
17/11/28 15:45:37 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://35.162.130.151:7077...
17/11/28 15:45:37 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 35.162.130.151:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to /35.162.130.151:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /35.162.130.151:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:640)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:575)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:489)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:451)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
17/11/28 15:45:57 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://35.162.130.151:7077...
17/11/28 15:45:57 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 35.162.130.151:7077
[same stack trace as above]
17/11/28 15:46:17 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://35.162.130.151:7077...
17/11/28 15:46:17 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 35.162.130.151:7077
[same stack trace as above]
17/11/28 15:46:37 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
17/11/28 15:46:37 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
17/11/28 15:46:37 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35355.
17/11/28 15:46:37 INFO NettyBlockTransferService: Server created on 172.31.6.28:35355
17/11/28 15:46:37 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/11/28 15:46:37 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.31.6.28, 35355, None)
17/11/28 15:46:37 INFO BlockManagerMasterEndpoint: Registering block manager 172.31.6.28:35355 with 366.3 MB RAM, BlockManagerId(driver, 172.31.6.28, 35355, None)
17/11/28 15:46:37 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.31.6.28, 35355, None)
17/11/28 15:46:37 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 172.31.6.28, 35355, None)
17/11/28 15:46:37 INFO SparkUI: Stopped Spark web UI at http://172.31.6.28:4040
17/11/28 15:46:37 INFO StandaloneSchedulerBackend: Shutting down all executors
17/11/28 15:46:37 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/11/28 15:46:37 WARN StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
17/11/28 15:46:37 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/11/28 15:46:37 INFO MemoryStore: MemoryStore cleared
17/11/28 15:46:37 INFO BlockManager: BlockManager stopped
17/11/28 15:46:37 INFO BlockManagerMaster: BlockManagerMaster stopped
17/11/28 15:46:37 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/11/28 15:46:37 INFO SparkContext: Successfully stopped SparkContext
17/11/28 15:46:37 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
17/11/28 15:46:37 INFO SparkContext: SparkContext already stopped.
Traceback (most recent call last):
File "/home/ec2-user/spark/examples/src/main/python/pi.py", line 32, in <module>
.appName("PythonPi")\
File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/context.py", line 310, in getOrCreate
File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/context.py", line 182, in _do_init
File "/home/ec2-user/spark/python/lib/pyspark.zip/pyspark/context.py", line 249, in _initialize_context
File "/home/ec2-user/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
File "/home/ec2-user/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
[same stack trace as above]
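The root cause in the log above is a TCP-level `Connection refused` on port 7077. Independent of Spark, reachability of the master's RPC port can be checked with a small Python sketch (a hypothetical helper, not a Flintrock or Spark tool; the address in the comment is just the one from the log):

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example, run from the driver machine (address taken from the log above):
# reachable = port_open("35.162.130.151", 7077)
```

If this returns `False` from the driver machine, the problem is networking (security group, wrong address, master not running), not Spark itself.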
What does `flintrock describe <cluster_name>` show as the master address? If you use that instead of the IP address, does the Pi example work?
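Whether you pass a raw IP or the EC2 public DNS name, the `--master` value has the same `spark://host:port` shape. A small helper (hypothetical, not part of Spark or Flintrock) to pull out the host and port for connectivity checks:

```python
from urllib.parse import urlparse

def parse_spark_master(url: str) -> tuple:
    """Split a standalone master URL like spark://host:7077 into (host, port)."""
    parsed = urlparse(url)
    if parsed.scheme != "spark" or not parsed.hostname:
        raise ValueError(f"not a standalone master URL: {url!r}")
    # 7077 is the default standalone master RPC port.
    return parsed.hostname, parsed.port or 7077
```

For example, `parse_spark_master("spark://ec2-34-228-165-101.compute-1.amazonaws.com:7077")` yields the DNS name and port 7077.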
flintrock --config us-east-m4-4.yaml describe us-east-m4-4
us-east-m4-4:
state: running
node-count: 5
master: ec2-34-228-165-101.compute-1.amazonaws.com
slaves:
- ec2-52-204-167-201.compute-1.amazonaws.com
- ec2-34-228-79-253.compute-1.amazonaws.com
- ec2-34-207-145-126.compute-1.amazonaws.com
- ec2-54-87-135-196.compute-1.amazonaws.com
When submitting jobs:
./bin/spark-submit --master spark://34.228.165.101:7077 examples/src/main/python/pi.py 100
17/11/29 06:11:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/29 06:11:00 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.31.27.84:4040
17/11/29 06:11:01 INFO SparkContext: Added file file:/home/ec2-user/spark/examples/src/main/python/pi.py at spark://172.31.27.84:42255/files/pi.py with timestamp 1511935861098
17/11/29 06:11:01 INFO Utils: Copying /home/ec2-user/spark/examples/src/main/python/pi.py to /tmp/spark-62afdf21-6993-4235-94e4-36930ed2938b/userFiles-1c2a6fb3-db51-47e0-8671-f9573c6a516d/pi.py
17/11/29 06:11:01 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://34.228.165.101:7077...
17/11/29 06:11:21 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://34.228.165.101:7077...
17/11/29 06:11:41 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://34.228.165.101:7077...
I see both a private IP and a public IP; maybe something is wrong there?
17/11/29 06:12:01 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
17/11/29 06:12:01 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
./bin/spark-submit --master spark://ec2-34-228-165-101.compute-1.amazonaws.com:7077 examples/src/main/python/pi.py 100
17/11/29 06:16:00 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master ec2-34-228-165-101.compute-1.amazonaws.com:7077
Are you running `spark-submit` from the Flintrock master? If not, can you try there?
Definitely!! I copied the SSH command from the Amazon console and SSH'd into the master. There is a `spark` folder there, and I run `spark-submit` from the Flintrock master.
I see both a private IP and a public IP; maybe something is wrong there?
Hmm, that could be a sign something is wrong. Does your VPC have an internet gateway attached?
VPC ID: vpc-4652db3e
igw-029d0a7b | attached | vpc-4652db3e
I also downloaded `flintrock.zip` and launched a new cluster. Running the following still errors:
[ec2-user@ip-172-31-18-114 spark]$ ./bin/spark-submit --master spark://54.146.168.248:7077 examples/src/main/python/pi.py 10
`flintrock --config us-east-m4-4.yaml describe new-us-east` shows:
state: running
node-count: 5
master: ec2-54-146-168-248.compute-1.amazonaws.com
slaves:
- ec2-54-164-183-18.compute-1.amazonaws.com
- ec2-34-228-161-228.compute-1.amazonaws.com
- ec2-34-236-154-38.compute-1.amazonaws.com
- ec2-52-72-179-230.compute-1.amazonaws.com
I just launched a test cluster and ran the Pi example as follows:
./spark/bin/spark-submit ./spark/examples/src/main/python/pi.py
./spark/bin/spark-submit --master spark://ec2-54-152-31-224.compute-1.amazonaws.com:7077 ./spark/examples/src/main/python/pi.py
Both invocations worked fine for me and returned "Pi is roughly...".
By the way, you shouldn't need to specify `--master` because the master is already specified in `conf/spark-env.sh`, but it should work either way. So I'm baffled as to why you are seeing issues.
- Does `spark-submit` without `--master` work for you?
- When you use `--master`, can you confirm that you're specifying the exact same address as what's in `conf/spark-env.sh`?
- Can you share the exact `flintrock launch` statement you're using? Are there any custom scripts you're running that might affect networking?
# For Scala and Java, use run-example:
[ec2-user@ip-172-31-18-114 spark]$ ./bin/run-example SparkPi
# For Python examples, use spark-submit directly:
[ec2-user@ip-172-31-18-114 spark]$ ./bin/spark-submit examples/src/main/python/pi.py
But the command above only uses the private IP:
17/11/30 04:28:41 INFO TransportClientFactory: Successfully created connection to /172.31.18.114:44711 after 33 ms (0 ms spent in bootstraps)
The problem happens when I specify `--master`; I want to run jobs in `standalone` mode with 4 slaves.
$ flintrock --config us-east-m4-4.yaml start us-east-m4-4
The `us-east-m4-4.yaml` is:
provider: ec2
services:
spark:
version: 2.1.1
launch:
num-slaves: 4
providers:
ec2:
key-name: Virginia-us-east-1
identity-file: /Users/dong/Virginia-us-east-1.pem
instance-type: m4.large
region: us-east-1
ami: ami-a4c7edb2
user: ec2-user
I do not have a `conf/spark-env.sh` file, only the template files:
[ec2-user@ip-172-31-18-114 conf]$ ls
docker.properties.template log4j.properties.template slaves.template spark-env.sh.template
fairscheduler.xml.template metrics.properties.template spark-defaults.conf.template
Hmm, if the `spark/conf` directory on your Flintrock master doesn't have `spark-env.sh` or `slaves`, then something is going wrong during cluster launch.
Did you see any errors during `flintrock launch ...`?

Yes!!! I noticed a "flintrock-manifest.json" not found warning in the terminal. But the cluster was running, so I regarded it as fine.
1. `pip3 install flintrock`
2. download the zip folder

./flintrock --config us-east-m4-4.yaml launch us-east-m4-4
Launching 5 instances...
[54.175.255.28] SSH online.
[54.175.255.28] Configuring ephemeral storage...
[54.175.255.28] Installing Java 1.8...
[54.173.186.134] SSH online.
[174.129.117.235] SSH online.
[54.198.144.167] SSH online.
[54.173.186.134] Configuring ephemeral storage...
[174.129.117.235] Configuring ephemeral storage...
[54.173.186.134] Installing Java 1.8...
[54.198.144.167] Configuring ephemeral storage...
[174.129.117.235] Installing Java 1.8...
[52.54.108.59] SSH online.
[54.198.144.167] Installing Java 1.8...
[52.54.108.59] Configuring ephemeral storage...
[52.54.108.59] Installing Java 1.8...
[54.198.144.167] Installing Spark...
[54.175.255.28] Installing Spark...
[174.129.117.235] Installing Spark...
[52.54.108.59] Installing Spark...
[54.173.186.134] Installing Spark...
Do you want to terminate the 5 instances created by this operation? [Y/n]: n
Failed to execute script standalone
Traceback (most recent call last):
File "standalone.py", line 11, in <module>
File "flintrock/flintrock.py", line 1132, in main
File "click/core.py", line 722, in __call__
File "click/core.py", line 697, in main
File "click/core.py", line 1066, in invoke
File "click/core.py", line 895, in invoke
File "click/core.py", line 535, in invoke
File "click/decorators.py", line 17, in new_func
File "flintrock/flintrock.py", line 403, in launch
File "flintrock/ec2.py", line 53, in wrapper
File "flintrock/ec2.py", line 954, in launch
File "flintrock/core.py", line 618, in provision_cluster
File "flintrock/core.py", line 492, in run_against_hosts
File "concurrent/futures/_base.py", line 405, in result
File "concurrent/futures/_base.py", line 357, in __get_result
File "concurrent/futures/thread.py", line 55, in run
File "flintrock/core.py", line 678, in provision_node
File "flintrock/services.py", line 359, in configure
File "flintrock/core.py", line 448, in generate_template_mapping
AttributeError: 'NoneType' object has no attribute 'split'
Do you want to terminate the 5 instances created by this operation? [Y/n]: n
Should I choose "yes"?? Ha, I chose `y`, and it gives the same `AttributeError: 'NoneType' object has no attribute 'split'`.
$ ./flintrock --config us-east-m4-4.yaml start test
Cluster is in state 'shutting-down'. Cannot execute start.
Failed to execute script standalone
Maybe the environment on my Mac is broken. I have uninstalled and installed Flintrock again.
flintrock --config us-east-m4-4.yaml launch test-us-east
Do you want to terminate the 5 instances created by this operation? [Y/n]: y
Terminating instances...
Traceback (most recent call last):
File "/usr/local/bin/flintrock", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/flintrock/flintrock.py", line 1132, in main
cli(obj={})
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/flintrock/flintrock.py", line 403, in launch
tags=ec2_tags)
File "/usr/local/lib/python3.6/site-packages/flintrock/ec2.py", line 53, in wrapper
res = func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/flintrock/ec2.py", line 954, in launch
identity_file=identity_file)
File "/usr/local/lib/python3.6/site-packages/flintrock/core.py", line 618, in provision_cluster
run_against_hosts(partial_func=partial_func, hosts=hosts)
File "/usr/local/lib/python3.6/site-packages/flintrock/core.py", line 492, in run_against_hosts
future.result()
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.6/site-packages/flintrock/core.py", line 678, in provision_node
cluster=cluster)
File "/usr/local/lib/python3.6/site-packages/flintrock/services.py", line 359, in configure
spark_version=self.version or self.git_commit,
File "/usr/local/lib/python3.6/site-packages/flintrock/core.py", line 448, in generate_template_mapping
'hadoop_short_version': '.'.join(hadoop_version.split('.')[:2]),
AttributeError: 'NoneType' object has no attribute 'split'
This needs attention.
Then I launched another one; still `AttributeError: 'NoneType' object has no attribute 'split'`.
Going into the master:
[ec2-user@ip-172-31-22-104 conf]$ ls
docker.properties.template log4j.properties.template slaves.template spark-env.sh.template
fairscheduler.xml.template metrics.properties.template spark-defaults.conf.template
LOL! ENOUGH hah
OK, that explains a lot. :) The launch has errors, so of course the resulting cluster doesn’t work as expected.
It looks like you don’t have a Hadoop version specified. Please specify one. You can take a look at the config template in this repo for suggestions.
For example:
services:
spark:
version: 2.2.0
hdfs:
version: 2.7.4
Flintrock provides a bunch of default values when you first call `flintrock configure`, but I guess you created your own config from scratch.
I will investigate why Flintrock isn’t providing a clean error message when the Hadoop version is not specified, because that should be happening.
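For reference, the failing expression in the traceback's `generate_template_mapping` boils down to calling `.split` on a missing Hadoop version. A minimal reproduction (simplified from the traceback, not Flintrock's actual code):

```python
def hadoop_short_version(hadoop_version):
    """Mirror of the failing line: join the first two version components."""
    return '.'.join(hadoop_version.split('.')[:2])

# With a version configured this works: hadoop_short_version('2.7.4') -> '2.7'.
# With no 'hdfs: version' in the config, hadoop_version is None, and the call
# raises AttributeError: 'NoneType' object has no attribute 'split'.
```

This is why the fix is simply to add an `hdfs: version:` entry to the config (or for Flintrock to validate the config and fail with a clear message before provisioning).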
To be clear, if the launch has errors then do not try to use the cluster anyway. A failed launch means the cluster is likely in a broken state. The launch errors need to be debugged first and a new cluster launched before trying to do anything with the cluster.
I’m honestly surprised you didn’t mention the launch errors from the start. It would have saved us a lot of back and forth debugging the issue here.
I copied exactly from the sample `config.yaml` in your README.md. That sample `config.yaml` also does not specify any Hadoop version or related information.
You're right. I believe the README config example used to work fine, but #196 probably broke this. My apologies. I'll fix this. (The config template does specify the HDFS version, though.)
In any case, are you able to launch a cluster without errors now?
Yes!! I have collected the Spark job traces I need! Thanks!!
I am trying to run several spark-bench benchmarks on EC2 clusters launched by Flintrock. In Spark's `standalone` mode, I have to configure the IP address of the EC2 master machine as the Spark "master". The console output and config are below:
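For context, the `pi.py` example exercised throughout this thread estimates π by Monte Carlo sampling. The same idea in plain Python, without Spark (a sketch, not the example's actual code):

```python
import random

def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate pi from the fraction of random points in the unit quarter-disc."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

# estimate_pi(100_000) is "roughly" pi, matching the example's output message.
```

The Spark version distributes the sampling across executors, which is why a working `--master` connection matters before the job can produce any output at all.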