mesosphere-backup / hdfs-deprecated

[DEPRECATED] This project is deprecated and will be archived on December 1, 2017.

Do I need to install hdfs to run my Spark jobs? #250

Open yogeshnath opened 8 years ago

yogeshnath commented 8 years ago

I installed Mesosphere in HA mode and installed Spark using the DC/OS CLI. When I ran my job, it failed with the error below. Do I need to install hdfs as well?

"Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: namenode1.hdfs.mesos"

Mesos-DNS will not resolve this hostname, although master.mesos and slave.mesos both resolve fine.
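
For reference, here is roughly how I have been checking whether the HDFS framework is deployed and its names resolve, using the DC/OS CLI (package names assume the stock Universe; run the lookup from a node inside the cluster):

    # Install the HDFS framework (this repo) from the Universe, if it isn't installed yet.
    dcos package install hdfs

    # Confirm which packages (hdfs, spark, ...) are actually installed and running.
    dcos package list

    # From a master or agent node: once the namenodes are up, their
    # Mesos-DNS names should resolve the same way master.mesos does.
    nslookup namenode1.hdfs.mesos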


spark.driver.extraJavaOptions=-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0
spark.driver.memory=1024M
spark.executor.extraJavaOptions=-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0
spark.jars=file:/mnt/mesos/sandbox/tmo.d3.automation-6.jar
spark.logConf=true
spark.master=mesos://zk://master.mesos:2181/mesos
spark.mesos.executor.docker.image=mesosphere/spark:1.6.0
spark.submit.deployMode=client
16/02/05 23:40:13 INFO SecurityManager: Changing view acls to: root
16/02/05 23:40:13 INFO SecurityManager: Changing modify acls to: root
16/02/05 23:40:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/02/05 23:40:13 INFO Utils: Successfully started service 'sparkDriver' on port 39550.
16/02/05 23:40:13 INFO Slf4jLogger: Slf4jLogger started
16/02/05 23:40:13 INFO Remoting: Starting remoting
16/02/05 23:40:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.1:44087]
16/02/05 23:40:14 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 44087.
16/02/05 23:40:14 INFO SparkEnv: Registering MapOutputTracker
16/02/05 23:40:14 INFO SparkEnv: Registering BlockManagerMaster
16/02/05 23:40:14 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-4ca3392b-4069-4f9a-872b-eeb1b115dd9a
16/02/05 23:40:14 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/02/05 23:40:14 INFO SparkEnv: Registering OutputCommitCoordinator
16/02/05 23:40:14 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/02/05 23:40:14 INFO SparkUI: Started SparkUI at http://10.0.2.1:4040
16/02/05 23:40:14 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a244d6fc-7f86-4218-9d76-06217ef29e97/httpd-cd2b0d83-fa6f-4680-9160-1c318bb2f68a
16/02/05 23:40:14 INFO HttpServer: Starting HTTP Server
16/02/05 23:40:14 INFO Utils: Successfully started service 'HTTP file server' on port 39342.
16/02/05 23:40:14 INFO SparkContext: Added JAR file:/mnt/mesos/sandbox/tmo.d3.automation-6.jar at http://10.0.2.1:39342/jars/tmo.d3.automation-6.jar with timestamp 1454715614551
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@716: Client environment:host.name=ip-10-0-2-1.us-west-1.compute.internal
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@724: Client environment:os.arch=4.2.2-coreos-r1
2016-02-05 23:40:14,719:6(0x7f5693bbd700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Tue Dec 1 01:59:59 UTC 2015
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@log_env@753: Client environment:user.dir=/opt/spark/dist
2016-02-05 23:40:14,720:6(0x7f5693bbd700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=master.mesos:2181 sessionTimeout=10000 watcher=0x7f569b4ad600 sessionId=0 sessionPasswd=<null> context=0x7f56c80012c0 flags=0
I0205 23:40:14.720655 95 sched.cpp:164] Version: 0.25.0
2016-02-05 23:40:14,728:6(0x7f56922b9700):ZOO_INFO@check_events@1703: initiated connection to server [10.0.4.122:2181]
2016-02-05 23:40:14,734:6(0x7f56922b9700):ZOO_INFO@check_events@1750: session establishment complete on server [10.0.4.122:2181], sessionId=0x152b2fcc05e0007, negotiated timeout=10000
I0205 23:40:14.735280 89 group.cpp:331] Group process (group(1)@10.0.2.1:43793) connected to ZooKeeper
I0205 23:40:14.735371 89 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0205 23:40:14.735466 89 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0205 23:40:14.744324 89 detector.cpp:156] Detected a new leader: (id='1')
I0205 23:40:14.744611 89 group.cpp:674] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
I0205 23:40:14.748836 89 detector.cpp:481] A new leading master (UPID=master@10.0.4.123:5050) is detected
I0205 23:40:14.749011 89 sched.cpp:262] New master detected at master@10.0.4.123:5050
I0205 23:40:14.749320 89 sched.cpp:272] No credentials provided. Attempting to register without authentication
I0205 23:40:14.756449 89 sched.cpp:641] Framework registered with 20385f5d-b460-4335-913f-aa02816b7963-0003
16/02/05 23:40:14 INFO CoarseMesosSchedulerBackend: Registered as framework ID 20385f5d-b460-4335-913f-aa02816b7963-0003
16/02/05 23:40:14 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44576.
16/02/05 23:40:14 INFO NettyBlockTransferService: Server created on 44576
16/02/05 23:40:14 INFO BlockManagerMaster: Trying to register BlockManager
16/02/05 23:40:14 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.1:44576 with 511.1 MB RAM, BlockManagerId(driver, 10.0.2.1, 44576)
16/02/05 23:40:14 INFO BlockManagerMaster: Registered BlockManager
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_RUNNING
16/02/05 23:40:15 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_RUNNING
16/02/05 23:40:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 117.2 KB, free 117.2 KB)
16/02/05 23:40:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 129.8 KB)
16/02/05 23:40:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.1:44576 (size: 12.6 KB, free: 511.1 MB)
16/02/05 23:40:17 INFO SparkContext: Created broadcast 0 from textFile at FenixRowCountstest.scala:51
16/02/05 23:40:17 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/05 23:40:17 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/05 23:40:18 INFO SparkContext: Starting job: foreach at FenixRowCountstest.scala:126
16/02/05 23:40:18 INFO DAGScheduler: Got job 0 (foreach at FenixRowCountstest.scala:126) with 2 output partitions
16/02/05 23:40:18 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at FenixRowCountstest.scala:126)
16/02/05 23:40:18 INFO DAGScheduler: Parents of final stage: List()
16/02/05 23:40:18 INFO DAGScheduler: Missing parents: List()
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-2-79.us-west-1.compute.internal:39924) with ID 20385f5d-b460-4335-913f-aa02816b7963-S1
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-3-40.us-west-1.compute.internal:60136) with ID 20385f5d-b460-4335-913f-aa02816b7963-S3
16/02/05 23:40:18 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[2] at parallelize at FenixRowCountstest.scala:125), which has no missing parents
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-2-79.us-west-1.compute.internal:44431 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S1, ip-10-0-2-79.us-west-1.compute.internal, 44431)
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-2-163.us-west-1.compute.internal:51562) with ID 20385f5d-b460-4335-913f-aa02816b7963-S5
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-3-40.us-west-1.compute.internal:45844 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S3, ip-10-0-3-40.us-west-1.compute.internal, 45844)
16/02/05 23:40:18 INFO CoarseMesosSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-0-0-164.us-west-1.compute.internal:55604) with ID 20385f5d-b460-4335-913f-aa02816b7963-S4
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-2-163.us-west-1.compute.internal:45068 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S5, ip-10-0-2-163.us-west-1.compute.internal, 45068)
16/02/05 23:40:18 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-0-0-164.us-west-1.compute.internal:42653 with 511.1 MB RAM, BlockManagerId(20385f5d-b460-4335-913f-aa02816b7963-S4, ip-10-0-0-164.us-west-1.compute.internal, 42653)
16/02/05 23:40:18 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1160.0 B, free 130.9 KB)
16/02/05 23:40:18 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 824.0 B, free 131.7 KB)
16/02/05 23:40:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.2.1:44576 (size: 824.0 B, free: 511.1 MB)
16/02/05 23:40:18 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/02/05 23:40:18 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (ParallelCollectionRDD[2] at parallelize at FenixRowCountstest.scala:125)
16/02/05 23:40:18 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/02/05 23:40:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-0-0-164.us-west-1.compute.internal, partition 0,PROCESS_LOCAL, 2283 bytes)
16/02/05 23:40:18 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, ip-10-0-2-79.us-west-1.compute.internal, partition 1,PROCESS_LOCAL, 2389 bytes)
16/02/05 23:40:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-0-0-164.us-west-1.compute.internal:42653 (size: 824.0 B, free: 511.1 MB)
16/02/05 23:40:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-10-0-2-79.us-west-1.compute.internal:44431 (size: 824.0 B, free: 511.1 MB)
16/02/05 23:40:19 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 826 ms on ip-10-0-0-164.us-west-1.compute.internal (1/2)
16/02/05 23:40:19 INFO DAGScheduler: ResultStage 0 (foreach at FenixRowCountstest.scala:126) finished in 0.843 s
16/02/05 23:40:19 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 765 ms on ip-10-0-2-79.us-west-1.compute.internal (2/2)
16/02/05 23:40:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/02/05 23:40:19 INFO DAGScheduler: Job 0 finished: foreach at FenixRowCountstest.scala:126, took 1.166327 s
16/02/05 23:40:19 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/02/05 23:40:19 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: namenode1.hdfs.mesos
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
    at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:124)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:74)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:65)
    at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:152)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
    at org.apache.hadoop.mapred.FileOutputFormat.setOutputPath(FileOutputFormat.java:146)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1058)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951)
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1443)
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1422)
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1422)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1422)
    at FenixRowCountstest$.writeLogger(FenixRowCountstest.scala:127)
    at FenixRowCountstest$.main(FenixRowCountstest.scala:72)
    at FenixRowCountstest.main(FenixRowCountstest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: namenode1.hdfs.mesos
    ... 51 more
16/02/05 23:40:19 INFO SparkContext: Invoking stop() from shutdown hook
16/02/05 23:40:19 INFO SparkUI: Stopped Spark web UI at http://10.0.2.1:4040
16/02/05 23:40:19 INFO CoarseMesosSchedulerBackend: Shutting down all executors
16/02/05 23:40:19 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
I0205 23:40:19.559413 108 sched.cpp:1771] Asked to stop the driver
I0205 23:40:19.559767 93 sched.cpp:1040] Stopping framework '20385f5d-b460-4335-913f-aa02816b7963-0003'
16/02/05 23:40:19 INFO CoarseMesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
16/02/05 23:40:19 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/02/05 23:40:19 INFO MemoryStore: MemoryStore cleared
16/02/05 23:40:19 INFO BlockManager: BlockManager stopped
16/02/05 23:40:19 INFO BlockManagerMaster: BlockManagerMaster stopped
16/02/05 23:40:19 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/02/05 23:40:19 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/02/05 23:40:19 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/02/05 23:40:19 INFO SparkContext: Successfully stopped SparkContext
16/02/05 23:40:19 INFO ShutdownHookManager: Shutdown hook called
16/02/05 23:40:19 INFO ShutdownHookManager: Deleting directory /tmp/spark-a244d6fc-7f86-4218-9d76-06217ef29e97/httpd-cd2b0d83-fa6f-4680-9160-1c318bb2f68a
16/02/05 23:40:19 INFO ShutdownHookManager: Deleting directory /tmp/spark-a244d6fc-7f86-4218-9d76-06217ef29e97

radek1st commented 8 years ago

I have a similar issue on a fresh install of single-master DC/OS with Spark:

...
16/05/16 17:52:37 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
16/05/16 17:52:37 WARN DFSUtil: Namenode for hdfs remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: namenode1.hdfs.mesos
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
    at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:124)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:74)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:65)
    at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
...

jstokes commented 8 years ago

Just wanted to add: I get something similar when submitting tasks through either Zeppelin or the dcos spark run ... CLI.

java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfs
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:427)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:212)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hdfs
    ... 44 more

When running through Zeppelin, this causes the first task submitted to fail, but subsequent tasks complete successfully (although the error is still logged). Replicated on a clean cluster built from AWS CloudFormation templates.
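
For context, the bare "hdfs" in that exception is the logical HDFS nameservice ID being treated as a hostname, because the client has no HA mapping for it. Below is a minimal sketch of the hdfs-site.xml entries that map the ID; the property names are standard HDFS HA client settings, while the hostnames and RPC port are assumptions based on the HDFS framework's Mesos-DNS names:

    <configuration>
      <!-- Logical nameservice ID: the "hdfs" in hdfs://hdfs/... URIs. -->
      <property><name>dfs.nameservices</name><value>hdfs</value></property>
      <property><name>dfs.ha.namenodes.hdfs</name><value>nn1,nn2</value></property>
      <!-- Map each namenode ID to a resolvable host:port (assumed values). -->
      <property><name>dfs.namenode.rpc-address.hdfs.nn1</name><value>namenode1.hdfs.mesos:9001</value></property>
      <property><name>dfs.namenode.rpc-address.hdfs.nn2</name><value>namenode2.hdfs.mesos:9001</value></property>
      <!-- Let the client fail over between nn1 and nn2. -->
      <property>
        <name>dfs.client.failover.proxy.provider.hdfs</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
    </configuration>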

thbeh commented 8 years ago

@jstokes did you manage to solve your issue with the dcos spark run CLI?

ktaube commented 8 years ago

I managed to solve this by adding the spark.mesos.uris property to --submit-args, with links to my own hdfs-site.xml and core-site.xml.

At least the mesosphere/spark:1.0.1-1.6.1-2 Docker image contains a spark-env.sh that copies those config files from MESOS_SANDBOX to HADOOP_CONF_DIR:

./conf/spark-env.sh:11:[ -f "${MESOS_SANDBOX}/hdfs-site.xml" ] && cp "${MESOS_SANDBOX}/hdfs-site.xml" "${HADOOP_CONF_DIR}"
./conf/spark-env.sh:12:[ -f "${MESOS_SANDBOX}/core-site.xml" ] && cp "${MESOS_SANDBOX}/core-site.xml" "${HADOOP_CONF_DIR}"
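
For anyone else hitting this, a rough sketch of what that submission can look like end to end; the URLs are placeholders for wherever you host your config files, and the class and jar are stand-ins for your own job:

    # spark.mesos.uris downloads the listed files into the driver's Mesos sandbox,
    # where the spark-env.sh lines above copy them into HADOOP_CONF_DIR.
    dcos spark run --submit-args="\
      --conf spark.mesos.uris=http://example.com/hdfs-site.xml,http://example.com/core-site.xml \
      --class MyJob http://example.com/my-job.jar"
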
jstokes commented 8 years ago

@thbeh No. I was away from DC/OS for a little while. I stood up a new cluster but was immediately hit with the same exceptions as in https://github.com/mesosphere/universe/issues/613.

Is there a workaround for this?