googlegenomics / spark-examples

Apache Spark jobs such as Principal Coordinate Analysis.
Apache License 2.0

Tilde (~) does not work to refer to the home directory #19

Open pgrosu opened 10 years ago

pgrosu commented 10 years ago

Hi @elmer-garduno,

I cannot run it like this:

sbt "run --client-secrets ~/me/gg_client_secrets/client_secrets_java.json --spark-master local[4]"

and instead have to write the full path:

$ sbt "run --client-secrets /home/pgrosu/me/gg_client_secrets/client_secrets_java.json --spark-master local[4]"

Otherwise I get the following error:

$ sbt "run --client-secrets ~/me/gg_client_secrets/client_secrets_java.json --spark-master local[4]"
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /home/pgrosu/me/gg_spark/spark-examples/project
[info] Set current project to googlegenomics-spark-examples (in build file:/home/pgrosu/me/gg_spark/spark-examples/)

Multiple main classes detected, select one to run:

 [1] com.google.cloud.genomics.spark.examples.SearchVariantsExampleKlotho
 [2] com.google.cloud.genomics.spark.examples.SearchVariantsExampleBRCA1
 [3] com.google.cloud.genomics.spark.examples.VariantsSource
 [4] com.google.cloud.genomics.spark.examples.VariantsPcaDriver
 [5] com.google.cloud.genomics.spark.examples.SearchReadsExample1
 [6] com.google.cloud.genomics.spark.examples.SearchReadsExample2
 [7] com.google.cloud.genomics.spark.examples.SearchReadsExample3
 [8] com.google.cloud.genomics.spark.examples.SearchReadsExample4

Enter number: 5

[info] Running com.google.cloud.genomics.spark.examples.SearchReadsExample1 --client-secrets ~/me/gg_client_secrets/client_secrets_java.json --spark-master local[4]
14/09/18 17:39:18 INFO spark.SecurityManager: Changing view acls to: pgrosu
14/09/18 17:39:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pgrosu)
14/09/18 17:39:18 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/09/18 17:39:18 INFO Remoting: Starting remoting
14/09/18 17:39:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@eofe4.cm.cluster:55192]
14/09/18 17:39:19 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@eofe4.cm.cluster:55192]
14/09/18 17:39:19 INFO spark.SparkEnv: Registering MapOutputTracker
14/09/18 17:39:19 INFO spark.SparkEnv: Registering BlockManagerMaster
14/09/18 17:39:19 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140918173919-6eb9
14/09/18 17:39:19 INFO storage.MemoryStore: MemoryStore started with capacity 819.3 MB.
14/09/18 17:39:19 INFO network.ConnectionManager: Bound socket to port 44255 with id = ConnectionManagerId(eofe4.cm.cluster,44255)
14/09/18 17:39:19 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/09/18 17:39:19 INFO storage.BlockManagerInfo: Registering block manager eofe4.cm.cluster:44255 with 819.3 MB RAM
14/09/18 17:39:19 INFO storage.BlockManagerMaster: Registered BlockManager
14/09/18 17:39:19 INFO spark.HttpServer: Starting HTTP Server
14/09/18 17:39:19 INFO server.Server: jetty-8.1.14.v20131031
14/09/18 17:39:19 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:42008
14/09/18 17:39:19 INFO broadcast.HttpBroadcast: Broadcast server started at http://172.16.0.247:42008
14/09/18 17:39:19 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-6348baec-029f-4db2-bfaf-b870e0b65269
14/09/18 17:39:19 INFO spark.HttpServer: Starting HTTP Server
14/09/18 17:39:19 INFO server.Server: jetty-8.1.14.v20131031
14/09/18 17:39:19 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:40599
14/09/18 17:39:20 INFO server.Server: jetty-8.1.14.v20131031
14/09/18 17:39:20 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/09/18 17:39:20 INFO ui.SparkUI: Started SparkUI at http://eofe4.cm.cluster:4040
14/09/18 17:39:20 ERROR executor.Executor: Exception in task ID 0
java.io.FileNotFoundException: ~/me/gg_client_secrets/client_secrets_java.json (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:131)
        at com.google.cloud.genomics.Client$.apply(Client.scala:47)
        at com.google.cloud.genomics.spark.examples.rdd.ReadsRDD.compute(ReadsRDD.scala:82)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
14/09/18 17:39:20 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/09/18 17:39:20 WARN scheduler.TaskSetManager: Loss was due to java.io.FileNotFoundException
java.io.FileNotFoundException: ~/me/gg_client_secrets/client_secrets_java.json (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:131)
        at com.google.cloud.genomics.Client$.apply(Client.scala:47)
        at com.google.cloud.genomics.spark.examples.rdd.ReadsRDD.compute(ReadsRDD.scala:82)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
14/09/18 17:39:20 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 1 times; aborting job
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 1 times, most recent failure: Exception failure in TID 0 on host localhost: java.io.FileNotFoundException: ~/me/gg_client_secrets/client_secrets_java.json (No such file or directory)
[error]         java.io.FileInputStream.open(Native Method)
[error]         java.io.FileInputStream.<init>(FileInputStream.java:131)
[error]         com.google.cloud.genomics.Client$.apply(Client.scala:47)
[error]         com.google.cloud.genomics.spark.examples.rdd.ReadsRDD.compute(ReadsRDD.scala:82)
[error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
[error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
[error]         org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
[error]         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
[error]         org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
[error]         org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
[error]         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
[error]         org.apache.spark.scheduler.Task.run(Task.scala:51)
[error]         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
[error]         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[error]         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[error]         java.lang.Thread.run(Thread.java:745)
[error] Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 1 times, most recent failure: Exception failure in TID 0 on host localhost: java.io.FileNotFoundException: ~/me/gg_client_secrets/client_secrets_java.json (No such file or directory)
        java.io.FileInputStream.open(Native Method)
        java.io.FileInputStream.<init>(FileInputStream.java:131)
        com.google.cloud.genomics.Client$.apply(Client.scala:47)
        com.google.cloud.genomics.spark.examples.rdd.ReadsRDD.compute(ReadsRDD.scala:82)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        org.apache.spark.scheduler.Task.run(Task.scala:51)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last *:run for the full output.
14/09/18 17:39:20 ERROR spark.ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:142)
        at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:117)
        at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:115)
        at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:115)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
        at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:114)
        at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
14/09/18 17:39:21 ERROR util.Utils: Uncaught exception in thread SparkListenerBus
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:48)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
        at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
java.lang.RuntimeException: Nonzero exit code: 1
        at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 84 s, completed Sep 18, 2014 5:39:21 PM

Can we fix this?

Thanks, Paul

elmer-garduno commented 10 years ago

I think this is the way it is intended to work: the file will be read from the distributed file system whenever the tasks are executed on Spark, and shell expansion of the tilde won't happen there.

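As a quick illustration of why the literal path fails (this snippet is not from the repo, just a standalone check): the JVM never expands "~" on its own, so unless the shell expands it before the argument is quoted, FileInputStream looks for a directory literally named "~".

    import java.io.File

    object TildeCheck extends App {
      // The JVM treats "~" as a literal directory name, so this path does not resolve.
      val literal = new File("~/me/gg_client_secrets/client_secrets_java.json")
      println(literal.exists())  // false unless a directory literally named "~" exists here

      // Expanding a leading "~" by hand resolves the path under the real home directory.
      val expanded = new File(
        "~/me/gg_client_secrets/client_secrets_java.json"
          .replaceFirst("^~", System.getProperty("user.home")))
      println(expanded.getAbsolutePath)
    }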

elmer-garduno commented 10 years ago

Add some documentation on this matter.

pgrosu commented 10 years ago

Hi Elmer,

Okay, I fixed this. I just changed the following lines in Client.scala:

 def apply(applicationName: String, clientSecretsFile: String): Client = {
   val secrets = GoogleClientSecrets.load(jsonFactory,
        new InputStreamReader(new FileInputStream(new File(clientSecretsFile))))

To the following, and it will work with the tilde:

 def apply(applicationName: String, clientSecretsFile: String): Client = {

    // Expand "~" to the user's home directory before opening the file.
    val path_expanded_clientSecretsFile =
      clientSecretsFile.replace("~", System.getProperty("user.home"))

    val secrets = GoogleClientSecrets.load(jsonFactory,
      new InputStreamReader(new FileInputStream(new File(path_expanded_clientSecretsFile))))

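A slightly more conservative variant (just a sketch on my part, not part of the change above) would expand only a leading "~", so that a tilde appearing elsewhere in a path is left untouched:

    // Hypothetical helper: expand only the bare "~" prefix (no "~user" handling).
    def expandHome(path: String): String =
      if (path == "~" || path.startsWith("~/"))
        System.getProperty("user.home") + path.drop(1)
      else path
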
Paul

elmer-garduno commented 10 years ago

This sounds reasonable as long as we don't expect the same behavior on the cluster. Besides that, we are planning a new approach that broadcasts the authentication token to the workers instead of passing the client secrets file.

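For what it's worth, a rough sketch of that broadcast idea (purely hypothetical; obtainAccessToken is a stand-in for whatever OAuth flow the driver would run, and no real API from this repo is assumed) could look like this:

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastTokenSketch extends App {
      val sc = new SparkContext(
        new SparkConf().setAppName("broadcast-token-sketch").setMaster("local[4]"))

      // Placeholder: a real implementation would run the OAuth flow on the driver,
      // so only the driver ever needs the client secrets file.
      def obtainAccessToken(clientSecretsFile: String): String = "ya29.placeholder-token"

      val token = sc.broadcast(obtainAccessToken("/path/to/client_secrets.json"))

      sc.parallelize(1 to 4).foreach { _ =>
        // Workers read the broadcast value instead of opening the secrets file locally.
        val accessToken = token.value
        println(s"worker got token prefix: ${accessToken.take(5)}")
      }

      sc.stop()
    }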

pgrosu commented 10 years ago

Actually, I ran it on a cluster and it worked fine. I'm curious about the new approach. At least we have more things to test out :)

elmer-garduno commented 10 years ago

That makes sense. Can you submit a PR with that change, please?

pgrosu commented 10 years ago

I can't do a PR since I can't sign the CLA, but I can post stuff. Just take anything you find useful in what I post, and feel free to use it in any PR or other code :)