yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

can't find lenet_memory_solver.prototxt #260

Open RickLee26 opened 7 years ago

RickLee26 commented 7 years ago

I compiled it on x64 Ubuntu 14.04. My environment is:

scala-2.11.7
spark-2.1.0
hadoop-2.6.5
opencv-3.0

When I tried to run the MNIST sample, a message said that a prototxt file could not be found, even though I pass both prototxt files with --files. The exception in the log names lenet_memory_train_test.prototxt.

Script:

spark-submit \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    --master local[2] \
    --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
    --files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
    caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
    -train \
    -features accuracy,loss -label label \
    -conf lenet_memory_solver.prototxt \
    -clusterSize 1 \
    -devices 1 \
    -connection ethernet \
    -model file:${CAFFE_ON_SPARK}/mnist_lenet.model \
    -output file:${CAFFE_ON_SPARK}/lenet_features_result

This is what I got:

pcer@pcer-u14:~/CaffeOnSpark$ ./test.sh
17/05/31 01:04:28 INFO spark.SparkContext: Running Spark version 2.1.0
17/05/31 01:04:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/31 01:04:29 WARN util.Utils: Your hostname, pcer-u14 resolves to a loopback address: 127.0.1.1; using 192.168.199.147 instead (on interface eth0)
17/05/31 01:04:29 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/05/31 01:04:29 INFO spark.SecurityManager: Changing view acls to: pcer
17/05/31 01:04:29 INFO spark.SecurityManager: Changing modify acls to: pcer
17/05/31 01:04:29 INFO spark.SecurityManager: Changing view acls groups to:
17/05/31 01:04:29 INFO spark.SecurityManager: Changing modify acls groups to:
17/05/31 01:04:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pcer); groups with view permissions: Set(); users with modify permissions: Set(pcer); groups with modify permissions: Set()
17/05/31 01:04:29 INFO util.Utils: Successfully started service 'sparkDriver' on port 39224.
17/05/31 01:04:29 INFO spark.SparkEnv: Registering MapOutputTracker
17/05/31 01:04:29 INFO spark.SparkEnv: Registering BlockManagerMaster
17/05/31 01:04:29 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/05/31 01:04:29 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/05/31 01:04:29 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-2d70934b-0d88-4a0d-a22b-5aeb58225944
17/05/31 01:04:29 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
17/05/31 01:04:29 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/05/31 01:04:29 INFO util.log: Logging initialized @1616ms
17/05/31 01:04:29 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@523424b5{/jobs,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2baa8d82{/jobs/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@319dead1{/jobs/job,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@791cbf87{/jobs/job/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a7e2d9d{/stages,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@754777cd{/stages/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b52c0d6{/stages/stage,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@372ea2bc{/stages/stage/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4cc76301{/stages/pool,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f08c4b{/stages/pool/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f19b8b3{/storage,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7de0c6ae{/storage/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a486d78{/storage/rdd,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@cdc3aae{/storage/rdd/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7ef2d7a6{/environment,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5dcbb60{/environment/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c36250e{/executors,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21526f6c{/executors/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49f5c307{/executors/threadDump,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@299266e2{/executors/threadDump/json,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5471388b{/static,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66ea1466{/,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1601e47{/api,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3bffddff{/jobs/job/kill,null,AVAILABLE}
17/05/31 01:04:29 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66971f6b{/stages/stage/kill,null,AVAILABLE}
17/05/31 01:04:29 INFO server.ServerConnector: Started ServerConnector@748a0931{HTTP/1.1}{0.0.0.0:4040}
17/05/31 01:04:29 INFO server.Server: Started @1761ms
17/05/31 01:04:29 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/05/31 01:04:29 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.199.147:4040
17/05/31 01:04:29 INFO spark.SparkContext: Added JAR file:/home/pcer/CaffeOnSpark/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar at spark://192.168.199.147:39224/jars/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1496163869957
17/05/31 01:04:30 INFO spark.SparkContext: Added file file:/home/pcer/CaffeOnSpark/data/lenet_memory_solver.prototxt at file:/home/pcer/CaffeOnSpark/data/lenet_memory_solver.prototxt with timestamp 1496163870176
17/05/31 01:04:30 INFO util.Utils: Copying /home/pcer/CaffeOnSpark/data/lenet_memory_solver.prototxt to /tmp/spark-b9b16802-6b4c-4dec-8816-69bf982502cc/userFiles-0368788f-f141-4cf3-8c8e-f5ee50c3b7f5/lenet_memory_solver.prototxt
17/05/31 01:04:30 INFO spark.SparkContext: Added file file:/home/pcer/CaffeOnSpark/data/lenet_memory_train_test.prototxt at file:/home/pcer/CaffeOnSpark/data/lenet_memory_train_test.prototxt with timestamp 1496163870212
17/05/31 01:04:30 INFO util.Utils: Copying /home/pcer/CaffeOnSpark/data/lenet_memory_train_test.prototxt to /tmp/spark-b9b16802-6b4c-4dec-8816-69bf982502cc/userFiles-0368788f-f141-4cf3-8c8e-f5ee50c3b7f5/lenet_memory_train_test.prototxt
17/05/31 01:04:30 INFO executor.Executor: Starting executor ID driver on host localhost
17/05/31 01:04:30 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46836.
17/05/31 01:04:30 INFO netty.NettyBlockTransferService: Server created on 192.168.199.147:46836
17/05/31 01:04:30 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/05/31 01:04:30 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.199.147, 46836, None)
17/05/31 01:04:30 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.199.147:46836 with 366.3 MB RAM, BlockManagerId(driver, 192.168.199.147, 46836, None)
17/05/31 01:04:30 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.199.147, 46836, None)
17/05/31 01:04:30 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.199.147, 46836, None)
17/05/31 01:04:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@12db3386{/metrics/json,null,AVAILABLE}
Exception in thread "main" java.io.FileNotFoundException: lenet_memory_train_test.prototxt (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at java.io.FileReader.<init>(FileReader.java:58)
    at com.yahoo.ml.jcaffe.Utils.GetNetParam(Utils.java:22)
    at com.yahoo.ml.caffe.Config.protoFile_$eq(Config.scala:71)
    at com.yahoo.ml.caffe.Config.<init>(Config.scala:439)
    at com.yahoo.ml.caffe.CaffeOnSpark$.main(CaffeOnSpark.scala:34)
    at com.yahoo.ml.caffe.CaffeOnSpark.main(CaffeOnSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/05/31 01:04:30 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/05/31 01:04:30 INFO server.ServerConnector: Stopped ServerConnector@748a0931{HTTP/1.1}{0.0.0.0:4040}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66971f6b{/stages/stage/kill,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3bffddff{/jobs/job/kill,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1601e47{/api,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66ea1466{/,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5471388b{/static,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@299266e2{/executors/threadDump/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@49f5c307{/executors/threadDump,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@21526f6c{/executors/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4c36250e{/executors,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@5dcbb60{/environment/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7ef2d7a6{/environment,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cdc3aae{/storage/rdd/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a486d78{/storage/rdd,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@7de0c6ae{/storage/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3f19b8b3{/storage,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2f08c4b{/stages/pool/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4cc76301{/stages/pool,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@372ea2bc{/stages/stage/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2b52c0d6{/stages/stage,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@754777cd{/stages/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@a7e2d9d{/stages,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@791cbf87{/jobs/job/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@319dead1{/jobs/job,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2baa8d82{/jobs/json,null,UNAVAILABLE}
17/05/31 01:04:30 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@523424b5{/jobs,null,UNAVAILABLE}
17/05/31 01:04:30 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.199.147:4040
17/05/31 01:04:30 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/05/31 01:04:30 INFO memory.MemoryStore: MemoryStore cleared
17/05/31 01:04:30 INFO storage.BlockManager: BlockManager stopped
17/05/31 01:04:30 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/05/31 01:04:30 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/05/31 01:04:30 INFO spark.SparkContext: Successfully stopped SparkContext
17/05/31 01:04:30 INFO util.ShutdownHookManager: Shutdown hook called
17/05/31 01:04:30 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-b9b16802-6b4c-4dec-8816-69bf982502cc
pcer@pcer-u14:~/CaffeOnSpark$ scala -version
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL
pcer@pcer-u14:~/CaffeOnSpark$

And my directory structure:

pcer@pcer-u14:~/CaffeOnSpark$ ll data/
total 100
drwxrwxr-x  5 pcer pcer  4096 5月 31 00:55 ./
drwxrwxr-x 10 pcer pcer  4096 5月 30 23:27 ../
-rw-rw-r--  1 pcer pcer  5591 1月 20 20:29 bvlc_reference_net.prototxt
-rw-rw-r--  1 pcer pcer   283 1月 20 20:29 bvlc_reference_solver.prototxt
-rw-rw-r--  1 pcer pcer  5567 1月 20 20:29 caffenet_train_net.prototxt
-rw-rw-r--  1 pcer pcer   804 1月 20 20:29 cifar10_quick_solver.prototxt
-rw-rw-r--  1 pcer pcer  3340 1月 20 20:29 cifar10_quick_train_test.prototxt
drwxrwxr-x  2 pcer pcer  4096 1月 20 20:29 images/
-rw-rw-r--  1 pcer pcer   648 1月 20 20:29 lenet_cos_solver.prototxt
-rw-rw-r--  1 pcer pcer  2894 1月 20 20:29 lenet_cos_train_test.prototxt
-rw-rw-r--  1 pcer pcer   654 1月 20 20:29 lenet_dataframe_solver.prototxt
-rw-rw-r--  1 pcer pcer  2566 5月 30 22:57 lenet_dataframe_train_test.prototxt
-rw-rw-r--  1 pcer pcer   651 5月 30 22:57 lenet_memory_solver.prototxt
-rw-rw-r--  1 pcer pcer  2537 1月 20 20:29 lenet_memory_train_test.prototxt
-rw-rw-r--  1 pcer pcer 12314 1月 20 20:29 lrcn_cos.prototxt
-rw-rw-r--  1 pcer pcer   922 1月 20 20:29 lrcn_solver.prototxt
-rw-rw-r--  1 pcer pcer   967 1月 20 20:29 lrcn_word_to_preds.deploy.prototxt
-rw-rw-r--  1 pcer pcer  2832 1月 20 20:29 lstm_deploy.prototxt
drwxr--r--  2 pcer pcer  4096 5月 30 22:50 mnist_test_lmdb/
drwxr--r--  2 pcer pcer  4096 5月 30 22:50 mnist_train_lmdb/
pcer@pcer-u14:~/CaffeOnSpark$

The spark-master and spark-slave I started before:

pcer@pcer-u14:~/CaffeOnSpark$ jps
21476 Master
28026 Jps
pcer@pcer-u14:~/CaffeOnSpark$
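
One thing the log suggests, though it is only an inference: the stack trace shows the driver opening lenet_memory_train_test.prototxt through java.io.FileReader with a bare file name, which Java resolves against the current working directory. The copies made by --files went to /tmp/spark-.../userFiles-..., while the script ran from ~/CaffeOnSpark, where neither prototxt exists. The sketch below is a hypothetical pre-flight check, not part of CaffeOnSpark. The function names solver_net and check_prototxt are made up for illustration, and it assumes the solver names its network in a quoted `net:` line.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of CaffeOnSpark).
# Assumption: the driver opens the -conf solver file, and the net file it
# names, using paths relative to the current working directory.

# Print the value of the first quoted `net:` field in a solver prototxt.
solver_net() {
    sed -n 's/^[[:space:]]*net:[[:space:]]*"\(.*\)".*$/\1/p' "$1" | head -n 1
}

# Return 0 when the solver, and the net file it refers to (if any),
# are both readable from the current working directory.
check_prototxt() {
    solver="$1"
    if [ ! -r "$solver" ]; then
        echo "not readable from $(pwd): $solver" >&2
        return 1
    fi
    net="$(solver_net "$solver")"
    if [ -n "$net" ] && [ ! -r "$net" ]; then
        echo "not readable from $(pwd): $net" >&2
        return 1
    fi
    return 0
}
```

Running `check_prototxt lenet_memory_solver.prototxt` once from ~/CaffeOnSpark and once from ${CAFFE_ON_SPARK}/data would show whether the working directory is the difference. If it is, launching spark-submit from the data directory may be enough, in which case the relative jar path in the script would need the ${CAFFE_ON_SPARK}/ prefix.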

Does anyone here know what's going on with my pc?