Open malia05 opened 7 years ago
Hello @malia05,
Question: Are you running CoS on a Mac?
Since you are getting this:
Warning: Local jar /home/inf/spark-1.6.1-bin-hadoop2.4/CaffeOnSpark/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar does not exist
I am pretty sure that you have some wrongly assigned PATHs. From the logs I understand that you installed CoS into /home/inf/CaffeOnSpark, but the line above says it is looking for the CaffeOnSpark/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar file inside the /home/inf/spark-1.6.1-bin-hadoop2.4/ folder, which is wrong.
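To see why the jar is being looked up under the Spark folder: a relative jar path appears to be resolved against the directory that spark-submit is run from, which matches the warning above. Here is a minimal sketch of a pre-flight check, assuming a POSIX-style shell. The names resolve_jar and check_jar are hypothetical helpers, not part of Spark or CaffeOnSpark.

```shell
# Hypothetical helper: print the absolute path that a jar argument would
# resolve to when interpreted relative to the current working directory.
resolve_jar() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;            # already absolute: used as-is
    *)  printf '%s/%s\n' "$PWD" "$1" ;;  # relative: resolved against $PWD
  esac
}

# Hypothetical pre-flight check: succeed only if the resolved jar exists.
check_jar() {
  cj_path="$(resolve_jar "$1")"
  if [ -f "$cj_path" ]; then
    echo "OK: $cj_path"
  else
    echo "MISSING: $cj_path" >&2
    return 1
  fi
}

# Example calls (uncomment to try them before spark-submit):
# check_jar CaffeOnSpark/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar
# check_jar "${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar"
```

Run from /home/inf/spark-1.6.1-bin-hadoop2.4, the first (relative) call would report the same missing path as the warning, while the second points at the install location, provided CAFFE_ON_SPARK is set to /home/inf/CaffeOnSpark.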
Please check your CAFFE_ON_SPARK and DYLD_LIBRARY_PATH variables. Since you are using DYLD_LIBRARY_PATH, I am assuming that your machine is a Mac. Please verify that you are doing this:
pushd ${CAFFE_ON_SPARK}/data
rm -rf ${CAFFE_ON_SPARK}/mnist_lenet.model
rm -rf ${CAFFE_ON_SPARK}/lenet_features_result
spark-submit --master ${MASTER_URL} \
--files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.driver.extraLibraryPath="${DYLD_LIBRARY_PATH}" \
--conf spark.executorEnv.DYLD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train \
-features accuracy,loss -label label \
-conf lenet_memory_solver.prototxt \
-clusterSize ${SPARK_WORKER_INSTANCES} \
-devices 1 \
-connection ethernet \
-model file:${CAFFE_ON_SPARK}/mnist_lenet.model \
-output file:${CAFFE_ON_SPARK}/lenet_features_result
ls -l ${CAFFE_ON_SPARK}/mnist_lenet.model
cat ${CAFFE_ON_SPARK}/lenet_features_result/*
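Since the command above expands several shell variables, a quick check that none of them is empty before submitting may help. This is only a sketch: require_vars is a hypothetical helper, and the variable list in the example call is simply taken from the command above.

```shell
# Hypothetical helper: report every named shell variable that is unset or empty.
# Returns 0 when all of them are set, 1 otherwise.
require_vars() {
  rv_status=0
  for rv_name in "$@"; do
    # Indirect lookup of the variable whose name is stored in rv_name.
    eval "rv_value=\${$rv_name:-}"
    if [ -z "$rv_value" ]; then
      echo "unset or empty: $rv_name" >&2
      rv_status=1
    fi
  done
  return "$rv_status"
}

# Example call with the variables used in the command above:
# require_vars CAFFE_ON_SPARK MASTER_URL TOTAL_CORES CORES_PER_WORKER \
#   SPARK_WORKER_INSTANCES DYLD_LIBRARY_PATH || echo "fix the variables first"
```

An empty CAFFE_ON_SPARK would turn the jar argument into a path starting at /caffe-grid, so catching it early gives a clearer error than the ClassNotFoundException.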
I successfully installed spark-1.6.1-bin-hadoop2.4, CaffeOnSpark, and the MNIST dataset. Then I adjusted ${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt to use absolute paths, such as:
"file:/home/inf/CaffeOnSpark/caffe-public/examples/mnist/mnist_train_lmdb/"
"file:/home/inf/CaffeOnSpark/caffe-public/examples/mnist/mnist_trest_lmdb/"
My problem is: how do I train a DNN using CaffeOnSpark with 2 Spark executors over an Ethernet connection? Is it necessary to configure Spark's "spark-env" file for CaffeOnSpark?
I submitted the job in standalone mode to train a DNN on the MNIST data. I used this command under the Spark directory:
./bin/spark-submit --master local[4] \
--files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
--conf spark.driver.extraLibraryPath="${DYLD_LIBRARY_PATH}" \
--conf spark.executorEnv.DYLD_LIBRARY_PATH="${DYLD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
CaffeOnSpark/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train \
-features accuracy,loss -label label \
-conf lenet_memory_solver.prototxt \
-connection ethernet \
-model file:${CAFFE_ON_SPARK}/mnist_lenet.model \
-output file:${CAFFE_ON_SPARK}/lenet_features_result
I get this message:
Warning: Local jar /home/inf/spark-1.6.1-bin-hadoop2.4/CaffeOnSpark/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar does not exist, skipping.
java.lang.ClassNotFoundException: com.yahoo.ml.caffe.CaffeOnSpark
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Please, can anyone clarify this problem? Thanks so much.