Hmm! Before trying Step 7, do a cd $CAFFE_ON_SPARK first. Maybe that is the issue.
I did the steps now (created a new image and everything) and it works fine for me.
Thanks @arundasan91. The error pops up when I run Step 8 (spark-submit). I've tried cd-ing into $CAFFE_ON_SPARK, but I still get the same error. Can you run the MNIST training flawlessly?
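For reference, the Step 8 spark-submit looks roughly like this (recalled from the GetStarted_yarn wiki; the exact flags, executor count and output paths may differ by version, so please verify against your copy):

spark-submit --master yarn --deploy-mode cluster \
    --num-executors 1 \
    --files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
    --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
        -train \
        -features accuracy,loss -label label \
        -conf lenet_memory_solver.prototxt \
        -devices 1 \
        -connection ethernet \
        -model hdfs:///mnist.model \
        -output hdfs:///mnist_features_result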
Hi @arundasan91, I have reproduced the error on different machines (a CPU-only machine and a machine with both CPU and GPU). I simply followed the instructions to build and run the Docker image/container and then followed GetStarted_yarn.
Please show the commands that you ran. Also, have you updated the data/lenet_memory_solver.prototxt and data/lenet_memory_train_test.prototxt files with the correct HDFS path?
Please also make sure you have the datasets downloaded (you should have already; please cross-verify).
root@998ed7494366:/opt/CaffeOnSpark# hadoop fs -ls /projects/machine_learning/image_dataset/mnist_test_lmdb
17/04/12 16:06:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 root supergroup 10338304 2017-04-12 01:25 /projects/machine_learning/image_dataset/mnist_test_lmdb/data.mdb
-rw-r--r-- 1 root supergroup 8192 2017-04-12 01:25 /projects/machine_learning/image_dataset/mnist_test_lmdb/lock.mdb
Yes, I changed 'lenet_memory_solver.prototxt', but did not change 'lenet_memory_train_test.prototxt' (do I need to?).
Also, I have the datasets ready in HDFS (see the hadoop fs -ls output above).
Following are my commands:
docker build -t caffeonspark:cpu standalone/cpu
docker run -it caffeonspark:cpu /etc/bootstrap.sh -bash
cd $CAFFE_ON_SPARK
hadoop fs -mkdir -p /projects/machine_learning/image_dataset
${CAFFE_ON_SPARK}/scripts/setup-mnist.sh
hadoop fs -put -f ${CAFFE_ON_SPARK}/data/mnist_*_lmdb hdfs:/projects/machine_learning/image_dataset/
vim data/lenet_memory_solver.prototxt # change solver_mode from GPU to CPU
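For context, the GPU-to-CPU edit in that file is just the solver_mode line; a minimal sketch (the other solver fields ship with the repo and are left unchanged, and the net path is the one referenced in this thread):

net: "data/lenet_memory_train_test.prototxt"
# ... learning-rate, display and snapshot settings unchanged ...
solver_mode: CPU   # was GPU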
Please change the source location in lenet_memory_train_test.prototxt to the HDFS path. For example:
source_class: "com.yahoo.ml.caffe.LMDB"
memory_data_param {
  source: "hdfs:/projects/machine_learning/image_dataset/mnist_train_lmdb"
  batch_size: 64
  channels: 1
  height: 28
  width: 28
  share_in_parallel: false
}
Do the same for the test data source.
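For example, the test-phase layer's memory_data_param would point at the test LMDB that was uploaded earlier:

source: "hdfs:/projects/machine_learning/image_dataset/mnist_test_lmdb"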
Yes, this was the problem! Thanks!! I would suggest adding this to the Docker README. I will close this issue.
Awesome. Please do. Please close the issue once you are confident.
Built the standalone CPU Docker image, and ran the following:
docker run -it caffeonspark:cpu /etc/bootstrap.sh -bash
Followed 'GetStarted_yarn' Step 7, and got the following error:
Following is the 'env':
Does anyone have the same error when running the Docker image? I'm really new to Spark, YARN and Hadoop.