yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

How can I convert my own data into a DataFrame? #232

Open kceil opened 7 years ago

kceil commented 7 years ago

I have built CaffeOnSpark successfully. My own data is CIFAR-10 (256*256) stored as a 9.8 GB LMDB. How can I convert it into a DataFrame? Thank you!

arundasan91 commented 7 years ago

Hello @kceil,

It is mentioned here: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_EC2

pushd ${CAFFE_ON_SPARK}/data

hadoop fs -rm -r -f ${CAFFE_ON_SPARK}/data/mnist_train_dataframe
spark-submit --master ${MASTER_URL} \
    --conf spark.cores.max=${TOTAL_CORES} \
    --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
    --class com.yahoo.ml.caffe.tools.LMDB2DataFrame \
    ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
    -imageRoot file:${CAFFE_ON_SPARK}/data/mnist_train_lmdb \
    -lmdb_partitions ${TOTAL_CORES} \
    -outputFormat parquet \
    -output ${CAFFE_ON_SPARK}/data/mnist_train_dataframe

hadoop fs -rm -r -f ${CAFFE_ON_SPARK}/data/mnist_test_dataframe
spark-submit --master ${MASTER_URL} \
    --conf spark.cores.max=${TOTAL_CORES} \
    --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
    --class com.yahoo.ml.caffe.tools.LMDB2DataFrame \
    ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
    -imageRoot file:${CAFFE_ON_SPARK}/data/mnist_test_lmdb \
    -lmdb_partitions ${TOTAL_CORES} \
    -outputFormat parquet \
    -output ${CAFFE_ON_SPARK}/data/mnist_test_dataframe

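You can adapt this to CIFAR-10 by replacing the MNIST LMDB paths passed to -imageRoot, and the DataFrame paths passed to -output and to hadoop fs -rm, with your own.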

Thanks, Arun
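
For reference, the substitution described above could look roughly like the sketch below. It wraps the same LMDB2DataFrame invocation in a small helper so the train and test conversions share one command. The directory names cifar10_train_lmdb, cifar10_test_lmdb, cifar10_train_dataframe and cifar10_test_dataframe, the helper name lmdb_to_dataframe, and the DRY_RUN switch are assumptions made for illustration and do not come from the CaffeOnSpark wiki; substitute the actual location of your LMDB. By default the helper only prints the spark-submit command so it can be checked first.

```shell
#!/bin/sh
# Sketch only: converts an LMDB directory to a parquet DataFrame using the
# LMDB2DataFrame tool shown above. Paths below are hypothetical placeholders.
# Expects CAFFE_ON_SPARK, MASTER_URL, TOTAL_CORES and LD_LIBRARY_PATH to be set.

# lmdb_to_dataframe <lmdb_dir> <output_dir>  (hypothetical helper name)
lmdb_to_dataframe() {
    lmdb_dir=$1
    out_dir=$2
    # Build the same command line as in the MNIST example, with the paths swapped.
    set -- spark-submit --master "${MASTER_URL}" \
        --conf "spark.cores.max=${TOTAL_CORES}" \
        --conf "spark.driver.extraLibraryPath=${LD_LIBRARY_PATH}" \
        --conf "spark.executorEnv.LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" \
        --class com.yahoo.ml.caffe.tools.LMDB2DataFrame \
        "${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar" \
        -imageRoot "file:${lmdb_dir}" \
        -lmdb_partitions "${TOTAL_CORES}" \
        -outputFormat parquet \
        -output "${out_dir}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: print the command instead of running it.
        echo "$*"
    else
        # Remove any previous output, then run the conversion.
        hadoop fs -rm -r -f "${out_dir}"
        "$@"
    fi
}

# Hypothetical CIFAR-10 paths; adjust to where your LMDB actually lives.
lmdb_to_dataframe "${CAFFE_ON_SPARK}/data/cifar10_train_lmdb" \
                  "${CAFFE_ON_SPARK}/data/cifar10_train_dataframe"
lmdb_to_dataframe "${CAFFE_ON_SPARK}/data/cifar10_test_lmdb" \
                  "${CAFFE_ON_SPARK}/data/cifar10_test_dataframe"
```

Once the printed commands look right, running the script with DRY_RUN=0 in the environment performs the actual conversion.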