Open jason-dai opened 5 years ago
This is the ResNet-50 training on ImageNet. I'm not sure whether it's sufficient, or what additional functions are required if it isn't.
Do you have the details (node#, hyper-parameters, etc.)?
Yes, the link below contains the parameters needed to reproduce the result,
where you can find:
spark-submit \
--verbose \
--master spark://xxx.xxx.xxx.xxx:xxxx \
--driver-memory 200g \
--conf "spark.serializer=org.apache.spark.serializer.JavaSerializer" \
--conf "spark.network.timeout=1000000" \
--executor-memory 200g \
--executor-cores 32 \
--total-executor-cores 2048 \
--class com.intel.analytics.bigdl.models.resnet.TrainImageNet \
dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
-f hdfs://xxx.xxx.xxx.xxx:xxxx/imagenet \
--batchSize 8192 --nEpochs 90 --learningRate 0.1 --warmupEpoch 5 \
--maxLr 3.2 --cache /cache --depth 50 --classes 1000
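For readers wondering how `--learningRate 0.1`, `--maxLr 3.2`, and `--warmupEpoch 5` interact, here is a minimal sketch that assumes a linear per-iteration warmup from the base rate to the maximum rate. The thread does not confirm the exact schedule, so the shape of the ramp, the `warmup_lr` helper, and the training-set size (the standard ILSVRC-2012 split of 1,281,167 images) are assumptions. Note that 3.2 equals 0.1 × 8192 / 256, which is consistent with the common linear scaling rule for large batches, although the thread does not say that is how the value was chosen.

```python
import math

# Values taken from the spark-submit flags in the command.
BATCH_SIZE = 8192      # --batchSize
BASE_LR = 0.1          # --learningRate
MAX_LR = 3.2           # --maxLr
WARMUP_EPOCHS = 5      # --warmupEpoch

# Assumption: the standard ImageNet-1k (ILSVRC-2012) training split.
TRAIN_IMAGES = 1_281_167

# Number of mini-batches needed to see every training image once.
iters_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
warmup_iters = WARMUP_EPOCHS * iters_per_epoch

# Assumption: the rate grows by a constant step on every warmup iteration.
delta = (MAX_LR - BASE_LR) / warmup_iters


def warmup_lr(iteration: int) -> float:
    """Hypothetical helper: learning rate at a 0-based iteration during warmup.

    The value is capped at MAX_LR; whatever decay follows warmup is not modelled.
    """
    return min(MAX_LR, BASE_LR + delta * iteration)


if __name__ == "__main__":
    print("iterations per epoch:", iters_per_epoch)
    print("warmup iterations:", warmup_iters)
    for it in (0, warmup_iters // 2, warmup_iters):
        print(f"iteration {it}: lr = {warmup_lr(it):.4f}")
```

Under these assumptions the warmup spans 785 iterations of 157 per epoch; the decay phase after warmup is deliberately left out because the thread gives no details about it.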
The pictures are raw images (not resized). We trained on 64 nodes with the above hyperparameters and got 76.12% Top-1 accuracy.
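As a quick sanity check, the 64-node figure can be derived from the core counts in the command. The sketch below assumes one Spark executor per node and an even split of the global batch across executors and cores; neither is stated in the thread, and the variable names are illustrative only.

```python
# Values taken from the spark-submit flags in the command.
TOTAL_EXECUTOR_CORES = 2048   # --total-executor-cores
EXECUTOR_CORES = 32           # --executor-cores
BATCH_SIZE = 8192             # --batchSize

# Assumption: one executor per node, so the executor count equals the node count.
executors = TOTAL_EXECUTOR_CORES // EXECUTOR_CORES

# Assumption: the global batch is divided evenly across executors and cores.
batch_per_executor = BATCH_SIZE // executors
batch_per_core = BATCH_SIZE // TOTAL_EXECUTOR_CORES

print(executors, batch_per_executor, batch_per_core)  # prints: 64 128 4
```

With these numbers each node handles 128 images per iteration, which is a useful reference point when sizing a run for a different node count.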
Is it based on the MKL-DNN backend?
No, it's based on MKL-BLAS, but the training time has been reduced to ~40 hours.
OK, then we need an end-to-end pipeline for large-scale ImageNet ResNet-50 training (16 or 32 nodes) using the MKL-DNN backend :smiley:
Need end-to-end pipeline for large-scale ImageNet ResNet50 training (16 or 32 nodes) using MKL-DNN backend