yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

An issue when using the docker. #281

Closed. GoodJoey closed this issue 7 years ago

GoodJoey commented 7 years ago

It seems that when I restart the Docker image, all the data in Hadoop is gone, so I need to re-fetch the training data every time I start a new image.

Also, I can train MNIST successfully with the CPU version (Docker), but the GPU version always fails with exceptions such as "Max number of executor failures (3) reached".