It seems that when I restart the Docker image, all the data in Hadoop is gone, so I have to re-fetch the training data every time I start a new image.
Also, I can train MNIST successfully with the CPU version (Docker), but with the GPU version it always fails with exceptions such as "Max number of executor failures (3) reached".