yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

GetStarted_yarn can't produce various snapshots (mnist_lenet_iter_*.caffemodel/solverstate). #242

Open guyang88 opened 7 years ago

guyang88 commented 7 years ago

@junshi15 @davglass @gyehuda @javadba @pcnudde @mriduljain I ran CaffeOnSpark on a YARN grid following GetStarted_yarn, but I don't get any snapshots (mnist_lenet_iter_*.caffemodel/solverstate). I do get the mnist.model file and the mnist_features_results folder, which contain correct results. I tried setting the snapshot path in lenet_memory_solver.prototxt, but it didn't work. Did I set the path incorrectly?

junshi15 commented 7 years ago

If you use the prototxt files given in the repo, the max iteration is 2000 and the snapshot interval is 5000.

https://github.com/yahoo/CaffeOnSpark/blob/master/data/lenet_memory_solver.prototxt#L18-L20

You won't get a snapshot because training finishes at iteration 2000, before the first snapshot at iteration 5000 would be written.
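
The arithmetic can be checked with a small sketch. It assumes a periodic snapshot is written whenever the iteration count is a positive multiple of the snapshot interval, and it ignores any final snapshot a solver might write when training ends. The function name `snapshot_iterations` and the interval 500 are made up for illustration; they are not part of CaffeOnSpark.

```python
def snapshot_iterations(max_iter, snapshot_interval):
    """List the iterations at which a periodic snapshot would be written.

    Assumption: a snapshot fires at every positive multiple of
    snapshot_interval up to and including max_iter.
    """
    if snapshot_interval <= 0:
        # A non-positive interval is treated as "snapshots disabled".
        return []
    return list(range(snapshot_interval, max_iter + 1, snapshot_interval))


# Values from the repo's solver file: training ends before the first snapshot.
print(snapshot_iterations(2000, 5000))

# Hypothetical smaller interval: snapshots fall inside the training run.
print(snapshot_iterations(2000, 500))
```

With the repo's values the list is empty. Under the same assumption, setting the snapshot interval in the solver file to a value no larger than the max iteration should produce snapshot files during training.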

guyang88 commented 7 years ago

Yeah, I got it. Thanks.