yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

How to remove the Validation in CaffeOnSpark? #273

Closed. flyfrommiwang closed this issue 6 years ago.

flyfrommiwang commented 6 years ago

@junshi15 @anfeng

Hello, everyone. I'm new to CaffeOnSpark. I want to remove validation from CaffeOnSpark, but I have run into some problems:

1. When I remove test_iter and test_interval from solve.prototxt, I get this error:

INFO CaffeProcessor: Start augmeentation for train in CaffeProcessor StartThreads

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f34569bbad6, pid=1325[thread 139861169395456 also had an error], tid=0x00007f3489fec700

JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)

Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)

Problematic frame:

C [libcaffedistri.so+0x45ad6] CaffeNet::getValidationOutputBlobNames()+0x26

What should I do to remove validation in CaffeOnSpark?

2. When I change test_iter and test_interval in the solver file, the net receives a different number of inputs (Mat data). How do these parameters affect the number of iterations?

Thanks!

junshi15 commented 6 years ago

1) Set both test_iter and test_interval to zero. 2) Not sure about this.
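For question 2, a rough count may help, assuming CaffeOnSpark follows stock BVLC Caffe solver semantics (not confirmed in this thread). In stock Caffe, a test phase runs every `test_interval` training iterations up to and including `max_iter`, plus one at iteration 0 when `test_initialization` is true (its default), and each phase runs `test_iter` forward passes, each consuming one validation batch. The names `ValidationBudget`, `testPhases`, and `validationBatches` are hypothetical.

```scala
// Hypothetical helper, assuming stock BVLC Caffe solver semantics;
// CaffeOnSpark drives its own training loop and may differ.
object ValidationBudget {
  // Test phases: one every testInterval iterations up to and including maxIter,
  // plus one at iteration 0 when testInitialization is true.
  // A testInterval of 0 disables testing entirely.
  def testPhases(maxIter: Int, testInterval: Int, testInitialization: Boolean = true): Int =
    if (testInterval <= 0) 0
    else maxIter / testInterval + (if (testInitialization) 1 else 0)

  // Each test phase runs testIter forward passes, one validation batch each.
  def validationBatches(maxIter: Int, testInterval: Int, testIter: Int,
                        testInitialization: Boolean = true): Long =
    testPhases(maxIter, testInterval, testInitialization).toLong * testIter
}
```

With max_iter = 1000, test_interval = 100 and test_iter = 10, this gives 11 test phases and 110 validation batches. Raising test_iter or lowering test_interval increases the validation input the net receives, which may account for the different input counts observed.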

flyfrommiwang commented 6 years ago

Thank you for your answer. What should I do with train.prototxt? Should I remove the test layer?

Thank you

junshi15 commented 6 years ago

Don't touch train.prototxt. Your validation path will not run if you set both test_iter and test_interval to zero.
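A minimal Scala sketch of that change, applied to the solver text before submitting the job. `SolverConfig` and `disableValidation` are hypothetical helpers, not part of CaffeOnSpark. The sketch sets every `test_iter` and `test_interval` line to 0 and appends the field if it is missing, because deleting the fields outright is what led to the crash reported above.

```scala
import scala.util.matching.Regex

// Hypothetical helper (not a CaffeOnSpark API): rewrites solver prototxt text
// so that test_iter and test_interval are both 0, leaving train.prototxt untouched.
object SolverConfig {
  // Replace the value of every "<field>: <number>" line with 0,
  // or append "<field>: 0" if the field is absent.
  private def forceZero(text: String, field: String): String = {
    val pattern = raw"(?m)^([ \t]*$field[ \t]*:[ \t]*)\d+".r
    if (pattern.findFirstIn(text).isDefined)
      pattern.replaceAllIn(text, m => Regex.quoteReplacement(m.group(1)) + "0")
    else
      text.stripSuffix("\n") + s"\n$field: 0\n"
  }

  def disableValidation(solver: String): String =
    forceZero(forceZero(solver, "test_iter"), "test_interval")
}
```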

flyfrommiwang commented 6 years ago

Thank you, the problem above has been solved. But I found another issue. This queue is used to send data to Caffe. I modified the code to send R-FCN data, but the images and labels are mismatched (each image arrives 2 positions after its label). Then I found this code:

`val Free: ArrayBlockingQueue[T] = new ArrayBlockingQueue[T](2)`

`val Full: ArrayBlockingQueue[T] = new ArrayBlockingQueue[T](2)`

I found that the delay grows if I increase the capacity of the ArrayBlockingQueue (with a capacity of 3, the image is delayed by 3).

Thank you

junshi15 commented 6 years ago

You lost sync between labels and images. The QueuePair structure bundles them: the String is the label and the FloatBlob is the image. Did you not do that in your modified code?
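A minimal Scala sketch of the bundling idea, assuming a Free/Full double-buffer like the one quoted above: each queue slot carries both the label and the image, so the producer fills both fields before publishing the slot and the consumer reads both from the same slot. `Slot`, `SlotQueues`, and `PairedFeed` are hypothetical names, and `Array[Float]` stands in for FloatBlob; this is not CaffeOnSpark's actual QueuePair code.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical slot: one reusable buffer carrying BOTH label and image
// (Array[Float] stands in for FloatBlob).
final class Slot(imageSize: Int) {
  var label: String = ""
  val image: Array[Float] = new Array[Float](imageSize)
}

// Free holds empty slots, Full holds filled ones; Free starts pre-filled.
final class SlotQueues(capacity: Int, imageSize: Int) {
  val free = new ArrayBlockingQueue[Slot](capacity)
  val full = new ArrayBlockingQueue[Slot](capacity)
  (1 to capacity).foreach(_ => free.put(new Slot(imageSize)))
}

object PairedFeed {
  // Producer: label and image go into the same slot before it is published,
  // so queue capacity cannot shift one relative to the other.
  def produce(q: SlotQueues, samples: Seq[(String, Array[Float])]): Unit =
    samples.foreach { case (label, pixels) =>
      val slot = q.free.take()
      slot.label = label
      System.arraycopy(pixels, 0, slot.image, 0, pixels.length)
      q.full.put(slot)
    }

  // Consumer: reads label and first pixel from one slot, then recycles it.
  def consume(q: SlotQueues, n: Int): Seq[(String, Float)] =
    (1 to n).map { _ =>
      val slot = q.full.take()
      val out = (slot.label, slot.image(0))
      q.free.put(slot)
      out
    }

  // Runs one producer thread against the calling (consumer) thread.
  def run(capacity: Int, count: Int): Seq[(String, Float)] = {
    val q = new SlotQueues(capacity, imageSize = 4)
    val samples = (0 until count).map(i => (s"label-$i", Array.fill(4)(i.toFloat)))
    val producer = new Thread(() => produce(q, samples))
    producer.start()
    val result = consume(q, count)
    producer.join()
    result
  }
}
```

In this sketch the queue capacity only limits how far the producer can run ahead of the consumer; because a label and its image always travel in the same slot, a capacity of 2 or 3 does not introduce a label/image offset.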

flyfrommiwang commented 6 years ago

Thanks, we've solved the problem. Thank you very much!

GoodJoey commented 6 years ago

@flyfrommiwang I encountered the same issue. Could I have your email or WeChat so we can keep in touch?

flyfrommiwang commented 6 years ago

@GoodJoey Of course, my WeChat number is 15700191841. Please add the note "github" when you contact me.