Open 1292765944 opened 7 years ago
I guess that is an issue with Caffe's parallel training code. It does synchronous SGD.
What are your results? Some difference is expected because of randomness. As long as it is within an acceptable range (e.g. 77.*), it should be fine.
@weiliu89 I'm just reproducing your code, and the accuracy is about right. I would just like to know the running time on your machine when training SSD 300x300 on PASCAL VOC 2007 trainval + 2012 trainval for 120000 iterations. Thank you!
@1292765944 Hi, in Caffe, if you use 2 GPUs, each one waits for the other in order to exchange parameters. You can search for the details.
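To make the point about synchronous SGD concrete, here is a minimal sketch in plain Python. It is not Caffe's actual code; the function name `sync_sgd_step` and the toy numbers are made up for illustration. It only shows why no GPU can start the next iteration until every GPU has delivered its gradients.

```python
def sync_sgd_step(weights, worker_grads, lr):
    """One synchronous SGD step (illustrative, not Caffe's implementation).

    worker_grads holds one gradient list per GPU. The update needs all of
    them, so the step cannot finish until the slowest worker has reported.
    """
    n_workers = len(worker_grads)
    # Average the gradients element-wise across workers.
    avg_grad = [
        sum(grads[i] for grads in worker_grads) / n_workers
        for i in range(len(weights))
    ]
    # Every worker then applies the same update to its copy of the weights.
    return [w - lr * g for w, g in zip(weights, avg_grad)]


# Toy example with two "GPUs" and two parameters (hypothetical values).
weights = [1.0, 2.0]
grads_gpu0 = [0.5, 1.0]
grads_gpu1 = [1.5, 3.0]
new_weights = sync_sgd_step(weights, [grads_gpu0, grads_gpu1], lr=0.1)
print(new_weights)
```

Because the averaging step needs both gradient lists, time spent waiting for the other worker or for the parameter exchange adds to every iteration.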
Dear Wei Liu: Recently I have run into two problems with the code.
First, I set gpus = "0,1". However, during training the GPU utilization of the two GPUs alternates between 100% and 0%, so it seems the two GPUs are not processing in parallel. By a rough estimate, training SSD 300x300 on PASCAL VOC 2007 trainval + 2012 trainval for 120000 iterations takes me nearly 48 hours. What could the problem be? How much time do you need to train such a network?
My second question: does the default setting always reproduce 77.2% mAP on PASCAL VOC 2007 test? I get slightly different results with the same training setting. Is it because of the sampling strategy?
Looking forward to your reply. Thanks!
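For comparing timings between machines, the rough estimate above (nearly 48 hours for 120000 iterations) can be turned into a per-iteration figure, which works out to about 1.44 seconds per iteration. A small sketch of that arithmetic, with hypothetical helper names:

```python
def seconds_per_iteration(total_hours, iterations):
    """Average wall-clock seconds per training iteration."""
    return total_hours * 3600.0 / iterations


def estimated_hours(sec_per_iter, iterations):
    """Total training time in hours for a given per-iteration cost."""
    return sec_per_iter * iterations / 3600.0


# Figures from the post: roughly 48 hours for 120000 iterations.
rate = seconds_per_iteration(48, 120000)
print(f"{rate:.2f} s/iter")
```

Measuring the seconds per iteration from the training log over a few hundred iterations and passing it to `estimated_hours` gives a quick projection for the full run.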