pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Training on multiple GPUs - iteration number is not sequential #113

Open matansag opened 7 years ago

matansag commented 7 years ago

Hi. When I train on multiple GPUs (-gpus 0,1,2,3), I see that the iteration number does not increase sequentially, and therefore the backup every 100, 1000, or 10000 iterations doesn't happen: the counter skips the values where %1000==0.

Thanks

Region Avg IOU: 0.454094, Class: 0.539861, Obj: 0.114575, No Obj: 0.006711, Avg Recall: 0.509091,  count: 55
5535: 30.146860, 29.296202 avg, 0.004000 rate, 62.480057 seconds, 1416960 images
Loaded: 0.000303 seconds
Region Avg IOU: 0.362384, Class: 0.510273, Obj: 0.159985, No Obj: 0.008064, Avg Recall: 0.333333,  count: 75
Region Avg IOU: 0.382254, Class: 0.569780, Obj: 0.187762, No Obj: 0.006942, Avg Recall: 0.405405,  count: 37
Region Avg IOU: 0.293618, Class: 0.329098, Obj: 0.172438, No Obj: 0.008295, Avg Recall: 0.342466,  count: 73
Region Avg IOU: 0.430122, Class: 0.630054, Obj: 0.095643, No Obj: 0.005807, Avg Recall: 0.472222,  count: 36
Region Avg IOU: 0.293003, Class: 0.374357, Obj: 0.139664, No Obj: 0.007799, Avg Recall: 0.246575,  count: 73
Region Avg IOU: 0.416864, Class: 0.227515, Obj: 0.113475, No Obj: 0.006115, Avg Recall: 0.487805,  count: 41
Region Avg IOU: 0.555408, Class: 0.554738, Obj: 0.226879, No Obj: 0.007803, Avg Recall: 0.588235,  count: 17
Region Avg IOU: 0.459013, Class: 0.381784, Obj: 0.193288, No Obj: 0.005824, Avg Recall: 0.580645,  count: 31
Region Avg IOU: 0.589257, Class: 0.380691, Obj: 0.230529, No Obj: 0.007240, Avg Recall: 0.666667,  count: 24
Region Avg IOU: 0.391813, Class: 0.417314, Obj: 0.187585, No Obj: 0.007195, Avg Recall: 0.423077,  count: 52
Region Avg IOU: 0.513660, Class: 0.592106, Obj: 0.166535, No Obj: 0.008167, Avg Recall: 0.589744,  count: 39
Region Avg IOU: 0.378107, Class: 0.282193, Obj: 0.098436, No Obj: 0.006287, Avg Recall: 0.375000,  count: 80
Region Avg IOU: 0.385749, Class: 0.187662, Obj: 0.083863, No Obj: 0.008231, Avg Recall: 0.340000,  count: 50
Region Avg IOU: 0.279301, Class: 0.319146, Obj: 0.152392, No Obj: 0.006699, Avg Recall: 0.217391,  count: 69
Region Avg IOU: 0.408984, Class: 0.460680, Obj: 0.156198, No Obj: 0.007666, Avg Recall: 0.418182,  count: 55
Region Avg IOU: 0.339951, Class: 0.403575, Obj: 0.103020, No Obj: 0.007019, Avg Recall: 0.300000,  count: 110
Region Avg IOU: 0.319821, Class: 0.446927, Obj: 0.103471, No Obj: 0.007668, Avg Recall: 0.379310,  count: 58
Region Avg IOU: 0.518152, Class: 0.575077, Obj: 0.140920, No Obj: 0.008971, Avg Recall: 0.573529,  count: 68
Region Avg IOU: 0.492652, Class: 0.583876, Obj: 0.201243, No Obj: 0.009499, Avg Recall: 0.600000,  count: 45
Region Avg IOU: 0.583092, Class: 0.472818, Obj: 0.266019, No Obj: 0.008068, Avg Recall: 0.700000,  count: 20
Region Avg IOU: 0.408801, Class: 0.391544, Obj: 0.161156, No Obj: 0.006466, Avg Recall: 0.381818,  count: 55
Region Avg IOU: 0.440448, Class: 0.384089, Obj: 0.223808, No Obj: 0.007991, Avg Recall: 0.484848,  count: 33
Region Avg IOU: 0.535046, Class: 0.424881, Obj: 0.136086, No Obj: 0.007484, Avg Recall: 0.617647,  count: 34
Region Avg IOU: 0.518109, Class: 0.599688, Obj: 0.150311, No Obj: 0.007573, Avg Recall: 0.621622,  count: 37
Region Avg IOU: 0.433767, Class: 0.374441, Obj: 0.200943, No Obj: 0.008492, Avg Recall: 0.531915,  count: 47
Region Avg IOU: 0.517354, Class: 0.553782, Obj: 0.171961, No Obj: 0.008288, Avg Recall: 0.591837,  count: 49
Region Avg IOU: 0.620490, Class: 0.471173, Obj: 0.196483, No Obj: 0.008271, Avg Recall: 0.714286,  count: 14
Region Avg IOU: 0.347727, Class: 0.449756, Obj: 0.143466, No Obj: 0.007169, Avg Recall: 0.345455,  count: 55
Region Avg IOU: 0.397453, Class: 0.389798, Obj: 0.112621, No Obj: 0.006925, Avg Recall: 0.390625,  count: 64
Region Avg IOU: 0.422600, Class: 0.556410, Obj: 0.153910, No Obj: 0.007374, Avg Recall: 0.465116,  count: 43
Region Avg IOU: 0.292901, Class: 0.307642, Obj: 0.103662, No Obj: 0.006220, Avg Recall: 0.266667,  count: 45
Region Avg IOU: 0.445993, Class: 0.462168, Obj: 0.168339, No Obj: 0.007389, Avg Recall: 0.438596,  count: 57
Syncing... Done!
5548: 34.672623, 29.833843 avg, 0.004000 rate, 67.664970 seconds, 1420288 images
Loaded: 0.000039 seconds
Region Avg IOU: 0.500222, Class: 0.576081, Obj: 0.257318, No Obj: 0.007791, Avg Recall: 0.521739,  count: 23
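
The two status lines in the log above are enough to see how a `%1000==0` check can be skipped. The following is a minimal sketch, assuming the leading number on each status line is the iteration counter and the trailing number is the cumulative image count. The function names are illustrative and not part of darknet, and the constant stride is an assumption made only for the illustration.

```python
# Values copied from the two status lines in the log above.
first_iter, first_images = 5535, 1416960
second_iter, second_images = 5548, 1420288

# Images consumed per reported iteration (assumption: images / iteration).
images_per_iter = first_images // first_iter  # 256

# How far the counter moved between the two consecutive status lines.
stride = second_iter - first_iter  # 13


def visited(start, stride, stop):
    """Iteration numbers seen when the counter advances by `stride`."""
    return list(range(start, stop, stride))


def hits_multiple(start, stride, stop, every):
    """True if any visited iteration is an exact multiple of `every`."""
    return any(i % every == 0 for i in visited(start, stride, stop))
```

Under these assumptions, a counter that starts at 5535 and keeps advancing by 13 visits 5990 and then 6003, so it never lands on 6000 and an exact-multiple test does not fire there.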

zhouxiaoxu commented 7 years ago

Yes, I ran into the same problem.

niskov commented 6 years ago

Any updates on this issue?

ahsan856jalal commented 6 years ago

This is fine: iterations are distributed between different threads, and small oscillations won't do much to change training.
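
If the skipped backups matter, one possible workaround is to save whenever the counter crosses a multiple of the interval, rather than only when it lands exactly on one. The sketch below is hypothetical: `crossed_multiple` and `checkpoints` are illustrative names, not darknet functions, and this is not a patch to the darknet source.

```python
def crossed_multiple(prev_iter, cur_iter, every):
    """True if a multiple of `every` lies in the interval (prev_iter, cur_iter]."""
    return cur_iter // every > prev_iter // every


def checkpoints(start, stride, steps, every):
    """Iterations at which a save would trigger for a counter advancing by `stride`."""
    saved = []
    prev = start
    for _ in range(steps):
        cur = prev + stride
        if crossed_multiple(prev, cur, every):
            saved.append(cur)
        prev = cur
    return saved
```

For example, `checkpoints(5535, 13, 40, 1000)` returns `[6003]`: the save happens on the first iteration at or past 6000 instead of being skipped. The same integer-division comparison could be written in C, at the cost of remembering the previous iteration number between steps.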

zijinoier commented 5 years ago

Same problem. Any update?