open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

why train loss is too big #1081

Closed fanqie03 closed 5 years ago

fanqie03 commented 5 years ago

I am using the default config configs/faster_rcnn_r50_fpn_1x.py, and the only argument I pass to train.py is configs/faster_rcnn_r50_fpn_1x.py.

At Epoch [1][1000/58633], the training loss becomes extremely large. Is this normal? Why does it happen?

The log is as follows:

/home/mf/anaconda3/envs/open-mmlab/bin/python /home/mf/w_public/mmdetection/tools/train.py configs/faster_rcnn_r50_fpn_1x.py
2019-07-30 10:02:20,710 - INFO - Distributed training: False
2019-07-30 10:02:21,079 - INFO - load model from: modelzoo://resnet50
2019-07-30 10:02:21,425 - WARNING - unexpected key in source state_dict: fc.weight, fc.bias

missing keys in source state_dict: layer1.2.bn1.num_batches_tracked, layer3.5.bn2.num_batches_tracked, bn1.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer1.0.bn2.num_batches_tracked

loading annotations into memory...
Done (t=9.63s)
creating index...
index created!
2019-07-30 10:02:35,009 - INFO - Start running, host: mf@mf-System-Product-Name, work_dir: /home/mf/w_public/mmdetection/work_dirs/faster_rcnn_r50_fpn_1x
2019-07-30 10:02:35,009 - INFO - workflow: [('train', 1)], max: 12 epochs
2019-07-30 10:02:54,084 - INFO - Epoch [1][50/58633]    lr: 0.00797, eta: 3 days, 2:32:28, time: 0.381, data_time: 0.009, memory: 3791, loss_rpn_cls: 0.3375, loss_rpn_bbox: 0.0867, loss_cls: 0.6763, acc: 92.3008, loss_bbox: 0.1246, loss: 1.2251
2019-07-30 10:03:12,616 - INFO - Epoch [1][100/58633]   lr: 0.00931, eta: 3 days, 1:29:03, time: 0.371, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.2140, loss_rpn_bbox: 0.0703, loss_cls: 0.5111, acc: 93.2188, loss_bbox: 0.1525, loss: 0.9479
2019-07-30 10:03:31,010 - INFO - Epoch [1][150/58633]   lr: 0.01064, eta: 3 days, 0:56:47, time: 0.368, data_time: 0.003, memory: 3791, loss_rpn_cls: 0.1666, loss_rpn_bbox: 0.0609, loss_cls: 0.5251, acc: 92.8848, loss_bbox: 0.1633, loss: 0.9159
2019-07-30 10:03:49,761 - INFO - Epoch [1][200/58633]   lr: 0.01197, eta: 3 days, 1:01:25, time: 0.375, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.2267, loss_rpn_bbox: 0.0921, loss_cls: 0.6174, acc: 91.6387, loss_bbox: 0.1854, loss: 1.1217
2019-07-30 10:04:08,190 - INFO - Epoch [1][250/58633]   lr: 0.01331, eta: 3 days, 0:49:04, time: 0.369, data_time: 0.003, memory: 3791, loss_rpn_cls: 0.1873, loss_rpn_bbox: 0.0758, loss_cls: 0.6097, acc: 91.6562, loss_bbox: 0.1857, loss: 1.0585
2019-07-30 10:04:26,377 - INFO - Epoch [1][300/58633]   lr: 0.01464, eta: 3 days, 0:31:12, time: 0.364, data_time: 0.003, memory: 3791, loss_rpn_cls: 0.1744, loss_rpn_bbox: 0.0717, loss_cls: 0.5832, acc: 91.6348, loss_bbox: 0.1907, loss: 1.0200
2019-07-30 10:04:44,331 - INFO - Epoch [1][350/58633]   lr: 0.01597, eta: 3 days, 0:10:34, time: 0.359, data_time: 0.003, memory: 3791, loss_rpn_cls: 0.1876, loss_rpn_bbox: 0.0841, loss_cls: 0.5484, acc: 91.5840, loss_bbox: 0.1910, loss: 1.0112
2019-07-30 10:05:02,978 - INFO - Epoch [1][400/58633]   lr: 0.01731, eta: 3 days, 0:15:18, time: 0.373, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.1606, loss_rpn_bbox: 0.0639, loss_cls: 0.6050, acc: 92.0977, loss_bbox: 0.1788, loss: 1.0083
2019-07-30 10:05:21,395 - INFO - Epoch [1][450/58633]   lr: 0.01864, eta: 3 days, 0:12:57, time: 0.368, data_time: 0.003, memory: 3791, loss_rpn_cls: 0.2056, loss_rpn_bbox: 0.0768, loss_cls: 0.6062, acc: 91.5117, loss_bbox: 0.1879, loss: 1.0766
2019-07-30 10:05:39,837 - INFO - Epoch [1][500/58633]   lr: 0.01997, eta: 3 days, 0:11:36, time: 0.369, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.1487, loss_rpn_bbox: 0.0768, loss_cls: 0.5943, acc: 91.7441, loss_bbox: 0.1892, loss: 1.0090
2019-07-30 10:05:58,293 - INFO - Epoch [1][550/58633]   lr: 0.02000, eta: 3 days, 0:10:43, time: 0.369, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.3145, loss_rpn_bbox: 0.1089, loss_cls: 0.4854, acc: 93.7539, loss_bbox: 0.1304, loss: 1.0391
2019-07-30 10:06:16,786 - INFO - Epoch [1][600/58633]   lr: 0.02000, eta: 3 days, 0:10:40, time: 0.370, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.2104, loss_rpn_bbox: 0.0980, loss_cls: 0.5118, acc: 93.2090, loss_bbox: 0.1509, loss: 0.9711
2019-07-30 10:06:35,303 - INFO - Epoch [1][650/58633]   lr: 0.02000, eta: 3 days, 0:11:00, time: 0.370, data_time: 0.003, memory: 3791, loss_rpn_cls: 0.2558, loss_rpn_bbox: 0.1247, loss_cls: 0.5771, acc: 91.3262, loss_bbox: 0.1899, loss: 1.1476
2019-07-30 10:06:54,683 - INFO - Epoch [1][700/58633]   lr: 0.02000, eta: 3 days, 0:25:41, time: 0.388, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.2355, loss_rpn_bbox: 0.0966, loss_cls: 0.4322, acc: 93.9688, loss_bbox: 0.1319, loss: 0.8962
2019-07-30 10:07:13,302 - INFO - Epoch [1][750/58633]   lr: 0.02000, eta: 3 days, 0:26:29, time: 0.372, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.2131, loss_rpn_bbox: 0.0831, loss_cls: 0.4883, acc: 93.4316, loss_bbox: 0.1440, loss: 0.9285
2019-07-30 10:07:32,554 - INFO - Epoch [1][800/58633]   lr: 0.02000, eta: 3 days, 0:36:25, time: 0.385, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.3003, loss_rpn_bbox: 0.1204, loss_cls: 0.5138, acc: 93.3008, loss_bbox: 0.1436, loss: 1.0781
2019-07-30 10:07:50,711 - INFO - Epoch [1][850/58633]   lr: 0.02000, eta: 3 days, 0:30:03, time: 0.363, data_time: 0.004, memory: 3791, loss_rpn_cls: 0.4217, loss_rpn_bbox: 0.2851, loss_cls: 0.8004, acc: 94.0859, loss_bbox: 0.1257, loss: 1.6328
2019-07-30 10:08:08,752 - INFO - Epoch [1][900/58633]   lr: 0.02000, eta: 3 days, 0:22:50, time: 0.361, data_time: 0.004, memory: 3791, loss_rpn_cls: 156.2920, loss_rpn_bbox: 62.4721, loss_cls: 364.4698, acc: 82.0712, loss_bbox: 41.8640, loss: 625.0979
2019-07-30 10:08:26,768 - INFO - Epoch [1][950/58633]   lr: 0.02000, eta: 3 days, 0:16:03, time: 0.360, data_time: 0.003, memory: 3791, loss_rpn_cls: 447235.8581, loss_rpn_bbox: 526061.7554, loss_cls: 4071407055.3989, acc: 80.6797, loss_bbox: 246750333.2189, loss: 4319130658.6691
2019-07-30 10:08:45,165 - INFO - Epoch [1][1000/58633]  lr: 0.02000, eta: 3 days, 0:14:23, time: 0.368, data_time: 0.004, memory: 3791, loss_rpn_cls: 663974819297698.0000, loss_rpn_bbox: 86506371132308.3125, loss_cls: 9498945394371206.0000, acc: 72.3992, loss_bbox: 333746037078607.0625, loss: 10583172569332912.0000
2019-07-30 10:09:03,294 - INFO - Epoch [1][1050/58633]  lr: 0.02000, eta: 3 days, 0:09:51, time: 0.363, data_time: 0.004, memory: 3791, loss_rpn_cls: 1364391539087953075634176.0000, loss_rpn_bbox: 567411899414833660952576.0000, loss_cls: 138728180119747246268874752.0000, acc: 91.6599, loss_bbox: 13771815597760854209069056.0000, loss: 154431800497686002255527936.0000
2019-07-30 10:09:21,490 - INFO - Epoch [1][1100/58633]  lr: 0.02000, eta: 3 days, 0:06:25, time: 0.364, data_time: 0.004, memory: 3791, loss_rpn_cls: 7248172595877238437576704.0000, loss_rpn_bbox: 2399984347958531225288704.0000, loss_cls: 749459289716433297625579520.0000, acc: 94.6113, loss_bbox: 77432702443750119213891584.0000, loss: 836540162127489252454301696.0000
2019-07-30 10:09:39,857 - INFO - Epoch [1][1150/58633]  lr: 0.02000, eta: 3 days, 0:05:00, time: 0.367, data_time: 0.003, memory: 3791, loss_rpn_cls: 7019707082215567669592064.0000, loss_rpn_bbox: 4737551173594944490176512.0000, loss_cls: 952746356550775618484043776.0000, acc: 94.1113, loss_bbox: 116985685205953382291865600.0000, loss: 1081489303707954757133926400.0000
2019-07-30 10:09:58,116 - INFO - Epoch [1][1200/58633]  lr: 0.02000, eta: 3 days, 0:02:37, time: 0.365, data_time: 0.003, memory: 3791, loss_rpn_cls: 6623643306337326952611840.0000, loss_rpn_bbox: 1677773494959891533529088.0000, loss_cls: 676409084427291037360193536.0000, acc: 95.1016, loss_bbox: 61674887396537672514142208.0000, loss: 746385394081858182862340096.0000
2019-07-30 10:10:16,655 - INFO - Epoch [1][1250/58633]  lr: 0.02000, eta: 3 days, 0:03:02, time: 0.371, data_time: 0.004, memory: 3791, loss_rpn_cls: 6306572437338864673095680.0000, loss_rpn_bbox: 2397626350338094825209856.0000, loss_cls: 702858893330790020814995456.0000, acc: 94.9922, loss_bbox: 85713648393600203787075584.0000, loss: 797276741770739209455271936.0000
2019-07-30 10:10:35,082 - INFO - Epoch [1][1300/58633]  lr: 0.02000, eta: 3 days, 0:02:22, time: 0.369, data_time: 0.004, memory: 3791, loss_rpn_cls: 5784541178443221308538880.0000, loss_rpn_bbox: 1592911890236547304259584.0000, loss_cls: 584144961218907285242249216.0000, acc: 94.7910, loss_bbox: 60132916379517915439824896.0000, loss: 651655322485019431535640576.0000
2019-07-30 10:10:53,984 - INFO - Epoch [1][1350/58633]  lr: 0.02000, eta: 3 days, 0:05:51, time: 0.378, data_time: 0.004, memory: 3791, loss_rpn_cls: 5342794156158588918169600.0000, loss_rpn_bbox: 1858728942494471866548224.0000, loss_cls: 592920503840435445530886144.0000, acc: 94.7520, loss_bbox: 69530009335973618376507392.0000, loss: 669652035893432084856832000.0000
2019-07-30 10:11:13,073 - INFO - Epoch [1][1400/58633]  lr: 0.02000, eta: 3 days, 0:10:38, time: 0.382, data_time: 0.004, memory: 3791, loss_rpn_cls: 5165773008838793266987008.0000, loss_rpn_bbox: 2311574336324449930838016.0000, loss_cls: 596068163900160765641883648.0000, acc: 94.6641, loss_bbox: 61864457022218112606404608.0000, loss: 665409970260547009163296768.0000
2019-07-30 10:11:31,742 - INFO - Epoch [1][1450/58633]  lr: 0.02000, eta: 3 days, 0:11:41, time: 0.373, data_time: 0.004, memory: 3791, loss_rpn_cls: 4610083048248850272747520.0000, loss_rpn_bbox: 1432003406626812493561856.0000, loss_cls: 548534009977692903822065664.0000, acc: 95.3594, loss_bbox: 61887055758006783571918848.0000, loss: 616463152732647734380593152.0000
2019-07-30 10:11:51,226 - INFO - Epoch [1][1500/58633]  lr: 0.02000, eta: 3 days, 0:18:59, time: 0.390, data_time: 0.004, memory: 3791, loss_rpn_cls: 4342793929019683183263744.0000, loss_rpn_bbox: 2127836828204648078770176.0000, loss_cls: 443916903405240911975677952.0000, acc: 95.3457, loss_bbox: 47187610007356827836088320.0000, loss: 497575150226156379118239744.0000
2019-07-30 10:12:11,386 - INFO - Epoch [1][1550/58633]  lr: 0.02000, eta: 3 days, 0:30:54, time: 0.403, data_time: 0.004, memory: 3791, loss_rpn_cls: 3947261021612846834778112.0000, loss_rpn_bbox: 1148280921352944960405504.0000, loss_cls: 456843004510417720971362304.0000, acc: 95.1367, loss_bbox: 43518018316285994678091776.0000, loss: 505456564082218576423944192.0000
2019-07-30 10:12:31,639 - INFO - Epoch [1][1600/58633]  lr: 0.02000, eta: 3 days, 0:42:44, time: 0.405, data_time: 0.004, memory: 3791, loss_rpn_cls: 3707260822292568108695552.0000, loss_rpn_bbox: 1797451293467601050533888.0000, loss_cls: 439920061639958484962246656.0000, acc: 94.9512, loss_bbox: 48417321447056923657502720.0000, loss: 493842091228754245505253376.0000
2019-07-30 10:12:50,501 - INFO - Epoch [1][1650/58633]  lr: 0.02000, eta: 3 days, 0:43:58, time: 0.377, data_time: 0.004, memory: 3791, loss_rpn_cls: 3512799281492788139524096.0000, loss_rpn_bbox: 1518530066902682387349504.0000, loss_cls: 435772218125044191227543552.0000, acc: 94.8496, loss_bbox: 40495116691692944713842688.0000, loss: 481298668451964109665075200.0000
2019-07-30 10:13:09,015 - INFO - Epoch [1][1700/58633]  lr: 0.02000, eta: 3 days, 0:42:42, time: 0.370, data_time: 0.004, memory: 3791, loss_rpn_cls: 3284510314701118226038784.0000, loss_rpn_bbox: 1162567653717180499886080.0000, loss_cls: 405972334976007277457178624.0000, acc: 94.9414, loss_bbox: 44356791104946548898791424.0000, loss: 454776203957101249268023296.0000
2019-07-30 10:13:27,367 - INFO - Epoch [1][1750/58633]  lr: 0.02000, eta: 3 days, 0:40:25, time: 0.367, data_time: 0.003, memory: 3791, loss_rpn_cls: 3066270079157628303310848.0000, loss_rpn_bbox: 887112520764302662041600.0000, loss_cls: 347684722754085661123280896.0000, acc: 94.7969, loss_bbox: 37238435013324089461833728.0000, loss: 388876538665704419208724480.0000
2019-07-30 10:13:46,028 - INFO - Epoch [1][1800/58633]  lr: 0.02000, eta: 3 days, 0:40:15, time: 0.373, data_time: 0.004, memory: 3791, loss_rpn_cls: 2904379300085843231768576.0000, loss_rpn_bbox: 1112800091037589653422080.0000, loss_cls: 320129707916297712938516480.0000, acc: 95.2461, loss_bbox: 34943331210680193794441216.0000, loss: 359090218761948872486944768.0000
2019-07-30 10:14:05,340 - INFO - Epoch [1][1850/58633]  lr: 0.02000, eta: 3 days, 0:44:12, time: 0.386, data_time: 0.004, memory: 3791, loss_rpn_cls: 2566878010625097048522752.0000, loss_rpn_bbox: 981360457758289661263872.0000, loss_cls: 371481038657838346076684288.0000, acc: 94.1621, loss_bbox: 37869240806814503221067776.0000, loss: 412898520449593731773890560.0000
2019-07-30 10:14:24,521 - INFO - Epoch [1][1900/58633]  lr: 0.02000, eta: 3 days, 0:47:06, time: 0.384, data_time: 0.004, memory: 3791, loss_rpn_cls: 2506677724270411290509312.0000, loss_rpn_bbox: 810762719964234931765248.0000, loss_cls: 297558388428070848821198848.0000, acc: 94.8594, loss_bbox: 29505186815108458238443520.0000, loss: 330381013052977505187659776.0000
2019-07-30 10:14:44,188 - INFO - Epoch [1][1950/58633]  lr: 0.02000, eta: 3 days, 0:52:46, time: 0.393, data_time: 0.004, memory: 3791, loss_rpn_cls: 2327605172188252492267520.0000, loss_rpn_bbox: 944301412305724663398400.0000, loss_cls: 263073636518943062465970176.0000, acc: 94.9668, loss_bbox: 32365094862387531321704448.0000, loss: 298710635381363799664623616.0000
2019-07-30 10:15:03,766 - INFO - Epoch [1][2000/58633]  lr: 0.02000, eta: 3 days, 0:57:36, time: 0.392, data_time: 0.004, memory: 3791, loss_rpn_cls: 2120199917124528168239104.0000, loss_rpn_bbox: 569833194926640730210304.0000, loss_cls: 315955675236003999192711168.0000, acc: 94.1309, loss_bbox: 37569526119027021277298688.0000, loss: 356215231348066288970760192.0000
2019-07-30 10:15:24,328 - INFO - Epoch [1][2050/58633]  lr: 0.02000, eta: 3 days, 1:07:49, time: 0.411, data_time: 0.004, memory: 3791, loss_rpn_cls: 2019840936335403697307648.0000, loss_rpn_bbox: 746494062373447116259328.0000, loss_cls: 211094436197562531918643200.0000, acc: 94.5684, loss_bbox: 20937678238341660638445568.0000, loss: 234798447591566279381614592.0000
hellock commented 5 years ago

Please check the important note in GETTING_STARTED.md.

marcosly commented 5 years ago

I have the same problem. Did you fix it?

fanqie03 commented 5 years ago

@marcosly I haven't continued yet.

hellock commented 5 years ago

Please read GETTING_STARTED.md.

Important: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8 * 2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu.
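
For reference, the arithmetic in that note can be sketched in a few lines of Python. This is only an illustration of the Linear Scaling Rule, not code from mmdetection: the function name `scaled_lr` and its parameters are made up for this example, and the base values (lr=0.02 at batch size 16) are taken from the note and from the `lr: 0.02000` shown in the log above.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate linearly with the total batch size.

    Hypothetical helper: base_lr and base_batch_size are assumed to be the
    defaults described in the note (0.02 for 8 GPUs * 2 img/gpu = 16).
    """
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


if __name__ == "__main__":
    # The two examples given in the note.
    print(scaled_lr(4, 2))    # 4 GPUs * 2 img/gpu
    print(scaled_lr(16, 4))   # 16 GPUs * 4 img/gpu
    # A single GPU with 2 img/gpu (an assumed setup, see below).
    print(scaled_lr(1, 2))
```

The log in the original post reports "Distributed training: False" with lr at 0.02, so the run was probably on a single GPU. Assuming 2 img/gpu, the rule would suggest a learning rate of about 0.0025 instead, set in the optimizer section of the config file.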

fanqie03 commented 5 years ago

Thank you very much. Forgive my folly.