jbwang1997 / OBBDetection

OBBDetection is an oriented object detection library based on MMDetection.
Apache License 2.0

Training loss becomes NaN after 1 epoch on a custom dataset in DOTA format (model: Oriented R-CNN) #104

Open MIXIAOXIN opened 2 years ago

MIXIAOXIN commented 2 years ago

logs

2021-12-25 12:55:30,796 - mmdet - INFO - load model from: torchvision://resnet50
2021-12-25 12:55:30,796 - mmdet - INFO - Use load_from_torchvision loader
2021-12-25 12:55:30,943 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

Starting loading ROADMARK dataset information.
Finishing loading ROADMARK, get 1112 iamges, using 0.327s.
2021-12-25 12:55:34,053 - mmdet - INFO - Start running, host: mxx@mxx, work_dir: /home/mxx/PycharmProjects/OBBDetection/roadmark-logs
2021-12-25 12:55:34,054 - mmdet - INFO - Hooks will be executed in the following order:

before_run:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(VERY_LOW ) TextLoggerHook

before_train_epoch:
(VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

before_train_iter:
(VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook

after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

after_train_epoch:
(NORMAL ) CheckpointHook
(VERY_LOW ) TextLoggerHook

before_val_epoch:
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

before_val_iter:
(LOW ) IterTimerHook

after_val_iter:
(LOW ) IterTimerHook

after_val_epoch:
(VERY_LOW ) TextLoggerHook

2021-12-25 12:55:34,054 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2021-12-25 12:55:49,098 - mmdet - INFO - Epoch [1][50/556] lr: 4.945e-04, eta: 0:33:12, time: 0.301, data_time: 0.049, memory: 3698, loss_rpn_cls: 0.6306, loss_rpn_bbox: 0.4588, loss_cls: 0.7314, acc: 86.3770, loss_bbox: 0.0030, loss: 1.8238, grad_norm: 10.9180
2021-12-25 12:56:01,549 - mmdet - INFO - Epoch [1][100/556] lr: 9.940e-04, eta: 0:30:06, time: 0.249, data_time: 0.003, memory: 3698, loss_rpn_cls: 0.4070, loss_rpn_bbox: 0.3323, loss_cls: 0.1898, acc: 97.0762, loss_bbox: 0.0050, loss: 0.9341, grad_norm: 6.5154
2021-12-25 12:56:14,036 - mmdet - INFO - Epoch [1][150/556] lr: 1.494e-03, eta: 0:28:58, time: 0.250, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.4137, loss_rpn_bbox: 0.4908, loss_cls: 0.2284, acc: 95.9062, loss_bbox: 0.0168, loss: 1.1498, grad_norm: 6.4609
2021-12-25 12:56:26,652 - mmdet - INFO - Epoch [1][200/556] lr: 1.993e-03, eta: 0:28:21, time: 0.252, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.4033, loss_rpn_bbox: 0.5386, loss_cls: 0.2418, acc: 95.2031, loss_bbox: 0.0526, loss: 1.2363, grad_norm: 6.4062
2021-12-25 12:56:38,880 - mmdet - INFO - Epoch [1][250/556] lr: 2.493e-03, eta: 0:27:45, time: 0.245, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.3336, loss_rpn_bbox: 0.3698, loss_cls: 0.2245, acc: 95.4883, loss_bbox: 0.0360, loss: 0.9639, grad_norm: 6.6003
2021-12-25 12:56:50,891 - mmdet - INFO - Epoch [1][300/556] lr: 2.992e-03, eta: 0:27:11, time: 0.240, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.2802, loss_rpn_bbox: 0.2996, loss_cls: 0.1671, acc: 96.2656, loss_bbox: 0.0290, loss: 0.7758, grad_norm: 4.9532
2021-12-25 12:57:03,271 - mmdet - INFO - Epoch [1][350/556] lr: 3.492e-03, eta: 0:26:51, time: 0.248, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.2747, loss_rpn_bbox: 0.4207, loss_cls: 0.1467, acc: 96.3301, loss_bbox: 0.0261, loss: 0.8681, grad_norm: 4.8048
2021-12-25 12:57:15,611 - mmdet - INFO - Epoch [1][400/556] lr: 3.991e-03, eta: 0:26:32, time: 0.247, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.2249, loss_rpn_bbox: 0.3687, loss_cls: 0.1750, acc: 95.8945, loss_bbox: 0.0381, loss: 0.8066, grad_norm: 4.8320
2021-12-25 12:57:27,785 - mmdet - INFO - Epoch [1][450/556] lr: 4.491e-03, eta: 0:26:12, time: 0.243, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.1747, loss_rpn_bbox: 0.3679, loss_cls: 0.1275, acc: 96.6934, loss_bbox: 0.0393, loss: 0.7094, grad_norm: 4.0705
2021-12-25 12:57:40,320 - mmdet - INFO - Epoch [1][500/556] lr: 4.990e-03, eta: 0:25:58, time: 0.251, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.2156, loss_rpn_bbox: 0.4039, loss_cls: 0.1936, acc: 95.3008, loss_bbox: 0.0719, loss: 0.8850, grad_norm: 4.7913
2021-12-25 12:57:52,639 - mmdet - INFO - Epoch [1][550/556] lr: 5.000e-03, eta: 0:25:42, time: 0.246, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.2036, loss_rpn_bbox: 0.2993, loss_cls: 0.1535, acc: 96.0078, loss_bbox: 0.0577, loss: 0.7142, grad_norm: 4.4044
2021-12-25 12:57:54,020 - mmdet - INFO - Saving checkpoint at 1 epochs
2021-12-25 12:58:08,898 - mmdet - INFO - Epoch [2][50/556] lr: 5.000e-03, eta: 0:25:31, time: 0.288, data_time: 0.049, memory: 3698, loss_rpn_cls: 0.2111, loss_rpn_bbox: 0.3309, loss_cls: 0.1283, acc: 97.0215, loss_bbox: 0.0505, loss: 0.7209, grad_norm: 3.7619
2021-12-25 12:58:21,253 - mmdet - INFO - Epoch [2][100/556] lr: 5.000e-03, eta: 0:25:16, time: 0.247, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.1616, loss_rpn_bbox: 0.3156, loss_cls: 0.1485, acc: 96.2480, loss_bbox: 0.0681, loss: 0.6938, grad_norm: 3.4983
2021-12-25 12:58:33,628 - mmdet - INFO - Epoch [2][150/556] lr: 5.000e-03, eta: 0:25:01, time: 0.247, data_time: 0.003, memory: 3698, loss_rpn_cls: 0.1684, loss_rpn_bbox: 0.3741, loss_cls: 0.1782, acc: 95.1855, loss_bbox: 0.0871, loss: 0.8079, grad_norm: 4.0656
2021-12-25 12:58:46,130 - mmdet - INFO - Epoch [2][200/556] lr: 5.000e-03, eta: 0:24:48, time: 0.250, data_time: 0.004, memory: 3698, loss_rpn_cls: 0.1640, loss_rpn_bbox: 0.3339, loss_cls: 0.1264, acc: 96.4551, loss_bbox: 0.0689, loss: 0.6932, grad_norm: 3.8122
2021-12-25 12:58:58,419 - mmdet - INFO - Epoch [2][250/556] lr: 5.000e-03, eta: 0:24:33, time: 0.246, data_time: 0.003, memory: 3698, loss_rpn_cls: 0.1946, loss_rpn_bbox: 0.3867, loss_cls: 0.1531, acc: 96.0078, loss_bbox: 0.0735, loss: 0.8078, grad_norm: 4.4074
2021-12-25 12:59:10,842 - mmdet - INFO - Epoch [2][300/556] lr: 5.000e-03, eta: 0:24:20, time: 0.248, data_time: 0.003, memory: 3698, loss_rpn_cls: 0.1613, loss_rpn_bbox: 0.2321, loss_cls: 0.1315, acc: 96.3848, loss_bbox: 0.0628, loss: 0.5877, grad_norm: 3.5641
2021-12-25 12:59:23,110 - mmdet - INFO - Epoch [2][350/556] lr: 5.000e-03, eta: 0:24:05, time: 0.245, data_time: 0.003, memory: 3698, loss_rpn_cls: 0.1512, loss_rpn_bbox: 0.2770, loss_cls: 0.1250, acc: 96.8418, loss_bbox: 0.0636, loss: 0.6167, grad_norm: 3.5103
2021-12-25 12:59:34,907 - mmdet - INFO - Epoch [2][400/556] lr: 5.000e-03, eta: 0:23:48, time: 0.236, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 47.0749, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 12:59:46,370 - mmdet - INFO - Epoch [2][450/556] lr: 5.000e-03, eta: 0:23:30, time: 0.229, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.3909, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 12:59:57,932 - mmdet - INFO - Epoch [2][500/556] lr: 5.000e-03, eta: 0:23:13, time: 0.231, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 6.6430, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:00:09,327 - mmdet - INFO - Epoch [2][550/556] lr: 5.000e-03, eta: 0:22:55, time: 0.228, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 2.0405, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:00:10,642 - mmdet - INFO - Saving checkpoint at 2 epochs
2021-12-25 13:00:24,719 - mmdet - INFO - Epoch [3][50/556] lr: 5.000e-03, eta: 0:22:40, time: 0.271, data_time: 0.048, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 2.3505, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:00:36,006 - mmdet - INFO - Epoch [3][100/556] lr: 5.000e-03, eta: 0:22:23, time: 0.226, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.9997, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:00:47,454 - mmdet - INFO - Epoch [3][150/556] lr: 5.000e-03, eta: 0:22:07, time: 0.229, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.8752, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:00:59,037 - mmdet - INFO - Epoch [3][200/556] lr: 5.000e-03, eta: 0:21:52, time: 0.232, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.3273, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:01:10,332 - mmdet - INFO - Epoch [3][250/556] lr: 5.000e-03, eta: 0:21:36, time: 0.226, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.1195, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:01:21,783 - mmdet - INFO - Epoch [3][300/556] lr: 5.000e-03, eta: 0:21:21, time: 0.229, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.1086, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:01:33,276 - mmdet - INFO - Epoch [3][350/556] lr: 5.000e-03, eta: 0:21:06, time: 0.230, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 3.9635, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:01:44,423 - mmdet - INFO - Epoch [3][400/556] lr: 5.000e-03, eta: 0:20:51, time: 0.223, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.1660, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:01:55,545 - mmdet - INFO - Epoch [3][450/556] lr: 5.000e-03, eta: 0:20:35, time: 0.222, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.0284, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:02:07,100 - mmdet - INFO - Epoch [3][500/556] lr: 5.000e-03, eta: 0:20:22, time: 0.231, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 3.2910, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:02:18,527 - mmdet - INFO - Epoch [3][550/556] lr: 5.000e-03, eta: 0:20:08, time: 0.229, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.2848, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:02:19,857 - mmdet - INFO - Saving checkpoint at 3 epochs
2021-12-25 13:02:33,744 - mmdet - INFO - Epoch [4][50/556] lr: 5.000e-03, eta: 0:19:54, time: 0.268, data_time: 0.048, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.0692, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:02:45,107 - mmdet - INFO - Epoch [4][100/556] lr: 5.000e-03, eta: 0:19:40, time: 0.227, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.5256, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:02:56,641 - mmdet - INFO - Epoch [4][150/556] lr: 5.000e-03, eta: 0:19:26, time: 0.231, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.3268, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:03:08,337 - mmdet - INFO - Epoch [4][200/556] lr: 5.000e-03, eta: 0:19:14, time: 0.234, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.6684, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:03:19,619 - mmdet - INFO - Epoch [4][250/556] lr: 5.000e-03, eta: 0:19:00, time: 0.226, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 3.7860, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:03:30,928 - mmdet - INFO - Epoch [4][300/556] lr: 5.000e-03, eta: 0:18:46, time: 0.226, data_time: 0.003, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 3.9249, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:03:42,402 - mmdet - INFO - Epoch [4][350/556] lr: 5.000e-03, eta: 0:18:33, time: 0.229, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 2.9420, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:03:53,884 - mmdet - INFO - Epoch [4][400/556] lr: 5.000e-03, eta: 0:18:20, time: 0.230, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 3.7416, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:04:05,434 - mmdet - INFO - Epoch [4][450/556] lr: 5.000e-03, eta: 0:18:07, time: 0.231, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.1148, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:04:16,813 - mmdet - INFO - Epoch [4][500/556] lr: 5.000e-03, eta: 0:17:54, time: 0.228, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 4.7071, loss_bbox: nan, loss: nan, grad_norm: nan
2021-12-25 13:04:28,306 - mmdet - INFO - Epoch [4][550/556] lr: 5.000e-03, eta: 0:17:41, time: 0.230, data_time: 0.004, memory: 3698, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 3.7080, loss_bbox: nan, loss: nan, grad_norm: nan
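
In the log above, the first NaN appears at Epoch [2][400/556]. With longer logs it can help to locate that point automatically. The following is a minimal sketch, assuming the text log format shown above; `first_non_finite` is a hypothetical helper name, not part of OBBDetection or MMDetection.

```python
import math
import re

# Matches one training record and captures epoch, iteration, and the total
# "loss:" value. "\bloss: " skips loss_rpn_cls, loss_cls, etc., because in
# those keys "loss" is followed by "_" rather than ": ".
# DOTALL lets a record match even if it was wrapped across lines.
RECORD = re.compile(
    r"Epoch \[(\d+)\]\[(\d+)/\d+\].*?\bloss: ([^,\s]+)", re.DOTALL
)


def first_non_finite(log_text):
    """Return (epoch, iteration) of the first record whose total loss is
    NaN or infinite, or None if every recorded loss is finite."""
    for match in RECORD.finditer(log_text):
        loss = float(match.group(3))  # float("nan") parses without error
        if not math.isfinite(loss):
            return int(match.group(1)), int(match.group(2))
    return None
```

Knowing the first bad iteration narrows down which batches to inspect, since the offending images were loaded between the last finite record and the first NaN one.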

jbwang1997 commented 2 years ago

You can refer to the MMDetection documentation for ways to resolve a NaN loss.
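
For reference, the MMDetection FAQ lists four things to try when the loss becomes NaN: check that the dataset annotations are valid, reduce the learning rate, extend the warmup, and add gradient clipping. Below is a minimal config sketch of the last three. All numeric values are illustrative assumptions, not settings taken from this thread; the log above only shows that the learning rate peaks at 5.000e-03 after roughly 500 warmup iterations, and that grad_norm is already being reported, which suggests clipping was enabled in the original run.

```python
# Config fragment in MMDetection 2.x style (configs are plain Python dicts).
# Every number here is an assumed example value to be tuned per dataset.

# Halve the peak learning rate relative to the 5e-3 seen in the log.
optimizer = dict(type="SGD", lr=0.0025, momentum=0.9, weight_decay=0.0001)

# Clip the gradient norm so one bad batch cannot blow up the weights.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# Stretch the linear warmup from about 500 to 1000 iterations.
lr_config = dict(
    policy="step",
    warmup="linear",
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[8, 11],  # assumed decay epochs for a 12-epoch schedule
)
```

These settings only reduce the chance of divergence. If the NaN is caused by invalid or degenerate annotations, the dataset itself has to be cleaned.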

MIXIAOXIN commented 2 years ago

Thanks a lot. I fixed the NaN error by filtering out the small objects.
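
The thread does not say how the filtering was done. One possible approach, sketched here under assumptions, is to drop tiny boxes from the DOTA-format label files before training. The sketch assumes each object line has the form `x1 y1 x2 y2 x3 y3 x4 y4 category difficult`; the function names and the `min_area` and `min_side` thresholds are hypothetical and are not values reported by the author.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    count = len(points)
    for i in range(count):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % count]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def shortest_side(points):
    """Length of the shortest polygon edge."""
    count = len(points)
    return min(
        math.hypot(points[(i + 1) % count][0] - points[i][0],
                   points[(i + 1) % count][1] - points[i][1])
        for i in range(count)
    )


def filter_dota_lines(lines, min_area=16.0, min_side=2.0):
    """Keep header lines and objects that are large enough.

    min_area (square pixels) and min_side (pixels) are assumed thresholds.
    """
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) < 9:
            kept.append(line)  # header such as 'gsd:...' or a blank line
            continue
        try:
            coords = [float(value) for value in parts[:8]]
        except ValueError:
            kept.append(line)  # not an object line, leave untouched
            continue
        points = list(zip(coords[0::2], coords[1::2]))
        if polygon_area(points) >= min_area and shortest_side(points) >= min_side:
            kept.append(line)
    return kept
```

A typical use would be to read each label file, pass its lines through `filter_dota_lines`, and write the result to a new directory so the original annotations stay intact.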

zhangleigood commented 2 years ago

May I ask how you filtered out the small objects?