konyul closed this issue 2 years ago.
Please show the complete log for the first error. For the second one, the original model was trained on 16 GPUs with batch_size 1 per GPU, so you may need to reduce the learning rate to 1/8 in your case (2 GPUs instead of 16).
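The 1/8 factor follows the linear scaling rule: keep the learning rate proportional to the total batch size (number of GPUs times images per GPU). A minimal sketch of the arithmetic, assuming a reference lr of 0.02 (the value the lr column of the training log settles at after warmup) and 1 image per GPU in both setups; `scaled_lr` is a hypothetical helper, not an mmdet function.

```python
# Linear scaling rule sketch: lr scales with the total batch size
# (GPUs x images per GPU). scaled_lr is a hypothetical helper for illustration.


def scaled_lr(base_lr, base_gpus, base_imgs_per_gpu, gpus, imgs_per_gpu):
    """Scale base_lr by the ratio of the new total batch size to the reference one."""
    base_batch = base_gpus * base_imgs_per_gpu
    new_batch = gpus * imgs_per_gpu
    return base_lr * new_batch / base_batch


# Reference setup from the reply: 16 GPUs x 1 image per GPU, lr 0.02 (assumed from the log).
# Setup in this thread: 2 GPUs (CUDA_VISIBLE_DEVICES=0,1) x 1 image per GPU.
lr = scaled_lr(0.02, 16, 1, 2, 1)
print(f"lr for 2 GPUs: {lr:.4g}")  # prints "lr for 2 GPUs: 0.0025", i.e. 0.02 / 8
```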
Thank you for your reply! I will try reducing the lr to 1/8.
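One way to apply the reduced lr without editing the original file is a small config that inherits from it through mmcv's `_base_` mechanism and overrides only the learning rate. A sketch, assuming the HTC config sets its lr in the usual `optimizer` dict (the log shows 0.02 after warmup); the file name `htc_2gpu_nuim.py` is hypothetical.

```python
# configs/nuimages/htc_2gpu_nuim.py  (hypothetical file name)
# Inherit everything from the 16-GPU config and override only the learning rate.
_base_ = './htc_x101_64x4d_fpn_dconv_c3-c5_coco-20e_16x1_20e_nuim.py'

# 0.02 * (2 GPUs x 1 img) / (16 GPUs x 1 img) = 0.0025, i.e. 1/8 of the reference lr.
# mmcv merges this dict into the inherited `optimizer` dict, so the other
# optimizer keys (type, momentum, weight_decay, ...) keep their inherited values.
optimizer = dict(lr=0.0025)
```

It would then be launched the same way as before, e.g. `CUDA_VISIBLE_DEVICES=0,1 tools/dist_train.sh configs/nuimages/htc_2gpu_nuim.py 2`.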
And here is the full error log for the first error:
root@f5a3d94f3145:/mnt/sda/kypark/mmdetection3d# python3 tools/test.py configs/nuimages/cascade_mask_rcnn_r50_fpn_1x_nuim.py cascade_mask_rcnn_r50_fpn_1x_nuim_20201008_195342-1147c036.pth --eval segm
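loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
load checkpoint from local path: cascade_mask_rcnn_r50_fpn_1x_nuim_20201008_195342-1147c036.pth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 50/50, 6.7 task/s, elapsed: 7s, ETA: 0s
Traceback (most recent call last):
  File "tools/test.py", line 238, in <module>
    main()
  File "tools/test.py", line 234, in main
    print(dataset.evaluate(outputs, **eval_kwargs))
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 438, in evaluate
    result_files, tmp_dir = self.format_results(results, jsonfile_prefix)
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 383, in format_results
    result_files = self.results2json(results, jsonfile_prefix)
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 320, in results2json
    json_results = self._segm2json(results)
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 288, in _segm2json
    if isinstance(segms[i]['counts'], bytes):
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices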
Thank you
Sorry for the late reply. Is this problem solved now? It's really strange because the error occurs inside mmdet, and the possible reason is an incorrect `i` or `'counts'` index?
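The failing expression is `segms[i]['counts']`. One way to get exactly this message is for `segms[i]` to be a raw numpy mask array instead of an RLE dict, because numpy rejects a string index on a plain array with this IndexError. The sketch below only reproduces that message under this assumption; the thread does not confirm that this is what happened here. `counts_lookup_error` is a hypothetical helper.

```python
import numpy as np


def counts_lookup_error(seg):
    """Evaluate seg['counts'] as _segm2json does; return the IndexError text, or None."""
    try:
        _ = seg['counts']
    except IndexError as err:
        return str(err)
    return None


# A raw binary mask that was never RLE-encoded (assumed failure case).
raw_mask = np.zeros((4, 4), dtype=bool)

# An uncompressed COCO-style RLE for the same all-zero 4x4 mask:
# run lengths start with zeros, so 16 zeros -> counts [16].
rle_mask = {'size': [4, 4], 'counts': [16]}

print(counts_lookup_error(raw_mask))  # numpy's "only integers, slices ..." message
print(counts_lookup_error(rle_mask))  # None: the dict lookup succeeds
```

If this is the cause, printing `type(segms[i])` just before line 288 of `coco.py` would show `numpy.ndarray` rather than `dict`.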
Can you tell me how you prepared your nuImages dataset? I would like to reproduce some of the paper's results on it, but the dataset is huge and I don't know how to prepare it to reproduce those results.
When I test with the command python tools/test.py configs/nuimages/cascade_mask_rcnn_r50_fpn_1x_nuim.py cascade_mask_rcnn_r50_fpn_1x_nuim_20201008_195342-1147c036.pth --eval segm, the following error occurs:
Exception has occurred: IndexError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 288, in _segm2json
    if isinstance(segms[i]['counts'], bytes):
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 320, in results2json
    json_results = self._segm2json(results)
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 383, in format_results
    result_files = self.results2json(results, jsonfile_prefix)
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/coco.py", line 438, in evaluate
    result_files, tmp_dir = self.format_results(results, jsonfile_prefix)
  File "/mnt/sda/kypark/mmdetection3d/tools/test.py", line 234, in main
    print(dataset.evaluate(outputs, **eval_kwargs))
  File "/mnt/sda/kypark/mmdetection3d/tools/test.py", line 238, in <module>
    main()
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/opt/conda/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,

When I train HTC with the following command: CUDA_VISIBLE_DEVICES=0,1 tools/dist_train.sh configs/nuimages/htc_x101_64x4d_fpn_dconv_c3-c5_coco-20e_16x1_20e_nuim.py 2, the training loss becomes NaN after a few iterations:
2022-02-19 12:52:48,269 - mmdet - INFO - Epoch [1][50/30105] lr: 1.978e-03, eta: 10 days, 5:31:04, time: 1.468, data_time: 0.097, memory: 12798, loss_rpn_cls: 0.0068, loss_rpn_bbox: 0.0114, loss_semantic_seg: 0.5636, s0.loss_cls: 0.1252, s0.acc: 95.0645, s0.loss_bbox: 0.0670, s0.loss_mask: 0.2330, s1.loss_cls: 0.0577, s1.acc: 95.4884, s1.loss_bbox: 0.1056, s1.loss_mask: 0.1148, s2.loss_cls: 0.0297, s2.acc: 95.2546, s2.loss_bbox: 0.0784, s2.loss_mask: 0.0555, loss: 1.4487
2022-02-19 12:53:59,565 - mmdet - INFO - Epoch [1][100/30105] lr: 3.976e-03, eta: 10 days, 1:58:29, time: 1.426, data_time: 0.032, memory: 13125, loss_rpn_cls: 0.0060, loss_rpn_bbox: 0.0132, loss_semantic_seg: 0.0275, s0.loss_cls: 0.1275, s0.acc: 94.9297, s0.loss_bbox: 0.0736, s0.loss_mask: 0.2282, s1.loss_cls: 0.0604, s1.acc: 95.3703, s1.loss_bbox: 0.1142, s1.loss_mask: 0.1121, s2.loss_cls: 0.0315, s2.acc: 95.0123, s2.loss_bbox: 0.0871, s2.loss_mask: 0.0552, loss: 0.9365
2022-02-19 12:55:11,869 - mmdet - INFO - Epoch [1][150/30105] lr: 5.974e-03, eta: 10 days, 1:54:03, time: 1.446, data_time: 0.027, memory: 13125, loss_rpn_cls: 0.0109, loss_rpn_bbox: 0.0140, loss_semantic_seg: 0.0277, s0.loss_cls: 0.1552, s0.acc: 94.1270, s0.loss_bbox: 0.0862, s0.loss_mask: 0.2366, s1.loss_cls: 0.0741, s1.acc: 94.4801, s1.loss_bbox: 0.1260, s1.loss_mask: 0.1171, s2.loss_cls: 0.0376, s2.acc: 94.2086, s2.loss_bbox: 0.0870, s2.loss_mask: 0.0571, loss: 1.0296
2022-02-19 12:56:22,025 - mmdet - INFO - Epoch [1][200/30105] lr: 7.972e-03, eta: 10 days, 0:03:31, time: 1.403, data_time: 0.044, memory: 13272, loss_rpn_cls: 0.0211, loss_rpn_bbox: 0.0258, loss_semantic_seg: 0.0369, s0.loss_cls: 0.1864, s0.acc: 92.9004, s0.loss_bbox: 0.1127, s0.loss_mask: 0.2577, s1.loss_cls: 0.0904, s1.acc: 93.2562, s1.loss_bbox: 0.1449, s1.loss_mask: 0.1242, s2.loss_cls: 0.0454, s2.acc: 92.8702, s2.loss_bbox: 0.0924, s2.loss_mask: 0.0600, loss: 1.1978
2022-02-19 12:57:30,625 - mmdet - INFO - Epoch [1][250/30105] lr: 9.970e-03, eta: 9 days, 21:54:11, time: 1.372, data_time: 0.035, memory: 13272, loss_rpn_cls: 0.0269, loss_rpn_bbox: 0.0291, loss_semantic_seg: 0.0324, s0.loss_cls: 0.2031, s0.acc: 92.5410, s0.loss_bbox: 0.1045, s0.loss_mask: 0.2751, s1.loss_cls: 0.0991, s1.acc: 92.7682, s1.loss_bbox: 0.1308, s1.loss_mask: 0.1356, s2.loss_cls: 0.0490, s2.acc: 92.5318, s2.loss_bbox: 0.0882, s2.loss_mask: 0.0676, loss: 1.2413
2022-02-19 12:58:40,022 - mmdet - INFO - Epoch [1][300/30105] lr: 1.197e-02, eta: 9 days, 20:54:20, time: 1.388, data_time: 0.032, memory: 13272, loss_rpn_cls: 0.0291, loss_rpn_bbox: 0.0266, loss_semantic_seg: 0.0356, s0.loss_cls: 0.2330, s0.acc: 92.1406, s0.loss_bbox: 0.1194, s0.loss_mask: 0.3224, s1.loss_cls: 0.1156, s1.acc: 91.9972, s1.loss_bbox: 0.1478, s1.loss_mask: 0.1520, s2.loss_cls: 0.0563, s2.acc: 91.8879, s2.loss_bbox: 0.0879, s2.loss_mask: 0.0738, loss: 1.3997
2022-02-19 12:59:50,268 - mmdet - INFO - Epoch [1][350/30105] lr: 1.397e-02, eta: 9 days, 20:35:39, time: 1.405, data_time: 0.041, memory: 13272, loss_rpn_cls: 0.0434, loss_rpn_bbox: 0.0288, loss_semantic_seg: 0.0487, s0.loss_cls: 0.2206, s0.acc: 92.7305, s0.loss_bbox: 0.1241, s0.loss_mask: 0.3407, s1.loss_cls: 0.1056, s1.acc: 93.0553, s1.loss_bbox: 0.1470, s1.loss_mask: 0.1621, s2.loss_cls: 0.0488, s2.acc: 93.3164, s2.loss_bbox: 0.0788, s2.loss_mask: 0.0766, loss: 1.4252
2022-02-19 13:00:56,281 - mmdet - INFO - Epoch [1][400/30105] lr: 1.596e-02, eta: 9 days, 18:35:09, time: 1.320, data_time: 0.030, memory: 13272, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 79.7120, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 80.0003, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 79.9697, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-02-19 13:01:56,643 - mmdet - INFO - Epoch [1][450/30105] lr: 1.796e-02, eta: 9 days, 14:55:16, time: 1.207, data_time: 0.039, memory: 13272, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 38.5134, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 38.5134, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 38.5134, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-02-19 13:02:53,735 - mmdet - INFO - Epoch [1][500/30105] lr: 1.996e-02, eta: 9 days, 10:53:31, time: 1.142, data_time: 0.042, memory: 13272, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 35.8741, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 35.8741, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 35.8741, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-02-19 13:03:53,128 - mmdet - INFO - Epoch [1][550/30105] lr: 2.000e-02, eta: 9 days, 8:17:33, time: 1.188, data_time: 0.038, memory: 13272, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 43.2959, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 43.2959, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 43.2959, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-02-19 13:04:51,275 - mmdet - INFO - Epoch [1][600/30105] lr: 2.000e-02, eta: 9 days, 5:46:36, time: 1.163, data_time: 0.031, memory: 13272, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 41.6359, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 41.6359, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 41.6359, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-02-19 13:05:46,677 - mmdet - INFO - Epoch [1][650/30105] lr: 2.000e-02, eta: 9 days, 2:56:23, time: 1.108, data_time: 0.037, memory: 13272, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 35.3376, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 35.3376, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 35.3376, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
2022-02-19 13:06:44,278 - mmdet - INFO - Epoch [1][700/30105] lr: 2.000e-02, eta: 9 days, 1:01:53, time: 1.152, data_time: 0.037, memory: 13272, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_semantic_seg: nan, s0.loss_cls: nan, s0.acc: 42.5351, s0.loss_bbox: nan, s0.loss_mask: nan, s1.loss_cls: nan, s1.acc: 42.5351, s1.loss_bbox: nan, s1.loss_mask: nan, s2.loss_cls: nan, s2.acc: 42.5351, s2.loss_bbox: nan, s2.loss_mask: nan, loss: nan
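The lr column in this log is consistent with a linear warmup from 0.001 x 0.02 up to 0.02 over the first 500 iterations. The loss first turns NaN at iteration 400, still inside warmup, when the lr had already reached about 1.6e-02, far above the 0.0025 that the 1/8 scaling for 2 GPUs suggests. A sketch that reproduces the logged lr values and locates the first NaN iteration; the schedule constants are inferred from the log (not read from the config), and the warmup formula is assumed to match mmcv's linear warmup.

```python
import math
import re

# Schedule inferred from the lr column of the log (an assumption, not read from the config):
# base lr 0.02, linear warmup over 500 iterations starting from 0.001 * base lr.
BASE_LR = 0.02
WARMUP_ITERS = 500
WARMUP_RATIO = 0.001


def warmup_lr(logged_iter):
    """Return the lr for the iteration the log reports as `logged_iter` (1-based)."""
    cur = logged_iter - 1  # the scheduler counts iterations from 0
    if cur >= WARMUP_ITERS:
        return BASE_LR  # warmup finished; later step decays are not modeled here
    k = (1 - cur / WARMUP_ITERS) * (1 - WARMUP_RATIO)
    return BASE_LR * (1 - k)


def first_nan_iter(log_text):
    """Return the first logged iteration whose total `loss` is nan, or None."""
    for line in log_text.splitlines():
        it = re.search(r"Epoch \[\d+\]\[(\d+)/\d+\]", line)
        total = re.search(r" loss: (\S+)", line)  # matches only the total-loss key
        if it and total and math.isnan(float(total.group(1))):
            return int(it.group(1))
    return None


# Two lines abridged from the log above (most loss fields dropped).
SAMPLE_LOG = "\n".join([
    "2022-02-19 12:59:50,268 - mmdet - INFO - Epoch [1][350/30105] lr: 1.397e-02, s0.loss_cls: 0.2206, loss: 1.4252",
    "2022-02-19 13:00:56,281 - mmdet - INFO - Epoch [1][400/30105] lr: 1.596e-02, s0.loss_cls: nan, loss: nan",
])

for it in (50, 100, 400, 500, 550):
    print(f"iter {it}: lr {warmup_lr(it):.3e}")  # 1.978e-03, 3.976e-03, 1.596e-02, 1.996e-02, 2.000e-02
print("first nan at iter", first_nan_iter(SAMPLE_LOG))  # 400
```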