shaunyuan22 / SODA-mmrotate

SODA-A Small Object Detection Toolbox and Benchmark
https://shaunyuan22.github.io/SODA/
Apache License 2.0

[Feature] Training slowing down in evaluation phase #16

Open SimoneMatteo opened 3 months ago

SimoneMatteo commented 3 months ago

Hi all,

I'm trying to train on the SODA-A dataset, restricted to the 'container' category, using this SODA-mmrotate repository on a single GPU (Tesla V100 16GB) with this command: python ./tools/train.py ./configs/sodaa-benchmarks/rotated_retinanet_obb_r50_fpn_1x.py

The training step itself (the iterations within each epoch) seems quite fast, while the subsequent evaluation phase is really slow:

2024-04-12 09:05:04,865 - mmrotate - INFO - workflow: [('train', 1)], max: 12 epochs
2024-04-12 09:05:04,865 - mmrotate - INFO - Checkpoints will be saved to /home/projects/SODA-mmrotate/work_dirs/rotated_retinanet_obb_r50_fpn_1x by HardDiskBackend.
2024-04-12 09:05:25,858 - mmrotate - INFO - Epoch [1][50/3912]  lr: 1.993e-03, eta: 5:28:05, time: 0.420, data_time: 0.053, memory: 3351, loss_cls: 1.6290, loss_bbox: 1.6736, loss: 3.3026, grad_norm: 4.4368
2024-04-12 09:05:41,096 - mmrotate - INFO - Epoch [1][100/3912] lr: 2.327e-03, eta: 4:42:50, time: 0.305, data_time: 0.005, memory: 3351, loss_cls: 1.2066, loss_bbox: 1.6734, loss: 2.8800, grad_norm: 4.0984
2024-04-12 09:05:56,126 - mmrotate - INFO - Epoch [1][150/3912] lr: 2.660e-03, eta: 4:26:29, time: 0.301, data_time: 0.005, memory: 3351, loss_cls: 1.1646, loss_bbox: 1.5418, loss: 2.7064, grad_norm: 12.5992
2024-04-12 09:06:11,323 - mmrotate - INFO - Epoch [1][200/3912] lr: 2.993e-03, eta: 4:18:51, time: 0.304, data_time: 0.005, memory: 3351, loss_cls: 1.0925, loss_bbox: 1.5081, loss: 2.6006, grad_norm: 5.7338
2024-04-12 09:06:26,505 - mmrotate - INFO - Epoch [1][250/3912] lr: 3.327e-03, eta: 4:14:07, time: 0.304, data_time: 0.005, memory: 3351, loss_cls: 9.4026, loss_bbox: 1.4593, loss: 10.8619, grad_norm: 832.6931
2024-04-12 09:06:41,811 - mmrotate - INFO - Epoch [1][300/3912] lr: 3.660e-03, eta: 4:11:12, time: 0.306, data_time: 0.005, memory: 3351, loss_cls: 1.0485, loss_bbox: 1.4459, loss: 2.4944, grad_norm: 3.6595
2024-04-12 09:06:57,063 - mmrotate - INFO - Epoch [1][350/3912] lr: 3.993e-03, eta: 4:08:55, time: 0.305, data_time: 0.005, memory: 3351, loss_cls: 1.0057, loss_bbox: 1.4853, loss: 2.4910, grad_norm: 3.6412
2024-04-12 09:07:12,195 - mmrotate - INFO - Epoch [1][400/3912] lr: 4.327e-03, eta: 4:06:55, time: 0.303, data_time: 0.005, memory: 3351, loss_cls: 1.0049, loss_bbox: 1.4787, loss: 2.4836, grad_norm: 7.1075
2024-04-12 09:07:27,334 - mmrotate - INFO - Epoch [1][450/3912] lr: 4.660e-03, eta: 4:05:19, time: 0.303, data_time: 0.005, memory: 3351, loss_cls: 0.9624, loss_bbox: 1.4962, loss: 2.4587, grad_norm: 3.1932
2024-04-12 09:07:42,438 - mmrotate - INFO - Epoch [1][500/3912] lr: 4.993e-03, eta: 4:03:56, time: 0.302, data_time: 0.005, memory: 3351, loss_cls: 1.2625, loss_bbox: 1.5821, loss: 2.8446, grad_norm: 8.3537
2024-04-12 09:07:57,782 - mmrotate - INFO - Epoch [1][550/3912] lr: 5.000e-03, eta: 4:03:05, time: 0.307, data_time: 0.005, memory: 3351, loss_cls: 1.0365, loss_bbox: 1.5705, loss: 2.6070, grad_norm: 5.4056
2024-04-12 09:08:13,510 - mmrotate - INFO - Epoch [1][600/3912] lr: 5.000e-03, eta: 4:02:50, time: 0.315, data_time: 0.005, memory: 3400, loss_cls: 0.9841, loss_bbox: 1.4768, loss: 2.4609, grad_norm: 3.5571
2024-04-12 09:08:28,871 - mmrotate - INFO - Epoch [1][650/3912] lr: 5.000e-03, eta: 4:02:09, time: 0.307, data_time: 0.005, memory: 3400, loss_cls: 0.9578, loss_bbox: 1.5634, loss: 2.5212, grad_norm: 3.0731
2024-04-12 09:08:44,151 - mmrotate - INFO - Epoch [1][700/3912] lr: 5.000e-03, eta: 4:01:26, time: 0.306, data_time: 0.005, memory: 3400, loss_cls: 0.9556, loss_bbox: 1.4635, loss: 2.4191, grad_norm: 2.7872
2024-04-12 09:08:59,436 - mmrotate - INFO - Epoch [1][750/3912] lr: 5.000e-03, eta: 4:00:47, time: 0.306, data_time: 0.005, memory: 3400, loss_cls: 0.9408, loss_bbox: 1.5794, loss: 2.5202, grad_norm: 5.4521
2024-04-12 09:09:14,561 - mmrotate - INFO - Epoch [1][800/3912] lr: 5.000e-03, eta: 4:00:02, time: 0.302, data_time: 0.005, memory: 3400, loss_cls: 0.9584, loss_bbox: 1.4378, loss: 2.3962, grad_norm: 2.6812
2024-04-12 09:09:29,790 - mmrotate - INFO - Epoch [1][850/3912] lr: 5.000e-03, eta: 3:59:25, time: 0.305, data_time: 0.005, memory: 3400, loss_cls: 0.9886, loss_bbox: 1.5372, loss: 2.5258, grad_norm: 5.0171
2024-04-12 09:09:44,873 - mmrotate - INFO - Epoch [1][900/3912] lr: 5.000e-03, eta: 3:58:44, time: 0.302, data_time: 0.005, memory: 3400, loss_cls: 0.9962, loss_bbox: 1.5532, loss: 2.5493, grad_norm: 3.9474
2024-04-12 09:10:00,063 - mmrotate - INFO - Epoch [1][950/3912] lr: 5.000e-03, eta: 3:58:11, time: 0.304, data_time: 0.005, memory: 3400, loss_cls: 0.9921, loss_bbox: 1.5238, loss: 2.5160, grad_norm: 3.0320
2024-04-12 09:10:15,356 - mmrotate - INFO - Exp name: rotated_retinanet_obb_r50_fpn_1x.py
2024-04-12 09:10:15,356 - mmrotate - INFO - Epoch [1][1000/3912]        lr: 5.000e-03, eta: 3:57:44, time: 0.306, data_time: 0.005, memory: 3400, loss_cls: 0.9568, loss_bbox: 1.5391, loss: 2.4959, grad_norm: 3.0069
2024-04-12 09:10:30,768 - mmrotate - INFO - Epoch [1][1050/3912]        lr: 5.000e-03, eta: 3:57:24, time: 0.308, data_time: 0.005, memory: 3400, loss_cls: 0.9282, loss_bbox: 1.5701, loss: 2.4983, grad_norm: 2.4356
2024-04-12 09:10:45,791 - mmrotate - INFO - Epoch [1][1100/3912]        lr: 5.000e-03, eta: 3:56:48, time: 0.300, data_time: 0.005, memory: 3400, loss_cls: 0.9457, loss_bbox: 1.5132, loss: 2.4589, grad_norm: 2.3597
2024-04-12 09:11:00,908 - mmrotate - INFO - Epoch [1][1150/3912]        lr: 5.000e-03, eta: 3:56:17, time: 0.302, data_time: 0.005, memory: 3400, loss_cls: 0.9972, loss_bbox: 1.5127, loss: 2.5099, grad_norm: 3.3273
2024-04-12 09:11:16,007 - mmrotate - INFO - Epoch [1][1200/3912]        lr: 5.000e-03, eta: 3:55:47, time: 0.302, data_time: 0.005, memory: 3400, loss_cls: 0.9748, loss_bbox: 1.5795, loss: 2.5543, grad_norm: 5.1063
2024-04-12 09:11:31,106 - mmrotate - INFO - Epoch [1][1250/3912]        lr: 5.000e-03, eta: 3:55:18, time: 0.302, data_time: 0.005, memory: 3400, loss_cls: 0.9365, loss_bbox: 1.6119, loss: 2.5484, grad_norm: 3.4181
2024-04-12 09:11:46,359 - mmrotate - INFO - Epoch [1][1300/3912]        lr: 5.000e-03, eta: 3:54:56, time: 0.305, data_time: 0.005, memory: 3400, loss_cls: 0.9847, loss_bbox: 1.4479, loss: 2.4326, grad_norm: 4.6342
2024-04-12 09:12:01,256 - mmrotate - INFO - Epoch [1][1350/3912]        lr: 5.000e-03, eta: 3:54:22, time: 0.298, data_time: 0.005, memory: 3400, loss_cls: 0.9126, loss_bbox: 1.5117, loss: 2.4243, grad_norm: 3.2224
2024-04-12 09:12:16,417 - mmrotate - INFO - Epoch [1][1400/3912]        lr: 5.000e-03, eta: 3:53:58, time: 0.303, data_time: 0.005, memory: 3400, loss_cls: 0.9535, loss_bbox: 1.4754, loss: 2.4290, grad_norm: 2.2386
2024-04-12 09:12:31,440 - mmrotate - INFO - Epoch [1][1450/3912]        lr: 5.000e-03, eta: 3:53:31, time: 0.300, data_time: 0.005, memory: 3400, loss_cls: 0.9151, loss_bbox: 1.3535, loss: 2.2685, grad_norm: 2.3255
2024-04-12 09:12:46,330 - mmrotate - INFO - Epoch [1][1500/3912]        lr: 5.000e-03, eta: 3:53:00, time: 0.298, data_time: 0.005, memory: 3400, loss_cls: 0.9363, loss_bbox: 1.4236, loss: 2.3600, grad_norm: 3.8528
2024-04-12 09:13:01,268 - mmrotate - INFO - Epoch [1][1550/3912]        lr: 5.000e-03, eta: 3:52:31, time: 0.299, data_time: 0.005, memory: 3400, loss_cls: 0.9298, loss_bbox: 1.5773, loss: 2.5071, grad_norm: 2.0563
2024-04-12 09:13:16,482 - mmrotate - INFO - Epoch [1][1600/3912]        lr: 5.000e-03, eta: 3:52:12, time: 0.304, data_time: 0.005, memory: 3400, loss_cls: 0.8870, loss_bbox: 1.2695, loss: 2.1565, grad_norm: 2.7449
2024-04-12 09:13:32,011 - mmrotate - INFO - Epoch [1][1650/3912]        lr: 5.000e-03, eta: 3:52:01, time: 0.311, data_time: 0.005, memory: 3400, loss_cls: 0.9187, loss_bbox: 1.3735, loss: 2.2922, grad_norm: 2.4286
2024-04-12 09:13:46,984 - mmrotate - INFO - Epoch [1][1700/3912]        lr: 5.000e-03, eta: 3:51:35, time: 0.299, data_time: 0.005, memory: 3400, loss_cls: 0.9866, loss_bbox: 1.5725, loss: 2.5591, grad_norm: 3.7872
2024-04-12 09:14:02,096 - mmrotate - INFO - Epoch [1][1750/3912]        lr: 5.000e-03, eta: 3:51:13, time: 0.302, data_time: 0.005, memory: 3400, loss_cls: 0.9379, loss_bbox: 1.5612, loss: 2.4991, grad_norm: 4.0667
2024-04-12 09:14:17,385 - mmrotate - INFO - Epoch [1][1800/3912]        lr: 5.000e-03, eta: 3:50:56, time: 0.306, data_time: 0.005, memory: 3400, loss_cls: 0.9717, loss_bbox: 1.4800, loss: 2.4516, grad_norm: 3.1458
2024-04-12 09:14:32,487 - mmrotate - INFO - Epoch [1][1850/3912]        lr: 5.000e-03, eta: 3:50:35, time: 0.302, data_time: 0.005, memory: 3417, loss_cls: 1.0662, loss_bbox: 1.4145, loss: 2.4806, grad_norm: 12.4195
2024-04-12 09:14:48,020 - mmrotate - INFO - Epoch [1][1900/3912]        lr: 5.000e-03, eta: 3:50:24, time: 0.311, data_time: 0.005, memory: 3437, loss_cls: 0.9194, loss_bbox: 1.4581, loss: 2.3775, grad_norm: 2.5146
2024-04-12 09:15:03,121 - mmrotate - INFO - Epoch [1][1950/3912]        lr: 5.000e-03, eta: 3:50:03, time: 0.302, data_time: 0.005, memory: 3437, loss_cls: 0.9095, loss_bbox: 1.4678, loss: 2.3774, grad_norm: 2.1679
2024-04-12 09:15:18,437 - mmrotate - INFO - Exp name: rotated_retinanet_obb_r50_fpn_1x.py
2024-04-12 09:15:18,437 - mmrotate - INFO - Epoch [1][2000/3912]        lr: 5.000e-03, eta: 3:49:47, time: 0.306, data_time: 0.005, memory: 3437, loss_cls: 0.8813, loss_bbox: 1.3487, loss: 2.2300, grad_norm: 2.2688
2024-04-12 09:15:33,623 - mmrotate - INFO - Epoch [1][2050/3912]        lr: 5.000e-03, eta: 3:49:29, time: 0.304, data_time: 0.005, memory: 3437, loss_cls: 0.9320, loss_bbox: 1.4427, loss: 2.3747, grad_norm: 2.3797
2024-04-12 09:15:48,993 - mmrotate - INFO - Epoch [1][2100/3912]        lr: 5.000e-03, eta: 3:49:14, time: 0.307, data_time: 0.005, memory: 3437, loss_cls: 0.8738, loss_bbox: 1.4034, loss: 2.2772, grad_norm: 2.0685
2024-04-12 09:16:04,432 - mmrotate - INFO - Epoch [1][2150/3912]        lr: 5.000e-03, eta: 3:49:01, time: 0.309, data_time: 0.005, memory: 3437, loss_cls: 0.9105, loss_bbox: 1.3818, loss: 2.2924, grad_norm: 2.0281
2024-04-12 09:16:19,576 - mmrotate - INFO - Epoch [1][2200/3912]        lr: 5.000e-03, eta: 3:48:42, time: 0.303, data_time: 0.005, memory: 3437, loss_cls: 0.8971, loss_bbox: 1.4368, loss: 2.3339, grad_norm: 2.6062
2024-04-12 09:16:34,805 - mmrotate - INFO - Epoch [1][2250/3912]        lr: 5.000e-03, eta: 3:48:24, time: 0.305, data_time: 0.005, memory: 3437, loss_cls: 0.8340, loss_bbox: 1.4508, loss: 2.2848, grad_norm: 3.5522
2024-04-12 09:16:50,040 - mmrotate - INFO - Epoch [1][2300/3912]        lr: 5.000e-03, eta: 3:48:07, time: 0.305, data_time: 0.005, memory: 3437, loss_cls: 0.9544, loss_bbox: 1.4373, loss: 2.3917, grad_norm: 2.2792
2024-04-12 09:17:05,247 - mmrotate - INFO - Epoch [1][2350/3912]        lr: 5.000e-03, eta: 3:47:49, time: 0.304, data_time: 0.005, memory: 3437, loss_cls: 0.8572, loss_bbox: 1.4107, loss: 2.2680, grad_norm: 2.6572
2024-04-12 09:17:20,097 - mmrotate - INFO - Epoch [1][2400/3912]        lr: 5.000e-03, eta: 3:47:25, time: 0.297, data_time: 0.005, memory: 3437, loss_cls: 0.8546, loss_bbox: 1.5040, loss: 2.3586, grad_norm: 2.8817
2024-04-12 09:17:35,556 - mmrotate - INFO - Epoch [1][2450/3912]        lr: 5.000e-03, eta: 3:47:12, time: 0.309, data_time: 0.006, memory: 3437, loss_cls: 0.8475, loss_bbox: 1.3900, loss: 2.2375, grad_norm: 3.1165
2024-04-12 09:17:50,692 - mmrotate - INFO - Epoch [1][2500/3912]        lr: 5.000e-03, eta: 3:46:54, time: 0.303, data_time: 0.005, memory: 3437, loss_cls: 0.8715, loss_bbox: 1.4124, loss: 2.2840, grad_norm: 4.3750
2024-04-12 09:18:05,553 - mmrotate - INFO - Epoch [1][2550/3912]        lr: 5.000e-03, eta: 3:46:31, time: 0.297, data_time: 0.005, memory: 3437, loss_cls: 0.9560, loss_bbox: 1.5002, loss: 2.4562, grad_norm: 2.6467
2024-04-12 09:18:20,732 - mmrotate - INFO - Epoch [1][2600/3912]        lr: 5.000e-03, eta: 3:46:13, time: 0.304, data_time: 0.005, memory: 3437, loss_cls: 0.8896, loss_bbox: 1.5083, loss: 2.3979, grad_norm: 2.6751
2024-04-12 09:18:35,948 - mmrotate - INFO - Epoch [1][2650/3912]        lr: 5.000e-03, eta: 3:45:56, time: 0.304, data_time: 0.005, memory: 3437, loss_cls: 0.8596, loss_bbox: 1.5203, loss: 2.3798, grad_norm: 3.0448
2024-04-12 09:18:50,752 - mmrotate - INFO - Epoch [1][2700/3912]        lr: 5.000e-03, eta: 3:45:33, time: 0.296, data_time: 0.005, memory: 3437, loss_cls: 0.8669, loss_bbox: 1.4264, loss: 2.2933, grad_norm: 2.4440
2024-04-12 09:19:05,923 - mmrotate - INFO - Epoch [1][2750/3912]        lr: 5.000e-03, eta: 3:45:15, time: 0.303, data_time: 0.005, memory: 3437, loss_cls: 0.8183, loss_bbox: 1.4203, loss: 2.2386, grad_norm: 3.7550
2024-04-12 09:19:21,298 - mmrotate - INFO - Epoch [1][2800/3912]        lr: 5.000e-03, eta: 3:45:01, time: 0.307, data_time: 0.005, memory: 3437, loss_cls: 0.7712, loss_bbox: 1.5588, loss: 2.3300, grad_norm: 2.9995
2024-04-12 09:19:36,535 - mmrotate - INFO - Epoch [1][2850/3912]        lr: 5.000e-03, eta: 3:44:45, time: 0.305, data_time: 0.005, memory: 3440, loss_cls: 0.8765, loss_bbox: 1.4688, loss: 2.3453, grad_norm: 3.0555
2024-04-12 09:19:51,680 - mmrotate - INFO - Epoch [1][2900/3912]        lr: 5.000e-03, eta: 3:44:28, time: 0.303, data_time: 0.005, memory: 3440, loss_cls: 0.7497, loss_bbox: 1.4229, loss: 2.1725, grad_norm: 3.3506
2024-04-12 09:20:06,839 - mmrotate - INFO - Epoch [1][2950/3912]        lr: 5.000e-03, eta: 3:44:11, time: 0.303, data_time: 0.005, memory: 3440, loss_cls: 0.7032, loss_bbox: 1.4441, loss: 2.1473, grad_norm: 3.0886
2024-04-12 09:20:21,987 - mmrotate - INFO - Exp name: rotated_retinanet_obb_r50_fpn_1x.py
2024-04-12 09:20:21,987 - mmrotate - INFO - Epoch [1][3000/3912]        lr: 5.000e-03, eta: 3:43:53, time: 0.303, data_time: 0.005, memory: 3440, loss_cls: 0.8062, loss_bbox: 1.5664, loss: 2.3726, grad_norm: 3.4617
2024-04-12 09:20:37,332 - mmrotate - INFO - Epoch [1][3050/3912]        lr: 5.000e-03, eta: 3:43:39, time: 0.307, data_time: 0.006, memory: 3440, loss_cls: 0.6792, loss_bbox: 1.4875, loss: 2.1667, grad_norm: 3.1049
2024-04-12 09:20:52,438 - mmrotate - INFO - Epoch [1][3100/3912]        lr: 5.000e-03, eta: 3:43:21, time: 0.302, data_time: 0.005, memory: 3440, loss_cls: 0.7403, loss_bbox: 1.3866, loss: 2.1269, grad_norm: 4.0551
2024-04-12 09:21:07,668 - mmrotate - INFO - Epoch [1][3150/3912]        lr: 5.000e-03, eta: 3:43:05, time: 0.305, data_time: 0.005, memory: 3453, loss_cls: 0.7157, loss_bbox: 1.3601, loss: 2.0757, grad_norm: 4.1472
2024-04-12 09:21:22,895 - mmrotate - INFO - Epoch [1][3200/3912]        lr: 5.000e-03, eta: 3:42:49, time: 0.305, data_time: 0.005, memory: 3453, loss_cls: 0.7520, loss_bbox: 1.6414, loss: 2.3935, grad_norm: 2.8356
2024-04-12 09:21:37,945 - mmrotate - INFO - Epoch [1][3250/3912]        lr: 5.000e-03, eta: 3:42:30, time: 0.301, data_time: 0.005, memory: 3453, loss_cls: 0.6694, loss_bbox: 1.5021, loss: 2.1715, grad_norm: 3.0969
2024-04-12 09:21:52,958 - mmrotate - INFO - Epoch [1][3300/3912]        lr: 5.000e-03, eta: 3:42:12, time: 0.300, data_time: 0.005, memory: 3453, loss_cls: 0.5920, loss_bbox: 1.4609, loss: 2.0529, grad_norm: 3.0660
2024-04-12 09:22:08,497 - mmrotate - INFO - Epoch [1][3350/3912]        lr: 5.000e-03, eta: 3:42:00, time: 0.311, data_time: 0.005, memory: 3453, loss_cls: 0.6954, loss_bbox: 1.4631, loss: 2.1585, grad_norm: 3.5781
2024-04-12 09:22:23,754 - mmrotate - INFO - Epoch [1][3400/3912]        lr: 5.000e-03, eta: 3:41:44, time: 0.305, data_time: 0.006, memory: 3453, loss_cls: 0.6607, loss_bbox: 1.5135, loss: 2.1742, grad_norm: 3.0280
2024-04-12 09:22:38,981 - mmrotate - INFO - Epoch [1][3450/3912]        lr: 5.000e-03, eta: 3:41:28, time: 0.305, data_time: 0.005, memory: 3453, loss_cls: 0.6304, loss_bbox: 1.5826, loss: 2.2130, grad_norm: 4.0896
2024-04-12 09:22:54,259 - mmrotate - INFO - Epoch [1][3500/3912]        lr: 5.000e-03, eta: 3:41:13, time: 0.306, data_time: 0.005, memory: 3453, loss_cls: 0.5883, loss_bbox: 1.5282, loss: 2.1165, grad_norm: 3.2138
2024-04-12 09:23:09,634 - mmrotate - INFO - Epoch [1][3550/3912]        lr: 5.000e-03, eta: 3:40:59, time: 0.307, data_time: 0.005, memory: 3453, loss_cls: 0.6425, loss_bbox: 1.5425, loss: 2.1850, grad_norm: 3.6814
2024-04-12 09:23:25,011 - mmrotate - INFO - Epoch [1][3600/3912]        lr: 5.000e-03, eta: 3:40:45, time: 0.308, data_time: 0.005, memory: 3453, loss_cls: 0.6135, loss_bbox: 1.4144, loss: 2.0279, grad_norm: 3.5741
2024-04-12 09:23:40,489 - mmrotate - INFO - Epoch [1][3650/3912]        lr: 5.000e-03, eta: 3:40:32, time: 0.310, data_time: 0.005, memory: 3453, loss_cls: 0.5975, loss_bbox: 1.4709, loss: 2.0684, grad_norm: 3.0558
2024-04-12 09:23:55,709 - mmrotate - INFO - Epoch [1][3700/3912]        lr: 5.000e-03, eta: 3:40:16, time: 0.304, data_time: 0.005, memory: 3453, loss_cls: 0.5526, loss_bbox: 1.4708, loss: 2.0234, grad_norm: 3.6822
2024-04-12 09:24:10,822 - mmrotate - INFO - Epoch [1][3750/3912]        lr: 5.000e-03, eta: 3:39:59, time: 0.302, data_time: 0.005, memory: 3453, loss_cls: 0.6590, loss_bbox: 1.4646, loss: 2.1236, grad_norm: 4.0632
2024-04-12 09:24:25,973 - mmrotate - INFO - Epoch [1][3800/3912]        lr: 5.000e-03, eta: 3:39:42, time: 0.303, data_time: 0.005, memory: 3453, loss_cls: 0.6982, loss_bbox: 1.4365, loss: 2.1347, grad_norm: 3.2053
2024-04-12 09:24:41,103 - mmrotate - INFO - Epoch [1][3850/3912]        lr: 5.000e-03, eta: 3:39:25, time: 0.303, data_time: 0.005, memory: 3453, loss_cls: 0.6323, loss_bbox: 1.4574, loss: 2.0897, grad_norm: 3.8847
2024-04-12 09:24:56,356 - mmrotate - INFO - Epoch [1][3900/3912]        lr: 5.000e-03, eta: 3:39:10, time: 0.305, data_time: 0.005, memory: 3453, loss_cls: 0.6638, loss_bbox: 1.5041, loss: 2.1679, grad_norm: 4.0185
2024-04-12 09:25:00,203 - mmrotate - INFO - Saving checkpoint at 1 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>                   ] 4066/6522, 8.0 task/s, elapsed: 510s, ETA:   308s
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6522/6522, 8.0 task/s, elapsed: 819s, ETA:     0s
>>> Merge detected results of patch for whole image evaluating...
00257
02135
02031
02511
01593
00384
01795
01461
01404
00936
01502
02107
01401
01414
00184
01661
01926
01215
00530
02030
00733
00711
01909
00975
01268
02289
00231
01916
01784
00893
01250
00216
00123
01757
00997
00208
00051
00339
02083
00220
02455
01145
01924
01332
00566
02249
02179
00502
00960
00190
02090
01037
01065
00860
00198
01261
01603
00976
01305
00044
02186
00568
00812
00031
01492
01812
01542
01373
00266
01147
01939
01152
00855
01413
00498
00262
00996
02217
00256
02329
00152
02483
01575
02241
00776
01387
01190
00988
01517
02152
02227
01251
00825
02016
01608
02088
01211
00343
01995
00572
01693
02443
01580
00226
02077
00237
02002
00820
01978
01353
01004
00867
02022
02071
01329
01273
01509
00180
02424
02164
02504
02082
01600
01583
02205
00647
00063
01306
02513
01683
02482
02114
00048
00642
01932
02368
01902
00965
02116
02512
02011
01457
00905
00032
01430
02004
01160
02061
01210
02228
01002
00202
01374
02221
01416
00328
00793
00356
00819
01816
00599
00627
01587
01257
00964
00672
00704
00312
02413
01759
00205
01538
00614
02444
01568
00685
01885
01449
01386
02143
00331
02286
Merge results completed, it costs 488.6 seconds.
Evaluate annotation type *mAP*
Calculating IoUs...
IoU calculation Done (t=444.38s).
Running per image evaluation...

It is stuck on this evaluation step. htop shows 2 cores fully busy, and it has been running like this for hours. Assuming evaluation is executed after each epoch, it would take days just for evaluation...
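For scale, the timings already visible in the log above can be added up. The following is only a rough sketch: it takes the first and last training timestamps of epoch 1 and the three evaluation stages that report a duration (patch inference, merge, IoU calculation). The variable names are illustrative, and the "Running per image evaluation" stage is not included because the log never reports how long it takes.

```python
from datetime import datetime

# Timestamps copied from the log: start of the workflow and checkpoint save after epoch 1.
FMT = "%Y-%m-%d %H:%M:%S,%f"
train_start = datetime.strptime("2024-04-12 09:05:04,865", FMT)
train_end = datetime.strptime("2024-04-12 09:25:00,203", FMT)
train_seconds = (train_end - train_start).total_seconds()

# Evaluation stages whose duration the log reports explicitly (in seconds).
eval_stages = {
    "patch inference (6522 patches)": 819.0,
    "merge patch results": 488.6,
    "IoU calculation": 444.38,
}
known_eval_seconds = sum(eval_stages.values())

print(f"training, epoch 1:        {train_seconds / 60:.1f} min")
print(f"evaluation (known part):  {known_eval_seconds / 60:.1f} min")
print(f"ratio (known eval/train): {known_eval_seconds / train_seconds:.2f}")
```

Even before the per-image evaluation starts, the reported evaluation stages already take longer than the training epoch itself.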

Can you suggest a way to speed up the evaluation step? Can I use a different evaluation procedure? How can I adjust the number of processes used in this phase?

Or, given my goal of training the model only on the containers class to obtain higher AP, is there a better approach?

Sorry for using the 'Feature' tag: this is not really a feature proposal, but it is not a bug either.

Any suggestions are appreciated. Thanks.

shaunyuan22 commented 2 months ago

Sorry for the late response. Setting nproc=10 works well for evaluation in our experiments. Evaluation still takes ~3 hours because the objects are so densely packed, and we will update the code for faster evaluation.
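As a hedged illustration of where such a setting might go: in MMDetection 2.x-style configs (which mmrotate 0.x follows), extra keys in the `evaluation` dict are typically forwarded to the dataset's `evaluate()` method. Whether the SODA-A dataset class in this repository accepts `nproc` through this route is an assumption, not something confirmed in this thread, so check the signature of its `evaluate()` before relying on it. The `interval=12` value is also only an example: it matches the 12-epoch schedule in the log so that evaluation would run once, at the end of training, instead of after every epoch.

```python
# Sketch of a config override, assuming the dataset's evaluate() accepts `nproc`.
evaluation = dict(
    interval=12,    # evaluate only after the final epoch of the 12-epoch (1x) schedule
    metric='mAP',   # same metric name as printed in the log ("Evaluate annotation type *mAP*")
    nproc=10,       # number of worker processes suggested above (assumed to be forwarded)
)
```

If the key is not accepted, the run would most likely fail at the first evaluation with an unexpected-keyword error, so a short trial run with `interval=1` on a small validation subset is a cheap way to check.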