Closed xiaoyihit closed 2 years ago
Did you successfully reimplement the base config, r3det_kfiou_ln_r50_fpn_1x_dota_oc.py?
Yeah, although I only get mAP 72.06
There is a typo in r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc.py: use angle_version = 'oc' instead of angle_version = 'le90'.
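Since the fix depends on every angle-related field agreeing, a quick consistency check before a long run may help. The sketch below is a hypothetical helper, not part of mmrotate; it assumes the config is available as a plain nested dict/list and that the relevant keys are named `angle_version`, `version` and `angle_range`, as in the configs discussed in this thread.

```python
def collect_angle_versions(cfg, keys=("angle_version", "version", "angle_range")):
    """Recursively collect string values stored under angle-related keys."""
    found = set()
    if isinstance(cfg, dict):
        for key, value in cfg.items():
            if key in keys and isinstance(value, str):
                found.add(value)
            else:
                found |= collect_angle_versions(value, keys)
    elif isinstance(cfg, (list, tuple)):
        for item in cfg:
            found |= collect_angle_versions(item, keys)
    return found


# Toy config for illustration only; it is not one of the real config files.
cfg = dict(
    angle_version="oc",
    model=dict(
        bbox_head=dict(version="le90", bbox_coder=dict(angle_range="oc"))),
)

versions = collect_angle_versions(cfg)
if len(versions) > 1:
    print("Inconsistent angle definitions:", sorted(versions))
```

More than one value in the returned set suggests that some part of the config still uses a different angle definition.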
Well, r3det_kfiou_ln_swin_tiny_adamw_fpn_2x_dota_ms_rr_oc is based on r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc, so I don't see any difference. Also, is using angle_version = 'le90' instead of angle_version = 'oc' what causes these problems?
I did try that, by the way:
2022-03-25 00:57:20,760 - mmrotate - INFO - Exp name: r3det_kfiou_ln_swin_tiny_adamw_fpn_2x_dota_ms_rr_oc.py
2022-03-25 00:57:20,761 - mmrotate - INFO - Epoch [3][2200/6400] lr: 1.000e-04, eta: 2 days, 2:44:45, time: 1.375, data_time: 0.147, memory: 6889, s0.loss_cls: 0.8992, s0.loss_bbox: 46.6267, sr0.loss_cls: 0.5464, sr0.loss_bbox: 6.5478, loss: 54.6200, grad_norm: 254.9667
2022-03-25 00:58:31,609 - mmrotate - INFO - Epoch [3][2250/6400] lr: 1.000e-04, eta: 2 days, 2:44:25, time: 1.417, data_time: 0.152, memory: 6889, s0.loss_cls: 0.8991, s0.loss_bbox: 52.8903, sr0.loss_cls: 0.5891, sr0.loss_bbox: 6.5397, loss: 60.9181, grad_norm: 234.1258
2022-03-25 00:59:41,344 - mmrotate - INFO - Epoch [3][2300/6400] lr: 1.000e-04, eta: 2 days, 2:43:54, time: 1.395, data_time: 0.153, memory: 6889, s0.loss_cls: 0.9001, s0.loss_bbox: 43.9201, sr0.loss_cls: 0.6084, sr0.loss_bbox: 9.1293, loss: 54.5579, grad_norm: 225.4924
2022-03-25 01:00:51,674 - mmrotate - INFO - Epoch [3][2350/6400] lr: 1.000e-04, eta: 2 days, 2:43:28, time: 1.407, data_time: 0.170, memory: 6889, s0.loss_cls: 0.8645, s0.loss_bbox: 40.6765, sr0.loss_cls: 0.4212, sr0.loss_bbox: 8.9599, loss: 50.9220, grad_norm: 237.7153
2022-03-25 01:02:00,876 - mmrotate - INFO - Epoch [3][2400/6400] lr: 1.000e-04, eta: 2 days, 2:42:52, time: 1.384, data_time: 0.149, memory: 6889, s0.loss_cls: 0.9447, s0.loss_bbox: 48.2193, sr0.loss_cls: 0.6202, sr0.loss_bbox: 8.7886, loss: 58.5729, grad_norm: 241.3439
2022-03-25 01:03:11,688 - mmrotate - INFO - Epoch [3][2450/6400] lr: 1.000e-04, eta: 2 days, 2:42:30, time: 1.416, data_time: 0.161, memory: 6889, s0.loss_cls: 0.9231, s0.loss_bbox: 54.3726, sr0.loss_cls: 0.6410, sr0.loss_bbox: 6.9682, loss: 62.9048, grad_norm: 244.5289
2022-03-25 01:04:19,729 - mmrotate - INFO - Epoch [3][2500/6400] lr: 1.000e-04, eta: 2 days, 2:41:43, time: 1.361, data_time: 0.147, memory: 6889, s0.loss_cls: 0.8697, s0.loss_bbox: 39.4403, sr0.loss_cls: 0.5404, sr0.loss_bbox: 6.2415, loss: 47.0919, grad_norm: 154.8938
2022-03-25 01:05:30,525 - mmrotate - INFO - Epoch [3][2550/6400] lr: 1.000e-04, eta: 2 days, 2:41:20, time: 1.416, data_time: 0.150, memory: 6889, s0.loss_cls: 0.9822, s0.loss_bbox: 56.3863, sr0.loss_cls: 0.5780, sr0.loss_bbox: 11.5071, loss: 69.4536, grad_norm: 236.7918
2022-03-25 01:06:40,029 - mmrotate - INFO - Epoch [3][2600/6400] lr: 1.000e-04, eta: 2 days, 2:40:46, time: 1.390, data_time: 0.162, memory: 6889, s0.loss_cls: 0.8897, s0.loss_bbox: 47.3726, sr0.loss_cls: 0.6092, sr0.loss_bbox: 7.9623, loss: 56.8338, grad_norm: 228.0067
2022-03-25 01:07:49,040 - mmrotate - INFO - Epoch [3][2650/6400] lr: 1.000e-04, eta: 2 days, 2:40:06, time: 1.380, data_time: 0.147, memory: 6889, s0.loss_cls: 0.9107, s0.loss_bbox: 55.2884, sr0.loss_cls: 0.5196, sr0.loss_bbox: 6.7891, loss: 63.5078, grad_norm: 265.5788
2022-03-25 01:08:56,614 - mmrotate - INFO - Epoch [3][2700/6400] lr: 1.000e-04, eta: 2 days, 2:39:14, time: 1.351, data_time: 0.159, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:10:02,129 - mmrotate - INFO - Epoch [3][2750/6400] lr: 1.000e-04, eta: 2 days, 2:38:04, time: 1.310, data_time: 0.156, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:11:09,186 - mmrotate - INFO - Epoch [3][2800/6400] lr: 1.000e-04, eta: 2 days, 2:37:07, time: 1.341, data_time: 0.156, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:12:15,525 - mmrotate - INFO - Epoch [3][2850/6400] lr: 1.000e-04, eta: 2 days, 2:36:04, time: 1.327, data_time: 0.160, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:13:19,685 - mmrotate - INFO - Epoch [3][2900/6400] lr: 1.000e-04, eta: 2 days, 2:34:41, time: 1.283, data_time: 0.154, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:14:26,097 - mmrotate - INFO - Epoch [3][2950/6400] lr: 1.000e-04, eta: 2 days, 2:33:39, time: 1.328, data_time: 0.166, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:15:32,833 - mmrotate - INFO - Epoch [3][3000/6400] lr: 1.000e-04, eta: 2 days, 2:32:39, time: 1.335, data_time: 0.162, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:16:39,540 - mmrotate - INFO - Epoch [3][3050/6400] lr: 1.000e-04, eta: 2 days, 2:31:39, time: 1.334, data_time: 0.157, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-25 01:17:43,395 - mmrotate - INFO - Epoch [3][3100/6400] lr: 1.000e-04, eta: 2 days, 2:30:14, time: 1.277, data_time: 0.142, memory: 6889, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
r3det+le will cause NaN.
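To pin down where training diverges, it can help to locate the first iteration whose total loss is NaN. The following is a minimal sketch, assuming log lines in the text format shown in this thread; `first_nan_step` and the regular expression are illustrative helpers rather than anything shipped with mmrotate, and the two sample lines are abbreviated from the log above.

```python
import math
import re

# Captures the epoch, the iteration and the total "loss:" field, e.g.
# "Epoch [3][2700/6400] ... loss: nan, grad_norm: nan".
# "\bloss: " does not match "s0.loss_cls: " or "s0.loss_bbox: ".
LINE_RE = re.compile(r"Epoch \[(\d+)\]\[(\d+)/\d+\].*?\bloss: (\S+?),")


def first_nan_step(log_lines):
    """Return (epoch, iteration) of the first NaN total loss, or None."""
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # e.g. "Exp name: ..." lines
        epoch, iteration = int(match.group(1)), int(match.group(2))
        if math.isnan(float(match.group(3))):
            return epoch, iteration
    return None


sample = [
    "2022-03-25 01:07:49,040 - mmrotate - INFO - Epoch [3][2650/6400] "
    "lr: 1.000e-04, s0.loss_bbox: 55.2884, loss: 63.5078, grad_norm: 265.5788",
    "2022-03-25 01:08:56,614 - mmrotate - INFO - Epoch [3][2700/6400] "
    "lr: 1.000e-04, s0.loss_bbox: nan, loss: nan, grad_norm: nan",
]
print(first_nan_step(sample))
```

Note that the losses are logged every 50 iterations, so the actual divergence happens somewhere in the 50 iterations before the reported step.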
What about roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90.py?
Is it NAN too?
Nah, its mAP is 0.
I will run some experiments on roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90, but that will take some time.
We also need your feedback, especially on r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc.
Training roi_trans_r50_fpn_1x_dota_le90.py and roi_trans_kfiou_ln_r50_fpn_1x_dota_le90.py was successful, but roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90.py failed even though its training log looks normal. I'll keep debugging.
@xiaoyihit Change the neck to:
    neck=dict(
        _delete_=True,
        type='FPN',
        in_channels=[96, 192, 384, 768],
        out_channels=256,
        add_extra_convs='on_input',
        num_outs=5)
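For readers unfamiliar with `_delete_=True`: in mmcv-style configs it tells the loader to drop everything the child would otherwise inherit from the base config for that dict, instead of merging key by key. The sketch below is a simplified stand-in for that behaviour, not mmcv's actual implementation; `merge_cfg` is a hypothetical function, and the base neck (including its `start_level=1` key) is made up purely to show a key that would otherwise leak through.

```python
def merge_cfg(child, base):
    """Simplified config merge: child keys override base keys recursively.

    If the child dict contains `_delete_=True`, the inherited base dict is
    discarded and only the child's own keys are kept.
    """
    child = dict(child)
    if child.pop("_delete_", False):
        base = {}
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict):
            inherited = merged.get(key)
            merged[key] = merge_cfg(
                value, inherited if isinstance(inherited, dict) else {})
        else:
            merged[key] = value
    return merged


# Hypothetical base config; `start_level=1` is invented for illustration.
base = dict(neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048],
                      out_channels=256, start_level=1, num_outs=5))

# The neck suggested in this thread.
child = dict(neck=dict(_delete_=True, type='FPN',
                       in_channels=[96, 192, 384, 768], out_channels=256,
                       add_extra_convs='on_input', num_outs=5))

merged = merge_cfg(child, base)
print(merged['neck'])
```

Without `_delete_`, the stale `start_level` key from the base would survive the merge; with it, the neck is exactly the dict written in the child config.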
We find that kfiou+roi trans in the release version of mmrotate doesn't seem to be any better than plain roi trans. The experiments in the paper were run on code that predates the release version, and the release version has undergone many refactorings, so further parameter tuning for kfiou+roi trans may be required.
As you can see, the release version of mmrotate has to integrate many methods, so some configuration files may be bad. However, we have released the weights and logs for the correct configuration files, and will add more models in the future.
That neck change causes CUDA out of memory.
Add data = dict(samples_per_gpu=1, workers_per_gpu=1); refer to
https://github.com/open-mmlab/mmrotate/blob/df125d7121d9d4074d1ccfdec60fd65ce58ff7d8/configs/roi_trans/roi_trans_swin_tiny_fpn_1x_dota_le90.py#L31
Just finished eval. This is your result for task 1:
mAP: 0.750360501827728
ap of each class: plane:0.894929653799662, baseball-diamond:0.7731615899566354, bridge:0.5157623225260385, ground-track-field:0.734778658651271, small-vehicle:0.7860644079039538, large-vehicle:0.8159744191901619, ship:0.8772641293948107, tennis-court:0.9089554260940254, basketball-court:0.8580533136030519, storage-tank:0.8515154228628408, soccer-ball-field:0.6271493722082745, roundabout:0.6387006969394813, harbor:0.6737986751942772, swimming-pool:0.7237979777030293, helicopter:0.5755014613884054
The submitted information is:
Description: r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc\Task1_results
Single GPU
This is your result for task 1:
mAP: 0.7639999397376762
ap of each class: plane:0.8920492711808281, baseball-diamond:0.8311689917311053, bridge:0.5487773220207339, ground-track-field:0.715685100739603, small-vehicle:0.7891244562305876, large-vehicle:0.8295689284111806, ship:0.8804525056013857, tennis-court:0.9089562289562291, basketball-court:0.8732825143127781, storage-tank:0.8590706510255619, soccer-ball-field:0.6396620876719492, roundabout:0.6472101230008205, harbor:0.7628661551618027, swimming-pool:0.7013281937144408, helicopter:0.5807965663061383
The submitted information is:
Description: roi_trans_r50_fpn_1x_dota_le90
This is your result for task 1:
mAP: 0.7577335010077175
ap of each class: plane:0.8894649588480816, baseball-diamond:0.8290480970945256, bridge:0.5548931517545318, ground-track-field:0.7173129226267527, small-vehicle:0.7898956105813991, large-vehicle:0.8167135005240188, ship:0.8793419879806488, tennis-court:0.9090909090909093, basketball-court:0.8708165080114506, storage-tank:0.8577658662597724, soccer-ball-field:0.6529545308196423, roundabout:0.6157234981467674, harbor:0.752155680103153, swimming-pool:0.7138769381846977, helicopter:0.5169483550894158
The submitted information is:
Description: roi_trans_kfiou_ln_r50_fpn_1x_dota_le90
If you use multiple GPUs, the lr needs to be modified: lr = lr * gpu_num.
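The rule stated above amounts to linear scaling of the learning rate with the number of GPUs. A minimal sketch follows, assuming the per-GPU batch size (samples_per_gpu) stays unchanged; `scale_lr` is a hypothetical helper, and the base value 1e-4 is the lr shown in the training logs in this thread.

```python
def scale_lr(base_lr, gpu_num):
    """Scale a single-GPU learning rate linearly with the number of GPUs."""
    if gpu_num < 1:
        raise ValueError("gpu_num must be at least 1")
    return base_lr * gpu_num


base_lr = 1e-4  # single-GPU lr, as logged in this thread
for gpu_num in (1, 2, 4):
    print(gpu_num, scale_lr(base_lr, gpu_num))
```

With a single GPU the value is unchanged, so this adjustment does not apply to single-GPU runs.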
I am using single gpu.
Similar results:
configs/kfiou/roi_trans_kfiou_ln_r50_fpn_1x_dota_le90.py 0.7538
configs/kfiou/roi_trans_kfiou_ln_r50_fpn_1x_dota_ms_rr_le90.py 0.7660
Gaussian-based methods (e.g. gwd, kld and kfiou) are sensitive to the loss weight. Tuning may be required for kfiou in two-stage methods.
PS: roi_trans_kfiou_ln_r50_fpn_1x_dota_ms_rr_le90 is trained on the ms_trainval set and tested on the ms_test set. The ms_trainval and ms_test sets are split by
Thx for pointing out the mistake. Seems that I forgot the multi-scale config, I will check that out soon.
After changing the neck as suggested, there seems to be a bug:
Traceback (most recent call last):
File "tools/train.py", line 252, in <module>
main()
File "tools/train.py", line 246, in main
meta=meta)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/apis/train.py", line 156, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
losses = self(**data)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 109, in new_func
return old_func(*args, **kwargs)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 172, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/detectors/two_stage.py", line 150, in forward_train
**kwargs)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_trans_roi_head.py", line 238, in forward_train
rcnn_train_cfg)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_trans_roi_head.py", line 155, in _bbox_forward_train
bbox_results = self._bbox_forward(stage, x, rois)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_trans_roi_head.py", line 126, in _bbox_forward
rois)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 197, in new_func
return old_func(*args, **kwargs)
File "/remote-home/xiaoyi/mmrotate-main/mmrotate/models/roi_heads/roi_extractors/rotate_single_level_roi_extractor.py", line 133, in forward
roi_feats_t = self.roi_layers[i](feats[i], rois_)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/ops/roi_align_rotated.py", line 171, in forward
self.clockwise)
File "/opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/mmcv/ops/roi_align_rotated.py", line 70, in forward
clockwise=ctx.clockwise)
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1634272178570/work/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f0606f6bd62 in /opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1c613 (0x7f065e85a613 in /opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f065e85b022 in /opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f0606f55314 in /opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x294dd9 (0x7f06dc8d2dd9 in /opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xae2f59 (0x7f06dd120f59 in /opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x2b9 (0x7f06dd121279 in /opt/conda/envs/mmrotatev2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #24: __libc_start_main + 0xe7 (0x7f0717e39bf7 in /lib/x86_64-linux-gnu/libc.so.6)
Aborted (core dumped)
My config:
dataset_type = 'DOTADataset'
data_root = '/remote-home/xiaoyi/datasets/dotav1/split_1024_dota1_0/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le90'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type='DOTADataset',
ann_file=
'/remote-home/xiaoyi/datasets/dotav1/split_1024_dota1_0/trainval/annfiles/',
img_prefix=
'/remote-home/xiaoyi/datasets/dotav1/split_1024_dota1_0/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RResize', img_scale=(1024, 1024)),
dict(
type='RRandomFlip',
flip_ratio=[0.25, 0.25, 0.25],
direction=['horizontal', 'vertical', 'diagonal'],
version='le90'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
],
version='le90'),
val=dict(
type='DOTADataset',
ann_file=
'/remote-home/xiaoyi/datasets/dotav1/split_1024_dota1_0/trainval/annfiles/',
img_prefix=
'/remote-home/xiaoyi/datasets/dotav1/split_1024_dota1_0/trainval/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le90'),
test=dict(
type='DOTADataset',
ann_file=
'/remote-home/xiaoyi/datasets/dotav1/split_1024_dota1_0/test/images/',
img_prefix=
'/remote-home/xiaoyi/datasets/dotav1/split_1024_dota1_0/test/images/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1024, 1024),
flip=False,
transforms=[
dict(type='RResize'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img'])
])
],
version='le90'))
evaluation = dict(interval=12, metric='mAP')
optimizer = dict(
type='AdamW',
lr=0.0001,
betas=(0.9, 0.999),
weight_decay=0.05,
paramwise_cfg=dict(
custom_keys=dict(
absolute_pos_embed=dict(decay_mult=0.0),
relative_position_bias_table=dict(decay_mult=0.0),
norm=dict(decay_mult=0.0))))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.3333333333333333,
step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=12)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
angle_version = 'le90'
model = dict(
type='RoITransformer',
backbone=dict(
type='SwinTransformer',
embed_dims=96,
depths=[2, 2, 6, 2],
num_heads=[3, 6, 12, 24],
window_size=7,
mlp_ratio=4,
qkv_bias=True,
qk_scale=None,
drop_rate=0.0,
attn_drop_rate=0.0,
drop_path_rate=0.2,
patch_norm=True,
out_indices=(0, 1, 2, 3),
with_cp=False,
convert_weights=True,
init_cfg=dict(
type='Pretrained',
checkpoint=
'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth'
)),
neck=dict(
type='FPN',
in_channels=[96, 192, 384, 768],
out_channels=256,
add_extra_convs='on_input',
num_outs=5),
rpn_head=dict(
type='RotatedRPNHead',
in_channels=256,
feat_channels=256,
version='le90',
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(
type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
roi_head=dict(
type='RoITransRoIHead',
version='le90',
num_stages=2,
stage_loss_weights=[1, 1],
bbox_roi_extractor=[
dict(
type='SingleRoIExtractor',
roi_layer=dict(
type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
dict(
type='RotatedSingleRoIExtractor',
roi_layer=dict(
type='RoIAlignRotated',
out_size=7,
sample_num=2,
clockwise=True),
out_channels=256,
featmap_strides=[4, 8, 16, 32])
],
bbox_head=[
dict(
type='RotatedShared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=15,
bbox_coder=dict(
type='DeltaXYWHAHBBoxCoder',
angle_range='le90',
norm_factor=2,
edge_swap=True,
target_means=[0.0, 0.0, 0.0, 0.0, 0.0],
target_stds=[0.1, 0.1, 0.2, 0.2, 1]),
reg_class_agnostic=True,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(
type='SmoothL1Loss',
beta=0.1111111111111111,
loss_weight=1.0)),
dict(
type='RotatedKFIoUShared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=15,
bbox_coder=dict(
type='DeltaXYWHAOBBoxCoder',
angle_range='le90',
norm_factor=None,
edge_swap=True,
proj_xy=True,
target_means=[0.0, 0.0, 0.0, 0.0, 0.0],
target_stds=[0.05, 0.05, 0.1, 0.1, 0.5]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='KFLoss', fun='ln', loss_weight=0.5))
]),
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=0,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=[
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1,
iou_calculator=dict(type='BboxOverlaps2D')),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False),
dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1,
iou_calculator=dict(type='RBboxOverlaps2D')),
sampler=dict(
type='RRandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)
]),
test_cfg=dict(
rpn=dict(
nms_pre=2000,
max_per_img=2000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(type='le90', iou_thr=0.1),
max_per_img=2000)))
pretrained = 'https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth'
find_unused_parameters = True
work_dir = './work_dirs/roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90'
auto_resume = False
gpu_ids = [1]
It runs normally for me on a 2080 Ti GPU. Log: roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90.txt
mAP: 0.7907676910846436
ap of each class: plane:0.8951765569961131, baseball-diamond:0.8326362977155665, bridge:0.5660402122647823, ground-track-field:0.7565563051287862, small-vehicle:0.8090605281971763, large-vehicle:0.8372130002571467, ship:0.8844067743199389, tennis-court:0.9088127411811229, basketball-court:0.8536264881156366, storage-tank:0.8752888005548948, soccer-ball-field:0.6697054274046957, roundabout:0.7047203486881632, harbor:0.775892803409076, swimming-pool:0.7752507482116147, helicopter:0.7171283338249412
COCO style result:
AP50: 0.7907676910846436
AP75: 0.4811109336025689
mAP: 0.47054638715884184
The submitted information is:
Description: r3det_kfiou_ln_swin_tiny_adamw_fpn_2x_dota_ms_rr_oc\Task1_results
Still far short of 80.90%.
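As a sanity check on these numbers, the reported mAP should be the plain average of the 15 per-class APs. The short sketch below recomputes it from the values posted above and prints the gap to the 80.90% target mentioned in this thread; the variable names are illustrative only.

```python
# Per-class AP values copied from the result above.
ap_per_class = {
    "plane": 0.8951765569961131,
    "baseball-diamond": 0.8326362977155665,
    "bridge": 0.5660402122647823,
    "ground-track-field": 0.7565563051287862,
    "small-vehicle": 0.8090605281971763,
    "large-vehicle": 0.8372130002571467,
    "ship": 0.8844067743199389,
    "tennis-court": 0.9088127411811229,
    "basketball-court": 0.8536264881156366,
    "storage-tank": 0.8752888005548948,
    "soccer-ball-field": 0.6697054274046957,
    "roundabout": 0.7047203486881632,
    "harbor": 0.775892803409076,
    "swimming-pool": 0.7752507482116147,
    "helicopter": 0.7171283338249412,
}

# mAP as the unweighted mean over classes.
mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
target = 0.8090  # the 80.90% figure quoted in this thread

print(f"recomputed mAP: {mean_ap:.4f}")
print(f"gap to target:  {target - mean_ap:.4f}")

# The three weakest classes are a natural place to look first.
for name, ap in sorted(ap_per_class.items(), key=lambda kv: kv[1])[:3]:
    print(f"{name}: {ap:.4f}")
```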
This is mine:
20220424_185126.log.json.txt Partial log.
Reimplement a model in the model zoo using the provided configs
Describe the issue
Reimplement a model in the model zoo using the provided configs:
configs/kfiou/r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc.py
configs/kfiou/roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90.py
Reproduction
tools/train.py
configs/kfiou/r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc.py
configs/kfiou/roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90.py
I loaded the Swin Transformer checkpoint myself since the given URL is invalid.
dotav1
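For the locally loaded Swin Transformer checkpoint mentioned above, one way to express it is a config override that points `init_cfg` at a local file. This is only a sketch: the path is a hypothetical placeholder, and the `init_cfg=dict(type='Pretrained', checkpoint=...)` layout is taken from the config dump posted in this thread.

```python
# Hypothetical local path; replace it with wherever the checkpoint was saved.
pretrained = '/path/to/swin_tiny_patch4_window7_224.pth'

# Only the keys that change are listed; in an mmcv-style config the rest of
# the backbone settings would be inherited from the base config.
model = dict(
    backbone=dict(
        init_cfg=dict(type='Pretrained', checkpoint=pretrained)))
```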
Environment
Output of python mmrotate/utils/collect_env.py:
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1: GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.11.1
OpenCV: 4.5.4
MMCV: 1.4.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMRotate: 0.1.0+
Results
The result is strange for both configs. For r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc, the loss goes to NaN in the middle of training with the given config. The output is as follows:
2022-03-03 03:13:07,337 - mmrotate - INFO - Exp name: r3det_kfiou_ln_swin_tiny_adamw_fpn_1x_dota_ms_rr_oc.py
2022-03-03 03:13:07,337 - mmrotate - INFO - Epoch [4][3800/6400] lr: 1.000e-04, eta: 15:13:00, time: 1.222, data_time: 0.009, memory: 6728, s0.loss_cls: 0.8579, s0.loss_bbox: 49.3017, sr0.loss_cls: 0.5324, sr0.loss_bbox: 12.8906, loss: 63.5826, grad_norm: 238.8943
2022-03-03 03:14:09,169 - mmrotate - INFO - Epoch [4][3850/6400] lr: 1.000e-04, eta: 15:12:34, time: 1.237, data_time: 0.009, memory: 6728, s0.loss_cls: 0.9198, s0.loss_bbox: 45.2929, sr0.loss_cls: 0.5530, sr0.loss_bbox: 24.8593, loss: 71.6250, grad_norm: 303.1804
2022-03-03 03:15:09,412 - mmrotate - INFO - Epoch [4][3900/6400] lr: 1.000e-04, eta: 15:12:05, time: 1.205, data_time: 0.009, memory: 6728, s0.loss_cls: 0.9237, s0.loss_bbox: 38.0998, sr0.loss_cls: 0.5913, sr0.loss_bbox: 22.2580, loss: 61.8729, grad_norm: 258.2369
2022-03-03 03:16:09,798 - mmrotate - INFO - Epoch [4][3950/6400] lr: 1.000e-04, eta: 15:11:36, time: 1.208, data_time: 0.009, memory: 6728, s0.loss_cls: 0.8610, s0.loss_bbox: 51.9997, sr0.loss_cls: 0.4727, sr0.loss_bbox: 8.2045, loss: 61.5379, grad_norm: 245.4183
2022-03-03 03:17:10,993 - mmrotate - INFO - Epoch [4][4000/6400] lr: 1.000e-04, eta: 15:11:09, time: 1.224, data_time: 0.010, memory: 6728, s0.loss_cls: 0.9141, s0.loss_bbox: 43.9352, sr0.loss_cls: 0.4976, sr0.loss_bbox: 17.3902, loss: 62.7370, grad_norm: 251.7698
2022-03-03 03:18:10,932 - mmrotate - INFO - Epoch [4][4050/6400] lr: 1.000e-04, eta: 15:10:38, time: 1.199, data_time: 0.010, memory: 6728, s0.loss_cls: 0.8810, s0.loss_bbox: 46.5427, sr0.loss_cls: 0.5061, sr0.loss_bbox: 17.8371, loss: 65.7669, grad_norm: 223.0147
2022-03-03 03:19:10,435 - mmrotate - INFO - Epoch [4][4100/6400] lr: 1.000e-04, eta: 15:10:07, time: 1.190, data_time: 0.009, memory: 6728, s0.loss_cls: 0.9800, s0.loss_bbox: 38.0009, sr0.loss_cls: 0.4385, sr0.loss_bbox: 13.2763, loss: 52.6957, grad_norm: 182.5411
2022-03-03 03:20:11,712 - mmrotate - INFO - Epoch [4][4150/6400] lr: 1.000e-04, eta: 15:09:39, time: 1.225, data_time: 0.010, memory: 6728, s0.loss_cls: 0.8986, s0.loss_bbox: 45.8436, sr0.loss_cls: 0.4950, sr0.loss_bbox: 14.5906, loss: 61.8278, grad_norm: 248.6563
2022-03-03 03:21:10,936 - mmrotate - INFO - Epoch [4][4200/6400] lr: 1.000e-04, eta: 15:09:07, time: 1.184, data_time: 0.009, memory: 6728, s0.loss_cls: 0.8558, s0.loss_bbox: 44.0947, sr0.loss_cls: 0.5000, sr0.loss_bbox: 17.9017, loss: 63.3523, grad_norm: 211.6122
2022-03-03 03:22:10,696 - mmrotate - INFO - Epoch [4][4250/6400] lr: 1.000e-04, eta: 15:08:35, time: 1.195, data_time: 0.010, memory: 6728, s0.loss_cls: 0.8331, s0.loss_bbox: 39.5684, sr0.loss_cls: 0.6009, sr0.loss_bbox: 19.1346, loss: 60.1370, grad_norm: 207.3286
2022-03-03 03:23:11,753 - mmrotate - INFO - Epoch [4][4300/6400] lr: 1.000e-04, eta: 15:08:07, time: 1.221, data_time: 0.010, memory: 6728, s0.loss_cls: 0.8634, s0.loss_bbox: 42.2088, sr0.loss_cls: 0.6487, sr0.loss_bbox: 30.8503, loss: 74.5712, grad_norm: 228.5747
2022-03-03 03:24:08,064 - mmrotate - INFO - Epoch [4][4350/6400] lr: 1.000e-04, eta: 15:07:28, time: 1.126, data_time: 0.009, memory: 6728, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
2022-03-03 03:25:01,703 - mmrotate - INFO - Epoch [4][4400/6400] lr: 1.000e-04, eta: 15:06:42, time: 1.073, data_time: 0.009, memory: 6728, s0.loss_cls: nan, s0.loss_bbox: nan, sr0.loss_cls: nan, sr0.loss_bbox: nan, loss: nan, grad_norm: nan
For roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90, things are strange again. The loss seems normal, however the mAP is close to zero.
2022-03-03 10:36:05,468 - mmrotate - INFO - Epoch [12][5950/6400] lr: 1.000e-06, eta: 0:03:56, time: 0.307, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.0862, loss_rpn_bbox: 0.0175, s0.loss_cls: 0.1440, s0.acc: 95.4902, s0.loss_bbox: 0.1616, s1.loss_cls: 0.1116, s1.acc: 96.6328, s1.loss_bbox: 0.0469, loss: 0.5677, grad_norm: 2.8243
2022-03-03 10:36:20,704 - mmrotate - INFO - Epoch [12][6000/6400] lr: 1.000e-06, eta: 0:03:29, time: 0.305, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.0784, loss_rpn_bbox: 0.0115, s0.loss_cls: 0.1379, s0.acc: 95.9961, s0.loss_bbox: 0.1559, s1.loss_cls: 0.1037, s1.acc: 97.1230, s1.loss_bbox: 0.0312, loss: 0.5187, grad_norm: 2.3542
2022-03-03 10:36:35,877 - mmrotate - INFO - Epoch [12][6050/6400] lr: 1.000e-06, eta: 0:03:03, time: 0.303, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.1008, loss_rpn_bbox: 0.0221, s0.loss_cls: 0.1560, s0.acc: 94.7656, s0.loss_bbox: 0.1852, s1.loss_cls: 0.1271, s1.acc: 95.7930, s1.loss_bbox: 0.0425, loss: 0.6336, grad_norm: 2.9375
2022-03-03 10:36:51,185 - mmrotate - INFO - Epoch [12][6100/6400] lr: 1.000e-06, eta: 0:02:37, time: 0.306, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.1049, loss_rpn_bbox: 0.0182, s0.loss_cls: 0.1789, s0.acc: 94.1738, s0.loss_bbox: 0.2147, s1.loss_cls: 0.1394, s1.acc: 95.6074, s1.loss_bbox: 0.0549, loss: 0.7109, grad_norm: 3.0274
2022-03-03 10:37:06,526 - mmrotate - INFO - Epoch [12][6150/6400] lr: 1.000e-06, eta: 0:02:11, time: 0.307, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.1238, loss_rpn_bbox: 0.0199, s0.loss_cls: 0.1759, s0.acc: 94.2949, s0.loss_bbox: 0.1875, s1.loss_cls: 0.1410, s1.acc: 95.5137, s1.loss_bbox: 0.0447, loss: 0.6929, grad_norm: 3.0797
2022-03-03 10:37:21,769 - mmrotate - INFO - Epoch [12][6200/6400] lr: 1.000e-06, eta: 0:01:44, time: 0.305, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.0888, loss_rpn_bbox: 0.0145, s0.loss_cls: 0.1383, s0.acc: 95.7617, s0.loss_bbox: 0.1513, s1.loss_cls: 0.1136, s1.acc: 96.6421, s1.loss_bbox: 0.0477, loss: 0.5542, grad_norm: 2.6510
2022-03-03 10:37:37,021 - mmrotate - INFO - Epoch [12][6250/6400] lr: 1.000e-06, eta: 0:01:18, time: 0.305, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.1119, loss_rpn_bbox: 0.0170, s0.loss_cls: 0.1698, s0.acc: 95.0332, s0.loss_bbox: 0.1738, s1.loss_cls: 0.1317, s1.acc: 96.1934, s1.loss_bbox: 0.0454, loss: 0.6497, grad_norm: 2.8886
2022-03-03 10:37:52,335 - mmrotate - INFO - Epoch [12][6300/6400] lr: 1.000e-06, eta: 0:00:52, time: 0.306, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.1384, loss_rpn_bbox: 0.0296, s0.loss_cls: 0.1954, s0.acc: 93.4336, s0.loss_bbox: 0.2464, s1.loss_cls: 0.1495, s1.acc: 95.3066, s1.loss_bbox: 0.0577, loss: 0.8170, grad_norm: 3.6254
2022-03-03 10:38:07,793 - mmrotate - INFO - Epoch [12][6350/6400] lr: 1.000e-06, eta: 0:00:26, time: 0.309, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.1259, loss_rpn_bbox: 0.0295, s0.loss_cls: 0.1714, s0.acc: 94.6934, s0.loss_bbox: 0.1900, s1.loss_cls: 0.1361, s1.acc: 96.0684, s1.loss_bbox: 0.0520, loss: 0.7049, grad_norm: 3.2369
2022-03-03 10:38:23,037 - mmrotate - INFO - Exp name: roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90.py
2022-03-03 10:38:23,037 - mmrotate - INFO - Epoch [12][6400/6400] lr: 1.000e-06, eta: 0:00:00, time: 0.305, data_time: 0.007, memory: 7279, loss_rpn_cls: 0.1173, loss_rpn_bbox: 0.0252, s0.loss_cls: 0.1731, s0.acc: 94.5508, s0.loss_bbox: 0.1745, s1.loss_cls: 0.1519, s1.acc: 95.3516, s1.loss_bbox: 0.0641, loss: 0.7061, grad_norm: 3.0836
2022-03-03 10:38:23,234 - mmrotate - INFO - Saving checkpoint at 12 epochs
2022-03-03 10:52:21,340 - mmrotate - INFO -
+--------------------+-------+-------+--------+-------+
| class              | gts   | dets  | recall | ap    |
+--------------------+-------+-------+--------+-------+
| plane              | 18788 | 38005 | 0.062  | 0.013 |
| baseball-diamond   | 1087  | 401   | 0.016  | 0.002 |
| bridge             | 4181  | 504   | 0.000  | 0.000 |
| ground-track-field | 733   | 374   | 0.023  | 0.003 |
| small-vehicle      | 58868 | 69520 | 0.014  | 0.000 |
| large-vehicle      | 43075 | 65885 | 0.010  | 0.000 |
| ship               | 76153 | 73019 | 0.013  | 0.000 |
| tennis-court       | 5923  | 10750 | 0.075  | 0.007 |
| basketball-court   | 1180  | 751   | 0.024  | 0.002 |
| storage-tank       | 13670 | 19423 | 0.017  | 0.001 |
| soccer-ball-field  | 827   | 736   | 0.018  | 0.001 |
| roundabout         | 973   | 190   | 0.006  | 0.000 |
| harbor             | 15468 | 28043 | 0.010  | 0.000 |
| swimming-pool      | 3836  | 4222  | 0.008  | 0.000 |
| helicopter         | 1189  | 502   | 0.024  | 0.005 |
+--------------------+-------+-------+--------+-------+
| mAP                |       |       |        | 0.002 |
+--------------------+-------+-------+--------+-------+
2022-03-03 10:52:21,404 - mmrotate - INFO - Exp name: roi_trans_kfiou_ln_swin_tiny_fpn_1x_dota_le90.py
2022-03-03 10:52:21,405 - mmrotate - INFO - Epoch(val) [12][12800] mAP: 0.0024