open-mmlab / mmrotate

OpenMMLab Rotated Object Detection Toolbox and Benchmark
https://mmrotate.readthedocs.io/en/latest/
Apache License 2.0

[Bug] The loss is nan. #1057

Open Calendula597 opened 3 months ago

Calendula597 commented 3 months ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmrotate

Environment

sys.platform: linux
Python: 3.10.11 (main, May 16 2023, 00:28:57) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA RTX A6000
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.11.0+cu113
PyTorch compiling details: PyTorch built with:

TorchVision: 0.12.0+cu113
OpenCV: 4.9.0
MMEngine: 0.10.4
MMRotate: 1.0.0rc1+

Reproduces the problem - code sample

# dataset settings
dataset_type = 'DIORDataset'
data_root = '/work/data/datasets/DIOR/'
backend_args = None

train_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
    dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
    dict(
        type='mmdet.RandomFlip',
        prob=0.75,
        direction=['horizontal', 'vertical', 'diagonal']),
    dict(type='mmdet.PackDetInputs')
]
val_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
    dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
    # avoid bboxes being resized
    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
test_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
    dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=None,
    dataset=dict(
        type='ConcatDataset',
        ignore_keys=['DATASET_TYPE'],
        datasets=[
            dict(
                type=dataset_type,
                data_root=data_root,
                ann_file='ImageSets/Main/train.txt',
                data_prefix=dict(img_path='JPEGImages-trainval'),
                filter_cfg=dict(filter_empty_gt=True),
                pipeline=train_pipeline),
            dict(
                type=dataset_type,
                data_root=data_root,
                ann_file='ImageSets/Main/val.txt',
                data_prefix=dict(img_path='JPEGImages-trainval'),
                filter_cfg=dict(filter_empty_gt=True),
                pipeline=train_pipeline,
                backend_args=backend_args)
        ]))
val_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='ImageSets/Main/test.txt',
        data_prefix=dict(img_path='JPEGImages-test'),
        test_mode=True,
        pipeline=val_pipeline,
        backend_args=backend_args))
test_dataloader = val_dataloader

val_evaluator = dict(type='DOTAMetric', metric='mAP')
test_evaluator = val_evaluator
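Since the pipeline above loads quadrilateral boxes (`qbox`) and converts them to rotated boxes (`rbox`), one thing that may be worth ruling out is degenerate annotations: boxes with a near-zero side, near-zero area, or non-finite coordinates. Whether this is the cause of the NaN here is not confirmed by the report. The sketch below is a standalone check and not part of MMRotate; `find_degenerate_qboxes`, `polygon_area`, and the `min_side` / `min_area` thresholds are hypothetical names and values chosen for illustration, and the sample boxes are made up.

```python
import math


def polygon_area(pts):
    """Shoelace area of a polygon given as a list of (x, y) points."""
    area = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


def find_degenerate_qboxes(qboxes, min_side=1.0, min_area=1.0):
    """Return indices of suspicious quadrilateral boxes.

    Each qbox is expected as 8 numbers: x1, y1, x2, y2, x3, y3, x4, y4.
    The thresholds are illustrative assumptions, in pixels and pixels^2.
    """
    bad = []
    for idx, box in enumerate(qboxes):
        # Wrong length or NaN/inf coordinates are flagged immediately.
        if len(box) != 8 or not all(math.isfinite(v) for v in box):
            bad.append(idx)
            continue
        pts = [(box[i], box[i + 1]) for i in range(0, 8, 2)]
        sides = [math.dist(pts[i], pts[(i + 1) % 4]) for i in range(4)]
        if min(sides) < min_side or polygon_area(pts) < min_area:
            bad.append(idx)
    return bad


# Made-up sample boxes, not taken from DIOR.
boxes = [
    [10, 10, 50, 10, 50, 30, 10, 30],      # ordinary 40 x 20 box
    [10, 10, 50, 10, 50, 10, 10, 10],      # zero height
    [5, 5, 5, 5, 5, 5, 5, 5],              # collapsed to a single point
    [0, 0, 4, 0, 4, float('nan'), 0, 3],   # non-finite coordinate
]
print(find_degenerate_qboxes(boxes))  # [1, 2, 3]
```

To apply this to the real data, the quadrilateral coordinates would first have to be read from the DIOR annotation files; that parsing step is left out here because it depends on the local annotation layout.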

Reproduces the problem - command or script

Same config as in the code sample above.

Reproduces the problem - error message

I'm training Oriented R-CNN on the DIOR dataset, but the loss becomes NaN at epoch 7. The problem does not occur on the DOTA v1.0 and DOTA v1.5 datasets. How can I solve this problem?
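One common mitigation to try when a loss turns NaN partway through training is gradient clipping. As a hedged sketch only: MMEngine's `OptimWrapper` accepts a `clip_grad` dict, which, as far as I know, is passed on to PyTorch's gradient-norm clipping. The optimizer type and all numeric values below are placeholder assumptions, not settings taken from this report, and clipping may mask the underlying cause rather than fix it.

```python
# Hypothetical override for the training config.
# All optimizer values are illustrative placeholders, not from the report.
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001),
    # Clip the global gradient norm; the max_norm value here is an assumption.
    clip_grad=dict(max_norm=35, norm_type=2))
```

If the NaN still appears at the same point with clipping enabled, that would suggest looking at the data (for example specific images or boxes in DIOR) rather than at the optimizer settings.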

Additional information

No response