Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] I have read the FAQ documentation but cannot get the expected help.
[X] The bug has not been fixed in the latest version (master) or latest version (1.x).

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmrotate

Environment

sys.platform: linux
Python: 3.10.11 (main, May 16 2023, 00:28:57) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA RTX A6000
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.11.0+cu113
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.12.0+cu113
OpenCV: 4.9.0
MMEngine: 0.10.4
MMRotate: 1.0.0rc1+

Reproduces the problem - code sample

# dataset settings
dataset_type = 'DIORDataset'
data_root = '/work/data/datasets/DIOR/'
backend_args = None

train_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
    dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
    dict(
        type='mmdet.RandomFlip',
        prob=0.75,
        direction=['horizontal', 'vertical', 'diagonal']),
    dict(type='mmdet.PackDetInputs')
]
val_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
    dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
    # avoid bboxes being resized
    dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
    dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
test_pipeline = [
    dict(type='mmdet.LoadImageFromFile', backend_args=backend_args),
    dict(type='mmdet.Resize', scale=(800, 800), keep_ratio=True),
    dict(
        type='mmdet.PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=None,
    dataset=dict(
        type='ConcatDataset',
        ignore_keys=['DATASET_TYPE'],
        datasets=[
            dict(
                type=dataset_type,
                data_root=data_root,
                ann_file='ImageSets/Main/train.txt',
                data_prefix=dict(img_path='JPEGImages-trainval'),
                filter_cfg=dict(filter_empty_gt=True),
                pipeline=train_pipeline),
            dict(
                type=dataset_type,
                data_root=data_root,
                ann_file='ImageSets/Main/val.txt',
                data_prefix=dict(img_path='JPEGImages-trainval'),
                filter_cfg=dict(filter_empty_gt=True),
                pipeline=train_pipeline,
                backend_args=backend_args)
        ]))
val_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='ImageSets/Main/test.txt',
        data_prefix=dict(img_path='JPEGImages-test'),
        test_mode=True,
        pipeline=val_pipeline,
        backend_args=backend_args))
test_dataloader = val_dataloader

val_evaluator = dict(type='DOTAMetric', metric='mAP')
test_evaluator = val_evaluator

Reproduces the problem - command or script

# dataset settings
dataset_type = 'DIORDataset'
data_root = '/work/data/datasets/DIOR/'
backend_args = None

# avoid bboxes being resized
dict(type='mmdet.LoadAnnotations', with_bbox=True, box_type='qbox'),
dict(type='ConvertBoxType', box_type_mapping=dict(gt_bboxes='rbox')),
dict(
    type='mmdet.PackDetInputs',
    meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
               'scale_factor'))

val_evaluator = dict(type='DOTAMetric', metric='mAP')
test_evaluator = val_evaluator

Reproduces the problem - error message

I'm training Oriented R-CNN on the DIOR dataset, but the model's loss becomes NaN at epoch 7. The problem does not occur on the DOTA-v1.0 and DOTA-v1.5 datasets. How can I solve this problem?
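
Since the NaN appears only with DIOR, one thing that may be worth ruling out is degenerate ground-truth boxes (zero area, collapsed edges, or non-finite coordinates) in the qbox annotations before they are converted to rbox. Whether that is the cause here is an assumption; nothing in this report confirms it. The sketch below is plain Python and does not use any MMRotate API. The function names, the thresholds (min_area, min_edge), and the input layout (a dict mapping an image id to a list of 8-value quadrilaterals) are all hypothetical and would need adapting to however the DIOR annotations are loaded.

```python
import math


def polygon_area(qbox):
    """Shoelace area of a quadrilateral [x1, y1, x2, y2, x3, y3, x4, y4]."""
    pts = [(qbox[i], qbox[i + 1]) for i in range(0, 8, 2)]
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def is_suspicious(qbox, min_area=1.0, min_edge=1.0):
    """Flag boxes that are malformed, non-finite, or nearly collapsed.

    min_area and min_edge are illustrative thresholds in pixels,
    not values taken from MMRotate.
    """
    if len(qbox) != 8 or not all(math.isfinite(v) for v in qbox):
        return True
    pts = [(qbox[i], qbox[i + 1]) for i in range(0, 8, 2)]
    edges = [math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1])]
    return polygon_area(qbox) < min_area or min(edges) < min_edge


def find_suspicious(annotations, min_area=1.0, min_edge=1.0):
    """Return (image_id, box_index) pairs for every suspicious box.

    `annotations` is a hypothetical layout: {image_id: [qbox, ...]}.
    """
    return [
        (img_id, idx)
        for img_id, boxes in annotations.items()
        for idx, box in enumerate(boxes)
        if is_suspicious(box, min_area, min_edge)
    ]
```

If such a scan reports any boxes, inspecting or dropping them and retraining would show whether they are related to the NaN loss; if it reports none, the cause lies elsewhere.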

Additional information

No response

open-mmlab / mmrotate

[Bug] The loss is nan. #1057
