(mmrotatedev1toch112) WuMingrui@Turing14:~/Workspace/mmrotate-dev-1.x$ CUDA_VISIBLE_DEVICES=0,1 ./tools/dist_train.sh ./configs/rotated_faster_rcnn/rotated-faster-rcnn-le90_r50_fpn_1x_dota.py 2
/home/WuMingrui/miniconda3/envs/mmrotatedev1toch112/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/home/WuMingrui/miniconda3/envs/mmrotatedev1toch112/lib/python3.8/site-packages/mmengine/utils/dl_utils/setup_env.py:56: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
warnings.warn(
/home/WuMingrui/miniconda3/envs/mmrotatedev1toch112/lib/python3.8/site-packages/mmengine/utils/dl_utils/setup_env.py:56: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
warnings.warn(
01/30 13:08:57 - mmengine - INFO -
System environment:
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 549113804
GPU 0,1: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda-11.6
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
GCC 9.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
01/30 13:14:23 - mmengine - WARNING - Failed to search registry with scope "mmrotate" in the "optim_wrapper" registry tree. As a workaround, the current "optim_wrapper" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmrotate" is a correct scope, or whether the registry is initialized.
01/30 13:16:11 - mmengine - INFO - load model from: torchvision://resnet50
01/30 13:16:11 - mmengine - INFO - Loads checkpoint by torchvision backend from path: torchvision://resnet50
01/30 13:16:11 - mmengine - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
01/30 13:16:12 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
01/30 13:16:12 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
01/30 13:16:12 - mmengine - INFO - Checkpoints will be saved to /media/Raid/WuMingrui/mmrotate-dev-1.x/work_dirs/r50_test.
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
/media/Raid/WuMingrui/mmrotate-dev-1.x/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The clip function does nothing in RotatedBoxes.
warnings.warn('The clip function does nothing in RotatedBoxes.')
01/30 13:16:31 - mmengine - INFO - Epoch(train) [1][ 50/1285] lr: 7.9760e-03 eta: 1:38:12 time: 0.3834 data_time: 0.0137 memory: 8161 grad_norm: 4.2167 loss: 1.2448 loss_rpn_cls: 0.4039 loss_rpn_bbox: 0.0778 loss_cls: 0.4934 acc: 99.3652 loss_bbox: 0.2697
01/30 13:16:40 - mmengine - INFO - Epoch(train) [1][ 100/1285] lr: 9.3120e-03 eta: 1:13:16 time: 0.1907 data_time: 0.0071 memory: 6887 grad_norm: nan loss: nan loss_rpn_cls: nan loss_rpn_bbox: nan loss_cls: nan acc: 1.6393 loss_bbox: nan
01/30 13:16:49 - mmengine - INFO - Epoch(train) [1][ 150/1285] lr: 1.0648e-02 eta: 1:03:27 time: 0.1740 data_time: 0.0075 memory: 11067 grad_norm: nan loss: nan loss_rpn_cls: nan loss_rpn_bbox: nan loss_cls: nan acc: 27.2727 loss_bbox: nan
01/30 13:16:58 - mmengine - INFO - Epoch(train) [1][ 200/1285] lr: 1.1984e-02 eta: 0:58:24 time: 0.1731 data_time: 0.0073 memory: 7535 grad_norm: nan loss: nan loss_rpn_cls: nan loss_rpn_bbox: nan loss_cls: nan acc: 12.1212 loss_bbox: nan
01/30 13:17:06 - mmengine - INFO - Epoch(train) [1][ 250/1285] lr: 1.3320e-02 eta: 0:55:23 time: 0.1742 data_time: 0.0068 memory: 7676 grad_norm: nan loss: nan loss_rpn_cls: nan loss_rpn_bbox: nan loss_cls: nan acc: 48.1481 loss_bbox: nan
01/30 13:17:15 - mmengine - INFO - Epoch(train) [1][ 300/1285] lr: 1.4656e-02 eta: 0:53:18 time: 0.1738 data_time: 0.0071 memory: 7579 grad_norm: nan loss: nan loss_rpn_cls: nan loss_rpn_bbox: nan loss_cls: nan acc: 0.0000 loss_bbox: nan
01/30 13:17:24 - mmengine - INFO - Epoch(train) [1][ 350/1285] lr: 1.5992e-02 eta: 0:51:48 time: 0.1750 data_time: 0.0070 memory: 9612 grad_norm: nan loss: nan loss_rpn_cls: nan loss_rpn_bbox: nan loss_cls: nan acc: 43.8202 loss_bbox: nan
01/30 13:17:32 - mmengine - INFO - Epoch(train) [1][ 400/1285] lr: 1.7328e-02 eta: 0:50:39 time: 0.1749 data_time: 0.0071 memory: 8811 grad_norm: nan loss: nan loss_rpn_cls: nan loss_rpn_bbox: nan loss_cls: nan acc: 11.1111 loss_bbox: nan
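To pin down when the loss first becomes non-finite, the training log can be scanned programmatically. The sketch below is plain Python and assumes the log line layout shown above; `first_nan_iteration` is a hypothetical helper name, not part of MMEngine.

```python
import math
import re
from typing import Iterable, Optional, Tuple

# Matches e.g. "Epoch(train) [1][ 100/1285] ... loss: nan".
# The layout is assumed from the log lines in this report.
_LINE = re.compile(r"Epoch\(train\)\s+\[(\d+)\]\[\s*(\d+)/\d+\].*?\bloss: (\S+)")


def first_nan_iteration(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (epoch, iteration) of the first logged non-finite total loss, else None."""
    for line in lines:
        match = _LINE.search(line)
        if match is None:
            continue  # not a training-progress line
        loss = float(match.group(3))  # float() accepts "nan" and "inf"
        if not math.isfinite(loss):
            return int(match.group(1)), int(match.group(2))
    return None


sample = [
    "01/30 13:16:31 - mmengine - INFO - Epoch(train) [1][ 50/1285] lr: 7.9760e-03 "
    "grad_norm: 4.2167 loss: 1.2448 loss_rpn_cls: 0.4039",
    "01/30 13:16:40 - mmengine - INFO - Epoch(train) [1][ 100/1285] lr: 9.3120e-03 "
    "grad_norm: nan loss: nan loss_rpn_cls: nan",
]
print(first_nan_iteration(sample))
```

Because the logger reports once every 50 iterations, the result only bounds the first NaN: here it occurred somewhere between iterations 51 and 100 of epoch 1.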
Additional information
I tried MMRotate 0.3.4 on the same dataset and it trained without problems. Rotated RetinaNet on the MMRotate dev-1.x branch also trains normally, so far the NaN losses show up only with Rotated Faster R-CNN on dev-1.x.
Lowering the learning rate did not help; the losses still became NaN.
I suspect a problem with my environment but cannot pin it down. Versions: CUDA 11.6, PyTorch 1.12.1, MMEngine 0.10.3, mmcv 2.0.1, mmdet 3.0.0rc6, mmrotate 1.0.0rc1.
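When an environment mismatch is suspected, it helps to record the exact installed versions from inside the active environment rather than from memory. The following is a minimal sketch using only the standard library; the distribution names in `PACKAGES` are assumptions and may differ depending on how the packages were installed.

```python
from importlib import metadata

# Distribution names are assumptions; adjust them to the actual install
# (for example "mmcv" versus "mmcv-full").
PACKAGES = ["torch", "torchvision", "mmengine", "mmcv", "mmdet", "mmrotate"]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in the same conda environment that launches training gives a list that can be compared against the version requirements of the MMRotate branch in use.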
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
1.x branch https://github.com/open-mmlab/mmrotate/tree/1.x
Environment
sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda-11.6
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:
TorchVision: 0.13.1
OpenCV: 4.9.0
MMEngine: 0.10.3
MMRotate: 1.0.0rc1+
Reproduces the problem - code sample
_base_ = [
    '../_base_/datasets/dota_my.py', '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
angle_version = 'le90'
model = dict(
    type='mmdet.FasterRCNN',
    data_preprocessor=dict(
        type='mmdet.DetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=32,
        boxtype2tensor=False),
    backbone=dict(
        type='mmdet.ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='mmdet.FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='mmdet.RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='mmdet.AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64],
            use_box_type=True),
        bbox_coder=dict(
            type='DeltaXYWHHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0],
            use_box_type=True),
        loss_cls=dict(
            type='mmdet.CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='mmdet.SmoothL1Loss',
            beta=0.1111111111111111,
            loss_weight=1.0)),
    roi_head=dict(
        type='mmdet.StandardRoIHead',
        bbox_roi_extractor=dict(
            type='mmdet.SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='mmdet.Shared2FCBBoxHead',
            predict_box_type='rbox',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            reg_predictor_cfg=dict(type='mmdet.Linear'),
            cls_predictor_cfg=dict(type='mmdet.Linear'),
            bbox_coder=dict(
                type='DeltaXYWHTHBBoxCoder',
                angle_version=angle_version,
                norm_factor=2,
                edge_swap=True,
                target_means=(.0, .0, .0, .0, .0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='mmdet.CrossEntropyLoss',
                use_sigmoid=False,
                loss_weight=1.0),
            loss_bbox=dict(
                type='mmdet.SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='mmdet.MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='RBbox2HBboxOverlaps2D')),
            sampler=dict(
                type='mmdet.RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='mmdet.MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1,
                iou_calculator=dict(type='RBbox2HBboxOverlaps2D')),
            sampler=dict(
                type='mmdet.RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(type='nms_rotated', iou_threshold=0.1),
            max_per_img=2000)))
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.020, momentum=0.9, weight_decay=0.0001),
    clip_grad=dict(max_norm=35, norm_type=2))
# added config
train_dataloader = dict(batch_size=4, num_workers=4)
work_dir = 'work_dirs/r50_test/'
test_evaluator = dict(outfile_prefix='./work_dirs/r50_test')
Reproduces the problem - command or script
CUDA_VISIBLE_DEVICES=0,1 ./tools/dist_train.sh ./configs/rotated_faster_rcnn/rotated-faster-rcnn-le90_r50_fpn_1x_dota.py 2
Reproduces the problem - error message
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.13.1
OpenCV: 4.9.0
MMEngine: 0.10.3

Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 549113804
Distributed launcher: pytorch
Distributed training: True
GPU number: 2
01/30 13:08:57 - mmengine - INFO - Config: angle_version = 'le90' backend_args = None data_root = '../DOTASplit/' dataset_type = 'DOTADataset' default_hooks = dict( checkpoint=dict(interval=1, type='CheckpointHook'), logger=dict(interval=50, type='LoggerHook'), param_scheduler=dict(type='ParamSchedulerHook'), sampler_seed=dict(type='DistSamplerSeedHook'), timer=dict(type='IterTimerHook'), visualization=dict(type='mmdet.DetVisualizationHook')) default_scope = 'mmrotate' env_cfg = dict( cudnn_benchmark=False, dist_cfg=dict(backend='nccl'), mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0)) launcher = 'pytorch' load_from = None log_level = 'INFO' log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50) model = dict( backbone=dict( depth=50, frozen_stages=1, init_cfg=dict(checkpoint='torchvision://resnet50', type='Pretrained'), norm_cfg=dict(requires_grad=True, type='BN'), norm_eval=True, num_stages=4, out_indices=( 0, 1, 2, 3, ), style='pytorch', type='mmdet.ResNet'), data_preprocessor=dict( bgr_to_rgb=True, boxtype2tensor=False, mean=[ 123.675, 116.28, 103.53, ], pad_size_divisor=32, std=[ 58.395, 57.12, 57.375, ], type='mmdet.DetDataPreprocessor'), neck=dict( in_channels=[ 256, 512, 1024, 2048, ], num_outs=5, out_channels=256, type='mmdet.FPN'), roi_head=dict( bbox_head=dict( bbox_coder=dict( angle_version='le90', edge_swap=True, norm_factor=2, target_means=( 0.0, 0.0, 0.0, 0.0, 0.0, ), target_stds=( 0.1, 0.1, 0.2, 0.2, 0.1, ), type='DeltaXYWHTHBBoxCoder'), cls_predictor_cfg=dict(type='mmdet.Linear'), fc_out_channels=1024, in_channels=256, loss_bbox=dict( beta=1.0, loss_weight=1.0, type='mmdet.SmoothL1Loss'), loss_cls=dict( loss_weight=1.0, type='mmdet.CrossEntropyLoss', use_sigmoid=False), num_classes=15, predict_box_type='rbox', reg_class_agnostic=True, reg_predictor_cfg=dict(type='mmdet.Linear'), roi_feat_size=7, type='mmdet.Shared2FCBBoxHead'), bbox_roi_extractor=dict( featmap_strides=[ 4, 8, 16, 32, ], out_channels=256, 
roi_layer=dict(output_size=7, sampling_ratio=0, type='RoIAlign'), type='mmdet.SingleRoIExtractor'), type='mmdet.StandardRoIHead'), rpn_head=dict( anchor_generator=dict( ratios=[ 0.5, 1.0, 2.0, ], scales=[ 8, ], strides=[ 4, 8, 16, 32, 64, ], type='mmdet.AnchorGenerator', use_box_type=True), bbox_coder=dict( target_means=[ 0.0, 0.0, 0.0, 0.0, ], target_stds=[ 1.0, 1.0, 1.0, 1.0, ], type='DeltaXYWHHBBoxCoder', use_box_type=True), feat_channels=256, in_channels=256, loss_bbox=dict( beta=0.1111111111111111, loss_weight=1.0, type='mmdet.SmoothL1Loss'), loss_cls=dict( loss_weight=1.0, type='mmdet.CrossEntropyLoss', use_sigmoid=True), type='mmdet.RPNHead'), test_cfg=dict( rcnn=dict( max_per_img=2000, min_bbox_size=0, nms=dict(iou_threshold=0.1, type='nms_rotated'), nms_pre=2000, score_thr=0.05), rpn=dict( max_per_img=2000, min_bbox_size=0, nms=dict(iou_threshold=0.7, type='nms'), nms_pre=2000)), train_cfg=dict( rcnn=dict( assigner=dict( ignore_iof_thr=-1, iou_calculator=dict(type='RBbox2HBboxOverlaps2D'), match_low_quality=False, min_pos_iou=0.5, neg_iou_thr=0.5, pos_iou_thr=0.5, type='mmdet.MaxIoUAssigner'), debug=False, pos_weight=-1, sampler=dict( add_gt_as_proposals=True, neg_pos_ub=-1, num=512, pos_fraction=0.25, type='mmdet.RandomSampler')), rpn=dict( allowed_border=0, assigner=dict( ignore_iof_thr=-1, iou_calculator=dict(type='RBbox2HBboxOverlaps2D'), match_low_quality=True, min_pos_iou=0.3, neg_iou_thr=0.3, pos_iou_thr=0.7, type='mmdet.MaxIoUAssigner'), debug=False, pos_weight=-1, sampler=dict( add_gt_as_proposals=False, neg_pos_ub=-1, num=256, pos_fraction=0.5, type='mmdet.RandomSampler')), rpn_proposal=dict( max_per_img=2000, min_bbox_size=0, nms=dict(iou_threshold=0.7, type='nms'), nms_pre=2000)), type='mmdet.FasterRCNN') optim_wrapper = dict( clip_grad=dict(max_norm=35, norm_type=2), optimizer=dict(lr=0.02, momentum=0.9, type='SGD', weight_decay=0.0001), type='OptimWrapper') param_scheduler = [ dict( begin=0, by_epoch=False, end=500, 
start_factor=0.3333333333333333, type='LinearLR'), dict( begin=0, by_epoch=True, end=12, gamma=0.1, milestones=[ 8, 11, ], type='MultiStepLR'), ] resume = False test_cfg = dict(type='TestLoop') test_dataloader = dict( batch_size=1, dataset=dict( data_prefix=dict(img_path='val/images/'), data_root='../DOTASplit/', pipeline=[ dict(backend_args=None, type='mmdet.LoadImageFromFile'), dict(keep_ratio=True, scale=( 1024, 1024, ), type='mmdet.Resize'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', ), type='mmdet.PackDetInputs'), ], test_mode=True, type='DOTADataset'), drop_last=False, num_workers=2, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) test_evaluator = dict( format_only=True, merge_patches=True, outfile_prefix='./work_dirs/r50_test', type='DOTAMetric') test_pipeline = [ dict(backend_args=None, type='mmdet.LoadImageFromFile'), dict(keep_ratio=True, scale=( 1024, 1024, ), type='mmdet.Resize'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', ), type='mmdet.PackDetInputs'), ] train_cfg = dict(max_epochs=12, type='EpochBasedTrainLoop', val_interval=1) train_dataloader = dict( batch_sampler=None, batch_size=4, dataset=dict( ann_file='train/labelTxt/', data_prefix=dict(img_path='train/images/'), data_root='../DOTASplit/', filter_cfg=dict(filter_empty_gt=True), pipeline=[ dict(backend_args=None, type='mmdet.LoadImageFromFile'), dict( box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True), dict( box_type_mapping=dict(gt_bboxes='rbox'), type='ConvertBoxType'), dict(keep_ratio=True, scale=( 1024, 1024, ), type='mmdet.Resize'), dict( direction=[ 'horizontal', 'vertical', 'diagonal', ], prob=0.75, type='mmdet.RandomFlip'), dict(type='mmdet.PackDetInputs'), ], type='DOTADataset'), num_workers=4, persistent_workers=True, sampler=dict(shuffle=True, type='DefaultSampler')) train_pipeline = [ dict(backend_args=None, type='mmdet.LoadImageFromFile'), dict(box_type='qbox', 
type='mmdet.LoadAnnotations', with_bbox=True), dict(box_type_mapping=dict(gt_bboxes='rbox'), type='ConvertBoxType'), dict(keep_ratio=True, scale=( 1024, 1024, ), type='mmdet.Resize'), dict( direction=[ 'horizontal', 'vertical', 'diagonal', ], prob=0.75, type='mmdet.RandomFlip'), dict(type='mmdet.PackDetInputs'), ] val_cfg = dict(type='ValLoop') val_dataloader = dict( batch_size=1, dataset=dict( ann_file='val/labelTxt/', data_prefix=dict(img_path='val/images/'), data_root='../DOTASplit/', pipeline=[ dict(backend_args=None, type='mmdet.LoadImageFromFile'), dict(keep_ratio=True, scale=( 1024, 1024, ), type='mmdet.Resize'), dict( box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True), dict( box_type_mapping=dict(gt_bboxes='rbox'), type='ConvertBoxType'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', ), type='mmdet.PackDetInputs'), ], test_mode=True, type='DOTADataset'), drop_last=False, num_workers=2, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) val_evaluator = dict(metric='mAP', type='DOTAMetric') val_pipeline = [ dict(backend_args=None, type='mmdet.LoadImageFromFile'), dict(keep_ratio=True, scale=( 1024, 1024, ), type='mmdet.Resize'), dict(box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True), dict(box_type_mapping=dict(gt_bboxes='rbox'), type='ConvertBoxType'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', ), type='mmdet.PackDetInputs'), ] vis_backends = [ dict(type='LocalVisBackend'), ] visualizer = dict( name='visualizer', type='RotLocalVisualizer', vis_backends=[ dict(type='LocalVisBackend'), ]) work_dir = 'work_dirs/r50_test/'
01/30 13:09:01 - mmengine - INFO - Hooks will be executed in the following order: before_run: (VERY_HIGH ) RuntimeInfoHook (BELOW_NORMAL) LoggerHook
before_train: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (VERY_LOW ) CheckpointHook
before_train_epoch: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (NORMAL ) DistSamplerSeedHook
before_train_iter: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook
after_train_iter: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook (LOW ) ParamSchedulerHook (VERY_LOW ) CheckpointHook
after_train_epoch: (NORMAL ) IterTimerHook (LOW ) ParamSchedulerHook (VERY_LOW ) CheckpointHook
before_val: (VERY_HIGH ) RuntimeInfoHook
before_val_epoch: (NORMAL ) IterTimerHook
before_val_iter: (NORMAL ) IterTimerHook
after_val_iter: (NORMAL ) IterTimerHook (NORMAL ) DetVisualizationHook (BELOW_NORMAL) LoggerHook
after_val_epoch: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook (LOW ) ParamSchedulerHook (VERY_LOW ) CheckpointHook
after_val: (VERY_HIGH ) RuntimeInfoHook
after_train: (VERY_HIGH ) RuntimeInfoHook (VERY_LOW ) CheckpointHook
before_test: (VERY_HIGH ) RuntimeInfoHook
before_test_epoch: (NORMAL ) IterTimerHook
before_test_iter: (NORMAL ) IterTimerHook
after_test_iter: (NORMAL ) IterTimerHook (NORMAL ) DetVisualizationHook (BELOW_NORMAL) LoggerHook
after_test_epoch: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook
after_test: (VERY_HIGH ) RuntimeInfoHook
after_run: (BELOW_NORMAL) LoggerHook
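The `lr` values printed in the training log can be checked against the `LinearLR` warmup in the dumped config (`start_factor=0.3333333333333333`, `end=500`, base `lr=0.02`). The sketch below assumes the factor ramps linearly from `start_factor` to 1 over `end - begin - 1` steps; that denominator is inferred from the logged numbers, not taken from MMEngine's source, and `warmup_lr` is a hypothetical helper.

```python
def warmup_lr(iteration, base_lr=0.02, start_factor=1 / 3, begin=0, end=500):
    """Linear warmup learning rate at a 1-based logged iteration.

    Assumption: the factor ramps from start_factor to 1 over
    (end - begin - 1) scheduler steps, as inferred from the logged values.
    """
    step = iteration - 1  # scheduler step in effect when this iteration ran
    total = end - begin - 1
    progress = min(max(step - begin, 0), total) / total
    return base_lr * (start_factor + (1.0 - start_factor) * progress)


for it in (50, 100, 400):
    print(f"iter {it}: lr = {warmup_lr(it):.4e}")
```

Under these assumptions the function gives 7.9760e-03, 9.3120e-03 and 1.7328e-02 for iterations 50, 100 and 400, matching the logged values. The loss is already NaN at iteration 100, while the learning rate is still below half of its target value, which is consistent with the observation that lowering the learning rate did not help.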