Try lr=0.0001
OK, I'll give it a try. Thank you!
This time I used 'lsk_s_fpn_1x_dota_le90.py' and set lr=0.0001 on 8 GPUs, but the mAP on the validation set peaked at epoch 9 and then dropped. This is my training log; I'm confused about this: 20240328_140356.log
If you are conducting single-scale training, you may need to tune your lr higher. Setting lr to around 0.0002 for 8 GPUs may solve your issue.
Sorry for the confusion. I noticed your setting was 6 GPUs, so the optimal lr for that setting may fall between 0.0001 and 0.00015 (based on my personal experience). However, if you use 8 GPUs, you should try a higher lr instead; in that case, 0.0002 may be optimal.
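In other words, the suggestion scales the learning rate roughly linearly with the effective batch size (number of GPUs × samples_per_gpu). The snippet below is only a rough sketch of that rule of thumb, assuming lr=0.0002 at 8 GPUs × 1 image/GPU as the reference point; the helper name suggested_lr is ours and is not part of any config in this repo.

# Rough linear-scaling sketch of the advice above (not an official formula):
# lr is scaled with the effective batch size, taking lr=0.0002 at
# 8 GPUs x 1 image/GPU as the reference point.
def suggested_lr(num_gpus, samples_per_gpu=1, ref_lr=2e-4, ref_batch=8):
    return ref_lr * (num_gpus * samples_per_gpu) / ref_batch

print(suggested_lr(8))  # 0.0002  -> matches the 8-GPU suggestion
print(suggested_lr(6))  # 0.00015 -> upper end of the 0.0001-0.00015 range
print(suggested_lr(2))  # 5e-05   -> very small batches may still lag behind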
Hello, I'm also confused about the learning rate.
I'm using "lsk_s_ema_fpn_1x_dota_le90.py" for train on DOTAv1 dataset. And I noticed that you use 8 units of RTX 3090, but I only have 1 unit of RTX 3090 and 1 unit of RTX 4090. What learning rate can I set for reproduce the results? Or are there any parameters can I change for my experimental environment?
sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 11.1.0-1ubuntu1~18.04.1) 11.1.0
PyTorch: 2.0.1+cu118
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.8
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 8.9.5
- Built with CuDNN 8.7
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.15.2+cu118
OpenCV: 4.9.0
MMCV: 1.7.2
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.8
MMRotate: 0.3.4+
Thanks a lot!
You may face the same situation as #43. You can try adjusting the learning rate up or down to get better performance, but this may not compensate for the side effects of a small batch size.
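For a 2-GPU run like the one above, a starting point consistent with this advice would be to keep samples_per_gpu=1 and scale the AdamW learning rate down with the effective batch size. The override below is only a hypothetical sketch in the usual mmrotate config style, not a verified setting, and as noted above it may not fully recover the 8-GPU results.

# Hypothetical override for a 2-GPU run (1 image per GPU), not verified:
# effective batch = 2 vs. the 8 used for the reference results, so the
# linearly scaled lr would be 0.0002 * 2 / 8 = 5e-05.
data = dict(samples_per_gpu=1, workers_per_gpu=2)
optimizer = dict(
    type='AdamW', lr=5e-05, betas=(0.9, 0.999), weight_decay=0.05)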
@Yiliud Hi, where did you find the log file (SS)? I can only find the MS log. @zcablii
We did not report SS results in any publication, but someone I know told me the result he got was around 78.2+. I have not verified it myself yet, but you can take it as a rough reference.
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmrotate
Environment
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.10
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
TorchVision: 0.10.0
OpenCV: 4.8.1
MMCV: 1.5.3
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMRotate: 0.3.4+
Reproduces the problem - code sample
dataset_type = 'DOTADataset'
data_root = 'data/split_ss_dota/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version='le90'),
    dict(
        type='PolyRandomRotate',
        rotate_ratio=0.5,
        angles_range=180,
        auto_bound=False,
        rect_classes=[9, 11],
        version='le90'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type='DOTADataset',
        ann_file='data/split_ss_dota/trainval/annfiles/',
        img_prefix='data/split_ss_dota/trainval/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(
                type='RRandomFlip',
                flip_ratio=[0.25, 0.25, 0.25],
                direction=['horizontal', 'vertical', 'diagonal'],
                version='le90'),
            dict(
                type='PolyRandomRotate',
                rotate_ratio=0.5,
                angles_range=180,
                auto_bound=False,
                rect_classes=[9, 11],
                version='le90'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        version='le90'),
    val=dict(
        type='DOTADataset',
        ann_file='data/split_ss_dota/val/annfiles/',
        img_prefix='data/split_ss_dota/val/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='le90'),
    test=dict(
        type='DOTADataset',
        ann_file='data/split_ss_dota/test/images/',
        img_prefix='data/split_ss_dota/test/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='le90'))
evaluation = dict(interval=1, metric='mAP')
optimizer = dict(type='AdamW', lr=5e-05, betas=(0.9, 0.999), weight_decay=0.05)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
angle_version = 'le90'
gpu_number = 6
model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='LSKNet',
        embed_dims=[64, 128, 320, 512],
        drop_rate=0.1,
        drop_path_rate=0.1,
        depths=[2, 2, 4, 2],
        init_cfg=dict(
            type='Pretrained',
            checkpoint='/data/yiliu/lsknet/lsk_s_backbone-e9d2e551.pth'),
        norm_cfg=dict(type='SyncBN', requires_grad=True)),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 320, 512],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version='le90',
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range='le90',
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range='le90',
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(0.0, 0.0, 0.0, 0.0, 0.0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                gpu_assign_thr=800,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                gpu_assign_thr=800,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
work_dir = 'lrs5'
auto_resume = False
gpu_ids = range(0, 6)
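One detail worth flagging in the config above, given the discussion earlier in the thread: it trains on 6 GPUs (gpu_ids = range(0, 6), samples_per_gpu=1) with lr=5e-05, which is below the 0.0001-0.00015 range the maintainer suggested for a 6-GPU run. A possible adjustment, stated only as a sketch and not a verified fix:

# Possible lr adjustment for the 6-GPU, 1 image/GPU run above, based only on
# the maintainer's suggestion earlier in this thread (0.0001-0.00015); the
# other optimizer settings are unchanged from the posted config.
optimizer = dict(
    type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05)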
Reproduces the problem - command or script
tools/dist_train.sh configs/lsknet/lsk_s_fpn_1x_dota_le90.py 6
tools/dist_test.sh configs/lsknet/lsk_s_fpn_1x_dota_le90.py lys5 6 --format-only --eval-options submission_dir=work_dirss/Task1_results
Reproduces the problem - error message
20240328_012920.log
Additional information
I used the officially provided single-scale image split tool to generate the SS dataset, but I cannot reproduce the result of lsknet_s with single-scale random-rotate training and testing. Can anyone help me check my configuration? Thanks a lot!