open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

I don't know why I can't train RTMDet-Ins #11128

Open Wooho-Moon opened 12 months ago

Wooho-Moon commented 12 months ago

There is no error, but I can't train the RTMDet-Ins-m model. After the log line '11/06 10:12:07 - mmengine - INFO - Using SyncBatchNorm()', the next log should be 'mmengine - INFO - Hooks will be executed in the following order:', but it never appears. It is really weird, since GPU usage looks correct. I only changed the data path in coco_detection.py. What should I do?
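(For context, here is a minimal sketch of how the data path could be overridden in a user config instead of editing the base coco_detection.py directly. The file name below is hypothetical and assumes it sits next to the stock RTMDet-Ins config; the paths match the data_root shown in the log further down.)

# my_rtmdet-ins_m.py -- hypothetical override config, placed in configs/rtmdet/
_base_ = './rtmdet-ins_m_8xb32-300e_coco.py'

data_root = '/petco/dataset/instance_segmentation/open/coco/'  # custom dataset location
train_dataloader = dict(dataset=dict(data_root=data_root))
val_dataloader = dict(dataset=dict(data_root=data_root))
test_dataloader = dict(dataset=dict(data_root=data_root))
val_evaluator = dict(ann_file=data_root + 'annotations/instances_val2017.json')
test_evaluator = dict(ann_file=data_root + 'annotations/instances_val2017.json')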

My environment is as follows.

RTX 4090, Python 3.8, torch 2.0.1, CUDA 11.8, mmdetection 3.2

CLI: CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh /root/workspace/DEVELOP_segmentation/based_mmopenlab/mmdetection/configs/rtmdet/rtmdet-ins_m_8xb32-300e_coco.py 4

The output is as follows:

FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use-env is set by default in torchrun. If your script expects --local-rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

warnings.warn(
WARNING:torch.distributed.run: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


/root/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/utils/dl_utils/setup_env.py:56: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. warnings.warn( [this UserWarning is printed once by each of the 4 processes]

11/06 10:12:02 - mmengine - INFO -

System environment:
    sys.platform: linux
    Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
    CUDA available: True
    numpy_random_seed: 177560996
    GPU 0,1,2,3: NVIDIA Graphics Device
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.8, V11.8.89
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 2.0.1
    PyTorch compiling details: PyTorch built with:

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 177560996
    Distributed launcher: pytorch
    Distributed training: True
    GPU number: 4

11/06 10:12:04 - mmengine - INFO - Config: auto_scale_lr = dict(base_batch_size=16, enable=False) backend_args = None base_lr = 0.004 custom_hooks = [ dict( ema_type='ExpMomentumEMA', momentum=0.0002, priority=49, type='EMAHook', update_buffers=True), dict( switch_epoch=280, switch_pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict( poly2mask=False, type='LoadAnnotations', with_bbox=True, with_mask=True), dict( keep_ratio=True, ratio_range=( 0.1, 2.0, ), scale=( 640, 640, ), type='RandomResize'), dict( allow_negative_crop=True, crop_size=( 640, 640, ), recompute_bbox=True, type='RandomCrop'), dict(min_gt_bbox_wh=( 1, 1, ), type='FilterAnnotations'), dict(type='YOLOXHSVRandomAug'), dict(prob=0.5, type='RandomFlip'), dict( pad_val=dict(img=( 114, 114, 114, )), size=( 640, 640, ), type='Pad'), dict(type='PackDetInputs'), ], type='PipelineSwitchHook'), ] data_root = '/petco/dataset/instance_segmentation/open/coco/' dataset_type = 'CocoDataset' default_hooks = dict( checkpoint=dict(interval=10, max_keep_ckpts=3, type='CheckpointHook'), logger=dict(interval=50, type='LoggerHook'), param_scheduler=dict(type='ParamSchedulerHook'), sampler_seed=dict(type='DistSamplerSeedHook'), timer=dict(type='IterTimerHook'), visualization=dict(type='DetVisualizationHook')) default_scope = 'mmdet' env_cfg = dict( cudnn_benchmark=False, dist_cfg=dict(backend='nccl'), mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0)) img_scales = [ ( 640, 640, ), ( 320, 320, ), ( 960, 960, ), ] interval = 10 launcher = 'pytorch' load_from = None log_level = 'INFO' log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50) max_epochs = 300 model = dict( backbone=dict( act_cfg=dict(inplace=True, type='SiLU'), arch='P5', channel_attention=True, deepen_factor=0.67, expand_ratio=0.5, frozen_stages=4, norm_cfg=dict(type='SyncBN'), type='CSPNeXt', widen_factor=0.75), bbox_head=dict( act_cfg=dict(inplace=True, type='SiLU'), anchor_generator=dict( offset=0, strides=[ 8, 16, 32, ], type='MlvlPointGenerator'), bbox_coder=dict(type='DistancePointBBoxCoder'), feat_channels=192, in_channels=192, loss_bbox=dict(loss_weight=2.0, type='GIoULoss'), loss_cls=dict( beta=2.0, loss_weight=1.0, type='QualityFocalLoss', use_sigmoid=True), loss_mask=dict( eps=5e-06, loss_weight=2.0, reduction='mean', type='DiceLoss'), norm_cfg=dict(requires_grad=True, type='SyncBN'), num_classes=80, pred_kernel_size=1, share_conv=True, stacked_convs=2, type='RTMDetInsSepBNHead'), data_preprocessor=dict( batch_augments=None, bgr_to_rgb=False, mean=[ 103.53, 116.28, 123.675, ], std=[ 57.375, 57.12, 58.395, ], type='DetDataPreprocessor'), init_cfg=dict( checkpoint= 'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/rtmdet-ins_m_8xb32-300e_coco/rtmdet-ins_m_8xb32-300e_coco_20221123_001039-6eba602e.pth', type='Pretrained'), neck=dict( act_cfg=dict(inplace=True, type='SiLU'), expand_ratio=0.5, in_channels=[ 192, 384, 768, ], norm_cfg=dict(type='SyncBN'), num_csp_blocks=2, out_channels=192, type='CSPNeXtPAFPN'), test_cfg=dict( mask_thr_binary=0.5, max_per_img=100, min_bbox_size=0, nms=dict(iou_threshold=0.6, type='nms'), nms_pre=1000, score_thr=0.05), train_cfg=dict( allowed_border=-1, assigner=dict(topk=13, type='DynamicSoftLabelAssigner'), debug=False, pos_weight=-1), type='RTMDet') optim_wrapper = dict( optimizer=dict(lr=0.004, type='AdamW', weight_decay=0.05), paramwise_cfg=dict( bias_decay_mult=0, bypass_duplicate=True, norm_decay_mult=0), type='OptimWrapper') param_scheduler = [ dict( begin=0, by_epoch=False, end=1000, 
start_factor=1e-05, type='LinearLR'), dict( T_max=150, begin=150, by_epoch=True, convert_to_iter_based=True, end=300, eta_min=0.0002, type='CosineAnnealingLR'), ] resume = False stage2_num_epochs = 20 test_cfg = dict(type='TestLoop') test_dataloader = dict( batch_size=5, dataset=dict( ann_file='annotations/instances_val2017.json', backend_args=None, data_prefix=dict(img='val2017/'), data_root='/petco/dataset/instance_segmentation/open/coco/', pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(keep_ratio=True, scale=( 640, 640, ), type='Resize'), dict( pad_val=dict(img=( 114, 114, 114, )), size=( 640, 640, ), type='Pad'), dict(type='LoadAnnotations', with_bbox=True), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', ), type='PackDetInputs'), ], test_mode=True, type='CocoDataset'), drop_last=False, num_workers=10, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) test_evaluator = dict( ann_file= '/petco/dataset/instance_segmentation/open/coco/annotations/instances_val2017.json', backend_args=None, format_only=False, metric=[ 'bbox', 'segm', ], proposal_nums=( 100, 1, 10, ), type='CocoMetric') test_pipeline = [ dict(backend_args=None, type='LoadImageFromFile'), dict(keep_ratio=True, scale=( 640, 640, ), type='Resize'), dict(pad_val=dict(img=( 114, 114, 114, )), size=( 640, 640, ), type='Pad'), dict(type='LoadAnnotations', with_bbox=True), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', ), type='PackDetInputs'), ] train_cfg = dict( dynamic_intervals=[ ( 280, 1, ), ], max_epochs=300, type='EpochBasedTrainLoop', val_interval=10) train_dataloader = dict( batch_sampler=None, batch_size=32, dataset=dict( ann_file='annotations/instances_train2017.json', backend_args=None, data_prefix=dict(img='train2017/'), data_root='/petco/dataset/instance_segmentation/open/coco/', filter_cfg=dict(filter_empty_gt=True, min_size=32), pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict( poly2mask=False, type='LoadAnnotations', with_bbox=True, with_mask=True), dict(img_scale=( 640, 640, ), pad_val=114.0, type='CachedMosaic'), dict( keep_ratio=True, ratio_range=( 0.1, 2.0, ), scale=( 1280, 1280, ), type='RandomResize'), dict( allow_negative_crop=True, crop_size=( 640, 640, ), recompute_bbox=True, type='RandomCrop'), dict(type='YOLOXHSVRandomAug'), dict(prob=0.5, type='RandomFlip'), dict( pad_val=dict(img=( 114, 114, 114, )), size=( 640, 640, ), type='Pad'), dict( img_scale=( 640, 640, ), max_cached_images=20, pad_val=( 114, 114, 114, ), ratio_range=( 1.0, 1.0, ), type='CachedMixUp'), dict(min_gt_bbox_wh=( 1, 1, ), type='FilterAnnotations'), dict(type='PackDetInputs'), ], type='CocoDataset'), num_workers=10, persistent_workers=True, pin_memory=True, sampler=dict(shuffle=True, type='DefaultSampler')) train_pipeline = [ dict(backend_args=None, type='LoadImageFromFile'), dict( poly2mask=False, type='LoadAnnotations', with_bbox=True, with_mask=True), dict(img_scale=( 640, 640, ), pad_val=114.0, type='CachedMosaic'), dict( keep_ratio=True, ratio_range=( 0.1, 2.0, ), scale=( 1280, 1280, ), type='RandomResize'), dict( allow_negative_crop=True, crop_size=( 640, 640, ), recompute_bbox=True, type='RandomCrop'), dict(type='YOLOXHSVRandomAug'), dict(prob=0.5, type='RandomFlip'), dict(pad_val=dict(img=( 114, 114, 114, )), size=( 640, 640, ), type='Pad'), dict( img_scale=( 640, 640, ), max_cached_images=20, pad_val=( 114, 114, 114, ), ratio_range=( 1.0, 1.0, ), type='CachedMixUp'), dict(min_gt_bbox_wh=( 1, 1, 
), type='FilterAnnotations'), dict(type='PackDetInputs'), ] train_pipeline_stage2 = [ dict(backend_args=None, type='LoadImageFromFile'), dict( poly2mask=False, type='LoadAnnotations', with_bbox=True, with_mask=True), dict( keep_ratio=True, ratio_range=( 0.1, 2.0, ), scale=( 640, 640, ), type='RandomResize'), dict( allow_negative_crop=True, crop_size=( 640, 640, ), recompute_bbox=True, type='RandomCrop'), dict(min_gt_bbox_wh=( 1, 1, ), type='FilterAnnotations'), dict(type='YOLOXHSVRandomAug'), dict(prob=0.5, type='RandomFlip'), dict(pad_val=dict(img=( 114, 114, 114, )), size=( 640, 640, ), type='Pad'), dict(type='PackDetInputs'), ] tta_model = dict( tta_cfg=dict(max_per_img=100, nms=dict(iou_threshold=0.6, type='nms')), type='DetTTAModel') tta_pipeline = [ dict(backend_args=None, type='LoadImageFromFile'), dict( transforms=[ [ dict(keep_ratio=True, scale=( 640, 640, ), type='Resize'), dict(keep_ratio=True, scale=( 320, 320, ), type='Resize'), dict(keep_ratio=True, scale=( 960, 960, ), type='Resize'), ], [ dict(prob=1.0, type='RandomFlip'), dict(prob=0.0, type='RandomFlip'), ], [ dict( pad_val=dict(img=( 114, 114, 114, )), size=( 960, 960, ), type='Pad'), ], [ dict(type='LoadAnnotations', with_bbox=True), ], [ dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction', ), type='PackDetInputs'), ], ], type='TestTimeAug'), ] val_cfg = dict(type='ValLoop') val_dataloader = dict( batch_size=5, dataset=dict( ann_file='annotations/instances_val2017.json', backend_args=None, data_prefix=dict(img='val2017/'), data_root='/petco/dataset/instance_segmentation/open/coco/', pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(keep_ratio=True, scale=( 640, 640, ), type='Resize'), dict( pad_val=dict(img=( 114, 114, 114, )), size=( 640, 640, ), type='Pad'), dict(type='LoadAnnotations', with_bbox=True), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', ), type='PackDetInputs'), ], test_mode=True, type='CocoDataset'), drop_last=False, num_workers=10, persistent_workers=True, sampler=dict(shuffle=False, type='DefaultSampler')) val_evaluator = dict( ann_file= '/petco/dataset/instance_segmentation/open/coco/annotations/instances_val2017.json', backend_args=None, format_only=False, metric=[ 'bbox', 'segm', ], proposal_nums=( 100, 1, 10, ), type='CocoMetric') vis_backends = [ dict(type='LocalVisBackend'), ] visualizer = dict( name='visualizer', type='DetLocalVisualizer', vis_backends=[ dict(type='LocalVisBackend'), ]) work_dir = './work_dirs/rtmdet-ins_m_8xb32-300e_coco'

11/06 10:12:07 - mmengine - INFO - Using SyncBatchNorm()

hhaAndroid commented 12 months ago

I didn't see any error messages

Wooho-Moon commented 12 months ago

> I didn't see any error messages

Yes, you're right. There is no error, but no log output appears after that final line, and training never progresses past it. :(
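When training stalls silently right after the SyncBatchNorm message on multiple GPUs, one way to narrow it down is to check whether NCCL communication works at all outside mmdetection. Below is a minimal sketch; the script name is made up, it assumes the same 4-GPU NCCL setup as above, and it would be launched with torchrun --nproc_per_node=4 nccl_sanity_check.py:

# nccl_sanity_check.py -- hypothetical standalone check, not part of mmdetection
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend='nccl')     # same backend as dist_cfg in the config
    local_rank = int(os.environ['LOCAL_RANK'])  # set by torchrun for each process
    torch.cuda.set_device(local_rank)
    x = torch.ones(1, device=f'cuda:{local_rank}')
    dist.all_reduce(x)                          # if this hangs, the problem is below mmdetection
    print(f'rank {dist.get_rank()}: all_reduce result = {x.item()}')
    dist.destroy_process_group()

if __name__ == '__main__':
    main()

If this also hangs, the issue is in the NCCL/driver layer rather than in the RTMDet-Ins config.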

Wooho-Moon commented 12 months ago

I figured out what the problem was. I updated the NVIDIA graphics driver to 535, and then it was solved.
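(Note that the environment dump above reports the GPUs as 'NVIDIA Graphics Device' rather than an RTX 4090 name, which is typically a sign that the installed driver predates the GPU. A quick, illustrative way to confirm that PyTorch sees the cards correctly after upgrading the driver:)

# driver/GPU visibility check (illustrative)
import torch
print(torch.version.cuda)             # CUDA runtime PyTorch was built with (11.8 here)
print(torch.cuda.device_count())      # should report 4
print(torch.cuda.get_device_name(0))  # should now name the RTX 4090 instead of a generic device
print(torch.cuda.nccl.version())      # NCCL version bundled with PyTorch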