Closed 2276924877 closed 1 year ago
It seems that you run evaluation after every epoch. Please evaluate only once training has completed by adding the following config: evaluation = dict(interval=12, metric='mAP')
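As a rough sketch of why this setting defers evaluation to the end: with a 12-epoch schedule, an interval of 12 matches only the final epoch. The `eval_epochs` helper below is hypothetical and only models interval-based scheduling in a simplified way; it is not the evaluation hook's actual code. The `runner` and `evaluation` values mirror the config quoted in this thread.

```python
# Tie the evaluation interval to the schedule length so that evaluation
# runs once, after the last epoch. `max_epochs` is factored out for clarity.
max_epochs = 12
runner = dict(type='EpochBasedRunner', max_epochs=max_epochs)
evaluation = dict(interval=max_epochs, metric='mAP')


def eval_epochs(interval, max_epochs):
    """Hypothetical helper: 1-based epochs after which evaluation would run,
    assuming evaluation fires whenever the epoch count is a multiple of
    `interval` (a simplified model, not the hook's real implementation)."""
    return [e for e in range(1, max_epochs + 1) if e % interval == 0]


if __name__ == "__main__":
    print(eval_epochs(1, max_epochs))                       # every epoch
    print(eval_epochs(evaluation['interval'], max_epochs))  # final epoch only
```

If the schedule length changes, keeping `interval` equal to `max_epochs` preserves the "evaluate once at the end" behaviour.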
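Has the original poster solved this problem? A question for the author: I set evaluation = dict(interval=12, metric='mAP') so that evaluation runs after training completes, but it hangs and never makes progress, both during validation and during the final test. I am running on the whole dataset. The first image shows validation and the second shows testing.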
Please provide more information, such as the model config and hardware platform details.
sys.platform: linux
Python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) [GCC 9.4.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 11.5, V11.5.119
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 1.10.0+cu113
PyTorch compiling details: PyTorch built with:
2023-09-02 15:30:51,400 - mmrotate - INFO - Distributed training: False
2023-09-02 15:30:51,472 - mmrotate - INFO - Config:
dataset_type = 'SODAADataset'
data_root = '/home/cpl/dataset/SODA-A/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1200, 1200)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version='le135'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1200, 1200),
        flip=False,
        transforms=[
            dict(type='RResize'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='SODAADataset',
        ann_file='/home/cpl/dataset/SODA-A/train/Annotations/',
        img_prefix='/home/cpl/dataset/SODA-A/train/Images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1200, 1200)),
            dict(
                type='RRandomFlip',
                flip_ratio=[0.25, 0.25, 0.25],
                direction=['horizontal', 'vertical', 'diagonal'],
                version='le135'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        ori_ann_file='/home/cpl/dataset/SODA-A/Annotations/train/',
        version='le135'),
    val=dict(
        type='SODAADataset',
        ann_file='/home/cpl/dataset/SODA-A/val/Annotations/',
        img_prefix='/home/cpl/dataset/SODA-A/val/Images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1200, 1200),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        ori_ann_file='/home/cpl/dataset/SODA-A/Annotations/val/',
        version='le135'),
    test=dict(
        type='SODAADataset',
        ann_file='/home/cpl/dataset/SODA-A/test/Annotations/',
        img_prefix='/home/cpl/dataset/SODA-A/test/Images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1200, 1200),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        ori_ann_file='/home/cpl/dataset/SODA-A/Annotations/test/',
        version='le135'))
evaluation = dict(interval=12, metric='mAP')
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=12)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
angle_version = 'le135'
The environment configuration looks fine. Could you provide the model config? Also, does this issue occur when testing all models, or only certain ones?
The model config is configs/sodaa-benchmarks/rotated_retinanet_obb_r50_fpn_1x.py. I haven't trained any other models yet, so I have only seen the hang when testing with this config. If it turns out to be a one-off, I will follow up after training more models.
Alright, the data and code issues can be ruled out. In the future, we will update the evaluation code to improve the execution speed and robustness, which may address the problem you encountered.
What's the feature?
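I selected 10 original images as the training set and 6 original images as the validation set. After cropping, there are 162 train images and 219 val images. Training hangs after the first epoch finishes. mmrotate 0.3.4, torch 1.9.1, CUDA 11.1, training GPU: RTX 3090 24G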
Any other context?
No response
You can try setting nproc=0 at line 412 of sodaa.py:
merged_results = self.merge_det(results, nproc=0)
With this change the merge step does not use the multiprocessing module, and that works for me.
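To illustrate why a non-positive nproc can sidestep the hang, here is a minimal standalone sketch of a merge step that only creates worker processes when nproc is greater than 1. The names `merge_patch` and `merge_all` are hypothetical and the merge logic is a toy stand-in; this is not the actual `merge_det` code in sodaa.py, whose internals are assumed here rather than quoted.

```python
from collections import defaultdict
from multiprocessing import Pool


def merge_patch(item):
    """Hypothetical per-image merge: keep the highest score seen for an image."""
    image_id, scores = item
    return image_id, max(scores)


def merge_all(patch_results, nproc=4):
    """Group patch-level scores by source image, then merge each group.

    With nproc <= 1 the work runs in the calling process, so no worker
    processes are created and nothing can stall on process start-up or
    inter-process communication.
    """
    grouped = defaultdict(list)
    for image_id, score in patch_results:
        grouped[image_id].append(score)
    items = list(grouped.items())

    if nproc <= 1:
        # Serial path: plain loop, no multiprocessing involved.
        return [merge_patch(item) for item in items]

    # Parallel path: distribute the groups across a pool of workers.
    with Pool(nproc) as pool:
        return pool.map(merge_patch, items)


if __name__ == "__main__":
    demo = [("img_a", 0.4), ("img_b", 0.9), ("img_a", 0.7)]
    print(merge_all(demo, nproc=0))
```

The serial path is slower on large validation sets, so it is best treated as a workaround for diagnosing or avoiding the hang rather than as a permanent setting.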