open-mmlab / mmselfsup

OpenMMLab Self-Supervised Learning Toolbox and Benchmark
https://mmselfsup.readthedocs.io/en/latest/
Apache License 2.0

mmcv version not compatible #315

Closed: Yoooss closed this issue 2 years ago.

Yoooss commented 2 years ago

Checklist

1. When I tried to reproduce the BYOL model on the downstream VOC0712 detection task, I ran dist_train.sh, but it raised the error "mmcv==1.4.2 is used but incompatible. Please install mmcv>=1.3.8, <=1.4.0".
2. Then I ran "pip install mmcv==1.4.0", but a new error was raised: "mmcv==1.4.0 is used but incompatible. Please install mmcv>=1.4.2, <=1.6.0".

I have searched for a solution but found nothing that helped.
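The two messages come from version checks in two different packages (presumably the older mmdet on one side and mmselfsup on the other; the thread does not say which is which). A minimal sketch using only the two ranges quoted above shows why no single mmcv release can pass both checks: the ranges do not overlap. The helper names are hypothetical, and the parser only handles plain "X.Y.Z" strings.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' string such as '1.4.2' into (1, 4, 2)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, low: str, high: str) -> bool:
    """True if low <= version <= high (both bounds inclusive, as in the messages)."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


def ranges_overlap(a: tuple, b: tuple) -> bool:
    """Two closed ranges overlap iff the larger lower bound <= the smaller upper bound."""
    low = max(parse_version(a[0]), parse_version(b[0]))
    high = min(parse_version(a[1]), parse_version(b[1]))
    return low <= high


# Ranges quoted in the two error messages in this issue.
FIRST_CHECK = ("1.3.8", "1.4.0")
SECOND_CHECK = ("1.4.2", "1.6.0")

if __name__ == "__main__":
    for candidate in ("1.4.0", "1.4.2"):
        ok = in_range(candidate, *FIRST_CHECK) and in_range(candidate, *SECOND_CHECK)
        print(candidate, "passes both checks:", ok)  # False for both candidates
    print("ranges overlap:", ranges_overlap(FIRST_CHECK, SECOND_CHECK))  # False
```

Because the ranges are disjoint, reinstalling a different mmcv cannot fix this; one of the other packages has to change, which is where the thread ends up (upgrading mmdet).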

fangyixiao18 commented 2 years ago

Sorry for the inconvenience, as for downstream task VOC0712 detection, did you mean that you use mmdet to run this task?

Yoooss commented 2 years ago

Yes, I followed tutorial 6 (benchmarks.md):

Detection

    # Distributed version
    bash tools/benchmarks/mmdetection/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}

    # Slurm version
    bash tools/benchmarks/mmdetection/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}

So I tried to run the downstream VOC0712 detection task this way, but I don't know how to do it correctly.

fangyixiao18 commented 2 years ago

You need to update your mmdet version and try again. With mmcv 1.4.2, you need mmdet 2.19.0. [screenshot]

Yoooss commented 2 years ago

After I updated to mmdet 2.19.0, a new error was raised:

    Traceback (most recent call last):
      File "/home/ls/mmselfsup/tools/test.py", line 144, in <module>
        main()
      File "/home/ls/mmselfsup/tools/test.py", line 66, in main
        cfg = mmcv.Config.fromfile(args.config)
      File "/home/ls/anaconda3/envs/mmselfsup/lib/python3.7/site-packages/mmcv/utils/config.py", line 337, in fromfile
        import_modules_from_strings(**cfg_dict['custom_imports'])
      File "/home/ls/anaconda3/envs/mmselfsup/lib/python3.7/site-packages/mmcv/utils/misc.py", line 80, in import_modules_from_strings
        raise ImportError
    ImportError
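Judging by the traceback, the ImportError is raised while mmcv's Config.fromfile imports the modules listed in the config's custom_imports. A minimal diagnostic sketch, assuming you can read that list from the config (the thread does not show its contents), is to try each module yourself and see which one fails. find_unimportable is a hypothetical helper, not part of mmcv.

```python
import importlib


def find_unimportable(module_names):
    """Return the entries of a custom_imports-style list that fail to import.

    Accepts a single module name or a list of names, mirroring the 'imports'
    field of a config's custom_imports dict.
    """
    if isinstance(module_names, str):
        module_names = [module_names]
    failed = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hypothetical list; replace it with custom_imports['imports'] from your config.
    print(find_unimportable(["os", "json", "package_that_is_not_installed"]))
```

In this thread the error went away once training was launched through the mim-based scripts in tools/benchmarks/mmdetection, so treat this only as a way to narrow down which import is failing.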

Yoooss commented 2 years ago

And these are the versions I have installed:

    mmcls            0.22.1  pypi_0  pypi
    mmcv-full        1.4.2   pypi_0  pypi
    mmdet            2.19.0  pypi_0  pypi
    mmsegmentation   0.20.2  pypi_0  pypi
    mmselfsup        0.8.0   dev_0
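To produce a version list like the one above from inside Python (for example, to paste into an issue), a small sketch using the standard library's importlib.metadata could look like this. It assumes Python 3.8 or newer; the Python 3.7 environment in the traceback above would need the importlib_metadata backport instead. The package list is just the set discussed in this thread.

```python
from importlib import metadata

# Distribution names as pip knows them (mmcv-full, not mmcv, in this thread).
PACKAGES = ["mmcv-full", "mmdet", "mmcls", "mmsegmentation", "mmselfsup"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<16} {version or 'not installed'}")
```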

fangyixiao18 commented 2 years ago

Could you also provide your command?

I tried with this environment: [screenshot of environment] and the config 'configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py', and training started normally.

Model link: https://download.openmmlab.com/mmselfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth

Yoooss commented 2 years ago

I set up the same environment as yours and ran the command:

    python /home/ls/mmselfsup/mmselfsup/.mim/tools/test.py configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py checkpoints/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth

[Screenshot from 2022-05-24 15-30-49]

May I ask what command you use to train the downstream task?

Yoooss commented 2 years ago

And here is the environment I am using now:

    mmcls            0.22.0  pypi_0  pypi
    mmcv-full        1.4.4   pypi_0  pypi
    mmdet            2.23.0  pypi_0  pypi
    mmsegmentation   0.23.0  pypi_0  pypi
    mmselfsup        0.8.0   dev_0

fangyixiao18 commented 2 years ago

Here is my command:

    SRUN_ARGS="--quotatype=auto" sh tools/benchmarks/mmdetection/mim_slurm_train_c4.sh mm_model configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py https://download.openmmlab.com/mmselfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth

Actually, you are supposed to use the '.sh' files in this folder: https://github.com/open-mmlab/mmselfsup/tree/master/tools/benchmarks/mmdetection. These files use mim train mmdet to start the downstream training task, which executes the training scripts from mmdet. If you don't use Slurm, you can try the 'mim_dist_train' files.

Also, the model downloaded from our link contains only the backbone, so it cannot be used directly for testing; it needs to be fine-tuned on a downstream task first.
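One way to see what "only the backbone" means in practice is to look at the parameter names stored in a checkpoint. The sketch below only inspects a state-dict-like mapping; the head prefixes are typical mmdet Faster R-CNN module names and are an assumption here, as is the exact key layout of the released BYOL file, which the thread does not show. The file itself could be loaded with torch.load(path, map_location="cpu"), taking its "state_dict" entry if present.

```python
from collections import Counter

# Assumed prefixes of detector-specific modules in an mmdet model state dict.
HEAD_PREFIXES = ("neck.", "rpn_head.", "roi_head.", "bbox_head.")


def top_level_prefixes(state_dict):
    """Count parameters by the first component of their dotted name."""
    return Counter(key.split(".", 1)[0] for key in state_dict)


def has_detection_heads(state_dict, head_prefixes=HEAD_PREFIXES):
    """True if any parameter belongs to a detection neck or head."""
    return any(key.startswith(head_prefixes) for key in state_dict)


if __name__ == "__main__":
    # Hypothetical keys standing in for a backbone-only checkpoint.
    backbone_only = {"conv1.weight": None, "layer1.0.conv1.weight": None}
    print(top_level_prefixes(backbone_only))
    print(has_detection_heads(backbone_only))
```

A checkpoint without any head parameters has nothing to predict boxes with, which is why it must be fine-tuned on the detection task before test.py can give meaningful results.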

Yoooss commented 2 years ago

I also tried the '.sh' file. I only have 1 GPU, so I ran the command below:

    bash tools/benchmarks/mmdetection/mim_dist_train_c4.sh configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py checkpoints/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth 1

fangyixiao18 commented 2 years ago

Does it work, or do you get the same error? If it doesn't work, could you provide a screenshot? I tried this command myself, and it works.

Yoooss commented 2 years ago

It works. It only raised a CUDA out-of-memory error because my GPU has limited memory. Thanks for your help. Besides, I'd like to ask whether models like BYOL and MoCo cannot be trained on one GPU, since the code says:

    assert cfg.model.type not in [
        'DeepCluster', 'MoCo', 'SimCLR', 'ODC', 'NPID', 'SimSiam',
        'DenseCL', 'BYOL'
    ], f'{cfg.model.type} does not support non-dist training.'
fangyixiao18 commented 2 years ago

We follow the official implementations of these algorithms, which contain some distributed operations, for example: https://github.com/open-mmlab/mmselfsup/blob/399b5a0d6e638328c8622e4696b09a5a30b8b8dc/mmselfsup/models/algorithms/moco.py#L129 and https://github.com/open-mmlab/mmselfsup/blob/399b5a0d6e638328c8622e4696b09a5a30b8b8dc/configs/selfsup/_base_/models/byol.py#L10.

However, if an algorithm doesn't include other distributed operations, such as mocov3, you can modify the config to use normal BN instead of SyncBN. It can then run on one GPU, but training is time-consuming and performance might be affected.
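In an mmcv-style config, the normalization layer is usually selected through a norm_cfg dict such as dict(type='SyncBN'), so switching to plain BN amounts to changing that type. The simplest route is to edit norm_cfg directly in the config file; the sketch below does the same thing programmatically on a nested dict. The example model dict is hypothetical and only mimics the config structure, and replace_syncbn is not part of mmselfsup or mmcv.

```python
def replace_syncbn(node, new_type="BN"):
    """Return a copy of a nested config with every type='SyncBN' changed to new_type."""
    if isinstance(node, dict):
        updated = {key: replace_syncbn(value, new_type) for key, value in node.items()}
        if updated.get("type") == "SyncBN":
            updated["type"] = new_type
        return updated
    if isinstance(node, (list, tuple)):
        # Rebuild lists and tuples element by element, keeping the container type.
        return type(node)(replace_syncbn(item, new_type) for item in node)
    return node


if __name__ == "__main__":
    # Hypothetical model config fragment, loosely shaped like an mmselfsup config.
    model = dict(
        type="SomeAlgorithm",
        backbone=dict(type="ResNet", depth=50, norm_cfg=dict(type="SyncBN")),
        neck=dict(type="SomeNeck", norm_cfg=dict(type="SyncBN")),
    )
    print(replace_syncbn(model))
```

The function returns a new structure and leaves the input untouched. It only helps for algorithms without other distributed operations, as explained above.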

Besides, self-supervised learning requires a large amount of computing resources, so we don't recommend running it on only one GPU.

Yoooss commented 2 years ago

I ran the command:

    bash tools/benchmarks/mmdetection/mim_dist_train_c4.sh configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_2x_voc0712.py checkpoints/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth 1

The number of epochs was set to 24, and I got this result: [Screenshot from 2022-05-30 09-22-54]. I wonder whether the low accuracy is due to having only 1 GPU, or to a mistake in my command.

fangyixiao18 commented 2 years ago

If you didn't modify the config, the total batch size drops from 16 (we use 8 GPUs) to 2, which causes the low accuracy because BN needs a larger batch size.
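The arithmetic behind this: the total batch size is samples_per_gpu multiplied by the number of GPUs, so the reference setup uses 2 × 8 = 16, while the same config on one GPU gives 2 × 1 = 2. A tiny sketch with hypothetical helper names makes the relationship explicit:

```python
def total_batch_size(samples_per_gpu: int, num_gpus: int) -> int:
    """Images processed per iteration across all GPUs."""
    return samples_per_gpu * num_gpus


def samples_per_gpu_for(target_total: int, num_gpus: int) -> int:
    """Per-GPU batch needed to reach target_total on num_gpus GPUs."""
    if target_total % num_gpus != 0:
        raise ValueError("target batch size must be divisible by the number of GPUs")
    return target_total // num_gpus


if __name__ == "__main__":
    print(total_batch_size(2, 8))      # reference setup, 8 GPUs x 2 images -> 16
    print(total_batch_size(2, 1))      # same config on a single GPU -> 2
    print(samples_per_gpu_for(16, 1))  # per-GPU batch to match 16 on one GPU -> 16
```

Matching the full total of 16 on a single GPU may well exceed its memory.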

Yoooss commented 2 years ago

May I ask where I should modify the config to increase the batch size with only 1 GPU?

Yoooss commented 2 years ago

This is what I modified: [Screenshot from 2022-05-31 16-51-55]

fangyixiao18 commented 2 years ago

samples_per_gpu controls the batch size.