Checklist
1. When I tried to reproduce the BYOL model on the downstream task VOC0712 detection, I ran dist_train.sh, but it raised the error: "mmcv==1.4.2 is used but incompatible. Please install mmcv>=1.3.8, <=1.4.0".
2. Then I ran "pip install mmcv==1.4.0", but a new error was raised: "mmcv==1.4.0 is used but incompatible. Please install mmcv>=1.4.2, <=1.6.0".
I have searched for a solution, but got no help.
Sorry for the inconvenience. As for the downstream task VOC0712 detection, did you mean that you use mmdet to run it?
Yes, according to tutorial 6 (benchmarks.md):
# distributed version
bash tools/benchmarks/mmdetection/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}
# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
So I tried to run the downstream task VOC0712 detection, but I don't know how to do it correctly.
You need to update your mmdet version and try again. For mmcv 1.4.2, mmdet needs to be 2.19.0.
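If it helps, a quick way to confirm the two versions line up (a minimal Python sketch, nothing mmselfsup-specific; mmdet performs a compatibility check on import and raises the "incompatible" error you saw when mmcv falls outside its supported range):
import mmcv
import mmdet

# Print the installed versions; for mmcv-full 1.4.2 you want mmdet 2.19.0.
print('mmcv:', mmcv.__version__)
print('mmdet:', mmdet.__version__)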
After I updated mmdet to 2.19.0, it raised a new error.
Traceback (most recent call last):
File "/home/ls/mmselfsup/tools/test.py", line 144, in
And these are the versions I have installed.
mmcls 0.22.1 pypi_0 pypi
mmcv-full 1.4.2 pypi_0 pypi
mmdet 2.19.0 pypi_0 pypi
mmsegmentation 0.20.2 pypi_0 pypi
mmselfsup 0.8.0 dev_0
Could you also provide your command?
I have tried with this environment and the config 'configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py', and it can start to train the model.
I set up the same environment as yours and ran the command
python /home/ls/mmselfsup/mmselfsup/.mim/tools/test.py configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py checkpoints/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth
May I ask what command you use to train the downstream task?
And here is the environment I use now.
mmcls 0.22.0 pypi_0 pypi
mmcv-full 1.4.4 pypi_0 pypi
mmdet 2.23.0 pypi_0 pypi
mmsegmentation 0.23.0 pypi_0 pypi
mmselfsup 0.8.0 dev_0
Here is my command: SRUN_ARGS="--quotatype=auto" sh tools/benchmarks/mmdetection/mim_slurm_train_c4.sh mm_model configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py https://download.openmmlab.com/mmselfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth
Actually, you are supposed to use the '.sh' files in this folder: https://github.com/open-mmlab/mmselfsup/tree/master/tools/benchmarks/mmdetection, because in these files we use mim train mmdet
to start a downstream training task, which will execute the scripts from mmdet. If you don't use slurm, you can try the 'mim_dist_train' files.
Besides, the model downloaded from our link is just a backbone, which cannot be used for testing directly; it needs fine-tuning on a downstream task.
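For reference, one way to see that the checkpoint only carries backbone weights is to inspect its keys (a hedged sketch; the expectation that the keys look like backbone.* is an assumption based on typical mmselfsup backbone-only dumps):
import torch

# Load the downloaded BYOL checkpoint on CPU and list a few parameter names.
ckpt = torch.load(
    'checkpoints/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth',
    map_location='cpu')
state = ckpt.get('state_dict', ckpt)  # some dumps wrap the weights in 'state_dict'
print(list(state)[:5])  # expect backbone-style keys only, no detection head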
I also tried the '.sh' file with this command. I only have 1 GPU, so I ran the command below.
bash tools/benchmarks/mmdetection/mim_dist_train_c4.sh configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py checkpoints/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth 1
Does it work, or does it raise the same error? Could you provide a screenshot if it doesn't work? Besides, I tried this command and it works.
It works. It's just that my GPU has limited memory, so it raised CUDA out of memory. Thanks for your help. Besides, I want to ask whether models like BYOL and MoCo can be trained on one GPU, since the code says:
assert cfg.model.type not in [
'DeepCluster', 'MoCo', 'SimCLR', 'ODC', 'NPID', 'SimSiam',
'DenseCL', 'BYOL'
], f'{cfg.model.type} does not support non-dist training.'
We follow the official implementations of the algorithms, which contain some distributed operations, for example: https://github.com/open-mmlab/mmselfsup/blob/399b5a0d6e638328c8622e4696b09a5a30b8b8dc/mmselfsup/models/algorithms/moco.py#L129 and https://github.com/open-mmlab/mmselfsup/blob/399b5a0d6e638328c8622e4696b09a5a30b8b8dc/configs/selfsup/_base_/models/byol.py#L10, etc.
However, if the algorithm doesn't include distributed operations, you can modify the config and use normal BN instead of SyncBN, as in mocov3. It can then run on one GPU, but it is time consuming and might hurt performance.
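As a rough illustration of that change (a minimal sketch, not an official recipe: the base config path is hypothetical, and the exact field to override depends on where your config defines SyncBN):
# single_gpu_bn.py -- hedged example config
_base_ = 'path/to/your_base_config.py'  # hypothetical base config

model = dict(
    backbone=dict(
        # Replace SyncBN, which requires distributed training,
        # with plain BN so the model can run on a single GPU.
        norm_cfg=dict(type='BN')))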
Besides, self-supervised learning requires a large amount of computing resources, so we don't recommend running these models on only one GPU.
I ran the command bash tools/benchmarks/mmdetection/mim_dist_train_c4.sh configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_2x_voc0712.py checkpoints/byol_resnet50_8xb32-accum16-coslr-200e_in1k_20220225-5c8b2c2e.pth 1
The number of epochs was set to 24, and this was the result I got. I wonder whether the low accuracy is due to having only 1 GPU, or whether my command was wrong?
If you didn't modify the config, the batch size decreases from 16 to 2 (we use 8 GPUs), which causes the low accuracy, because BN needs a larger batch size.
May I ask where I should modify the config to increase the batch size with only 1 GPU?
What I had modified is samples_per_gpu, which controls the batch size.
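For example, a hedged override on top of the VOC config (the values are only an illustration and are limited by your GPU memory; the field names follow the mmdet-style data config):
# larger_batch.py -- minimal sketch, not an official recipe
_base_ = 'configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py'

data = dict(
    samples_per_gpu=16,  # per-GPU batch size; with 1 GPU this is the total batch size
    workers_per_gpu=2)   # dataloader workers per GPU
Note that if you increase the batch size, the learning rate typically needs to be scaled accordingly (linear scaling rule).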