microsoft / TAP

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption, CVPR 2021 (Oral)
MIT License

Error in running pretrain because of torch.distributed #26

Open tinaboya2023 opened 1 year ago

tinaboya2023 commented 1 year ago

Hi, I set up the environment as follows:

- Python 3.8
- PyTorch with CUDA, installed via: `conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia`
- GPU: 1× GeForce RTX 3090 (24 GB VRAM)

I'm trying to run pretraining with the following command:

```shell
python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True
```

but the run fails with the error below.

Could you help me resolve this problem? Is the error caused by using only 1 GPU? Do I need to change the initial value of some parameter (like `local_rank`)? Could it be due to a lack of GPU memory? Solving this is very important to me.
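For context, here is a minimal sketch (not the TAP code; the environment-variable values are assumptions) of what `torch.distributed.launch --nproc_per_node 1` effectively sets up: the launcher exports rank/world-size environment variables and the training script then initializes a single-process process group. Checking that this works in isolation can help tell a distributed-init problem apart from a GPU-memory problem. The `gloo` backend is used here so the sketch runs even without CUDA:

```python
import os
import torch.distributed as dist

# torch.distributed.launch normally exports these for each worker;
# with --nproc_per_node 1 there is a single worker with rank 0.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")  # assumed free port
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# gloo works on CPU; real training would use the nccl backend on GPU.
dist.init_process_group(backend="gloo", rank=0, world_size=1)

print(dist.get_world_size())  # 1 when the single-process group is up
dist.destroy_process_group()
```

If this snippet runs cleanly on the machine, the process-group setup itself is fine, which points away from `local_rank`/init issues and toward something later in the pipeline.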