matsutaku44 closed this issue 4 months ago.
Changing "python -m torch.distributed.launch --nproc_per_node=2 run_beit3_finetuning.py" to "python -m run_beit3_finetuning" solved it for me in Google Colab.
@Sv3n01 Thank you for replying! I am trying this change now.
I removed "torch.distributed.launch --nproc_per_node=2" and ran it again. The evaluation seemed to start. Thank you very much! However, a different error happens.
I ran this command:
python run_beit3_finetuning.py \
--model beit3_base_patch16_480 \
--input_size 480 \
--task vqav2 \
--batch_size 16 \
--sentencepiece_model ../../../../new_mensa/data/VQAv2/BEIT3/beit3.spm \
--finetune ../../../../new_mensa/data/VQAv2/BEIT3/beit3_base_indomain_patch16_224.pth \
--data_path ../../../../new_mensa/data/VQAv2 \
--output_dir ./prediction_saveHere \
--eval \
--dist_eval
The error:
. . .
Test: [18640/18659] eta: 0:00:05 time: 0.2775 data: 0.0002 max mem: 3774
Test: [18650/18659] eta: 0:00:02 time: 0.2774 data: 0.0002 max mem: 3774
Test: [18658/18659] eta: 0:00:00 time: 0.2658 data: 0.0000 max mem: 3774
Test: Total time: 1:26:48 (0.2792 s / it)
Traceback (most recent call last):
File "run_beit3_finetuning.py", line 448, in <module>
main(opts, ds_init)
File "run_beit3_finetuning.py", line 365, in main
utils.dump_predictions(args, result, "vqav2_test")
File "/home/matsuzaki.takumi/workspace/vqa/unilm/beit3/utils.py", line 845, in dump_predictions
torch.distributed.barrier()
File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
return func(*args, **kwargs)
File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 3672, in barrier
opts.device = _get_pg_default_device(group)
File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 649, in _get_pg_default_device
group = group or _get_default_group()
File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1008, in _get_default_group
raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
I am trying to solve this problem now. Do you have a solution? Please let me know.
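One possible fix (a sketch, not the exact upstream code) is to guard the collective call in utils.py's dump_predictions so that single-process runs skip it:

import torch.distributed as dist

def barrier_if_distributed():
    # torch.distributed.barrier() requires an initialized process group;
    # skip it when the script was started without torch.distributed.launch.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()

Calling barrier_if_distributed() in place of the bare torch.distributed.barrier() should let the same code path work both with and without the launcher.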
I can now get submit_vqav2_test.json (the list of question_id and answer pairs).
I added this to run_beit3_finetuning.py (line 141):
parser.add_argument("--local-rank", type=int)
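For reference: newer versions of torch.distributed.launch pass the rank to the script as "--local-rank" (hyphenated), while older versions used "--local_rank", which is probably why the parser needs this argument. A fuller version of the line might look like this (the alias, default, and help text are my own additions, not from the repository):

# Accept the rank argument injected by torch.distributed.launch.
# Newer PyTorch launchers pass "--local-rank"; older ones pass "--local_rank".
parser.add_argument("--local-rank", "--local_rank", dest="local_rank",
                    type=int, default=0,
                    help="rank of this process on the local node (set by the launcher)")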
Then I ran this command. (Maybe you should not omit "-m torch.distributed.launch --nproc_per_node=2" after all.)
python -m torch.distributed.launch --nproc_per_node=2 run_beit3_finetuning.py \
--model beit3_base_patch16_480 \
--input_size 480 \
--task vqav2 \
--batch_size 16 \
--sentencepiece_model ../../../../new_mensa/data/VQAv2/BEIT3/beit3.spm \
--finetune ../../../../new_mensa/data/VQAv2/BEIT3/beit3_base_indomain_patch16_224.pth \
--data_path ../../../../new_mensa/data/VQAv2 \
--output_dir ./prediction_saveHere \
--eval \
--dist_eval
Then I get submit_vqav2_test.json:
. . .
Test: [9310/9330] eta: 0:00:05 time: 0.2790 data: 0.0002 max mem: 4665
Test: [9320/9330] eta: 0:00:02 time: 0.2789 data: 0.0002 max mem: 4665
Test: [9329/9330] eta: 0:00:00 time: 0.2674 data: 0.0001 max mem: 4665
Test: Total time: 0:43:23 (0.2790 s / it)
Infer 447793 examples into ./prediction_saveHere/submit_vqav2_test.json
I don't know why I can get the JSON file this time, but I will close this issue. (Presumably, launching through torch.distributed.launch makes the script call init_process_group, so the torch.distributed.barrier() in dump_predictions no longer fails.)
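For anyone who wants to run the evaluation without the launcher instead, here is a sketch of manually creating a single-process group before evaluation so that barrier() succeeds. The backend choice and address values are assumptions, not from the repository:

import os
import torch.distributed as dist

# Minimal single-process group so torch.distributed.barrier() succeeds
# instead of raising. backend="nccl" is an option for GPU-only runs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)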
Describe Model I am using (UniLM, MiniLM, LayoutLM ...): BEIT-3
I want to evaluate the fine-tuned BEiT-3 model on VQAv2: https://github.com/microsoft/unilm/blob/master/beit3/get_started/get_started_for_vqav2.md#example-evaluate-beit-3-finetuned-model-on-vqav2-visual-question-answering
However, an error happens, and I cannot understand what the error message means. How do I solve this problem? Please help me. Thank you for sharing the code for BEiT-3.