lean-dojo / ReProver

Retrieval-Augmented Theorem Provers for Lean
https://leandojo.org
MIT License
227 stars 53 forks source link

Evaluation on multiple GPUs #73

Open ezhang7423 opened 1 month ago

ezhang7423 commented 1 month ago

Hi there! I am trying to run evaluation with multiple GPUs. I have ran everything in './scripts/minimal_example.sh'. Running on 1 GPU works perfectly, however when I pass in more than 1 GPU, all Huggingface Generator workers still only initialize on the first GPU, leaving the rest unutilized. Unfortunately, --use-vlm does not appear to be working either, as an error is thrown that the T5 model is not supported. Any support would be greatly appreciated.

yangky11 commented 1 month ago

Hi,

I'm not able to reproduce the problem. I tried running python prover/evaluate.py --data-path data/leandojo_benchmark_4/random/ --gen_ckpt_path kaiyuy/leandojo-lean4-tacgen-byt5-small --split test --num-workers 40 --num-gpus 8 and got the GPU utilization below. It seems all GPUs are running the evaluation.

image