Reminder
System Info
llamafactory version: 0.8.3.dev0
Python: 3.11.9
Platform: AWS EC2 instance
Reproduction
Expected behavior
Hi,
The above script runs batched eval on 500 examples on an A100 node with batch size 8, and takes about an hour. This is significantly slower than running the same eval with batch size 1, which completes in around 15 minutes. Do you know why this might be happening? My guess is that the longest generation in each batch is the bottleneck, since the whole batch has to keep decoding until its longest sequence finishes. Is there a way to make the batched evals faster? The model is small, so I'd like some parallelization to make full use of the available GPU resources. Thanks so much!
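To illustrate the hypothesis, here is a rough back-of-the-envelope sketch (plain Python, not LLaMA-Factory code; the generation lengths are made up). It models each batch as costing `batch_size * max(lengths)` decode steps, since finished sequences keep padding along until the longest one stops, and compares random-order batching against length-sorted batching:

```python
import random

def batched_token_steps(lengths, batch_size):
    """Total decode token-steps if every example in a batch pays
    for max(lengths in batch) steps (padding waste)."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total += len(batch) * max(batch)
    return total

random.seed(0)
# Hypothetical per-example generation lengths for 500 eval examples.
lengths = [random.randint(10, 500) for _ in range(500)]

unsorted_steps = batched_token_steps(lengths, batch_size=8)
sorted_steps = batched_token_steps(sorted(lengths), batch_size=8)
ideal_steps = sum(lengths)  # batch size 1: no padding waste

print(unsorted_steps, sorted_steps, ideal_steps)
```

Under this model, sorting (or bucketing) examples by expected output length before batching brings the total much closer to the batch-size-1 ideal, which is why length-grouped sampling is a common mitigation.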
Others
Thanks again for the great work!