Closed: marcobellagente93 closed this issue 1 month ago
Thanks for reporting! We normally have a pad-and-gather step to prevent this (it pads splits that are too short before gathering, if needed). We'll investigate what happens here, since it should apply in this case.
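For context, a minimal sketch of the pad-and-gather idea described above (this is not lighteval's actual code; processes are simulated with plain lists, and `pad_and_gather` is a hypothetical helper):

```python
# Hypothetical sketch of "pad and gather": before gathering per-process
# result lists of unequal length, pad the shorter ones up to the longest
# length so every process contributes a same-shaped chunk.

def pad_and_gather(per_process_rows, pad_value=0):
    """Simulate a gather across processes whose row counts differ.

    per_process_rows: one list of result rows per (simulated) process.
    Shorter lists are padded with `pad_value` so the "gather" (here a
    simple concatenation) sees equal shapes on every rank.
    """
    max_len = max(len(rows) for rows in per_process_rows)
    padded = [rows + [pad_value] * (max_len - len(rows)) for rows in per_process_rows]
    gathered = [row for rows in padded for row in rows]
    return gathered, max_len

# Process 0 got 3 samples, process 1 only 2 (uneven split):
out, width = pad_and_gather([[10, 11, 12], [20, 21]])
```

Without the padding step, a real collective gather would either raise a shape mismatch or block forever, which matches the symptoms reported below.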
Hi! I can't reproduce this. Can you try reinstalling from main and tell me if you're still getting the error? It would also help if you shared your full accelerate config.
Closing for inactivity.
I've set up accelerate and am running on multiple GPUs with:
accelerate launch run_evals_accelerate.py --tasks="leaderboard|mmlu:abstract_algebra|0|0" --output_dir "/weka/home-marcob/lighteval/scores" --model_args "pretrained=gpt2"
I get:
Note that with an identical setup the run succeeds with 5 shots, i.e. the following doesn't throw an error:
accelerate launch run_evals_accelerate.py --tasks="leaderboard|mmlu:abstract_algebra|5|0" --output_dir "/weka/home-marcob/lighteval/scores" --model_args "pretrained=gpt2"
The same error happens systematically when running the custom MMLU eval from finewebedu (https://huggingface.co/datasets/HuggingFaceFW/fineweb/blob/main/lighteval_tasks.py#L12).
Also note that without configuring accelerate to explicitly verify tensor shapes, no error is thrown and the process just hangs indefinitely.
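One possible explanation for the hang (a guess, not a confirmed diagnosis): when the number of requests doesn't divide evenly across GPUs, ranks end up with different batch counts, so a shape-checked gather raises while an unchecked one deadlocks. A toy sketch, where `split_across_ranks` is a hypothetical helper and the 8-GPU count is an assumption (the MMLU abstract_algebra test split has 100 questions):

```python
# Toy illustration of uneven sharding: splitting N items across R ranks
# gives some ranks one extra item whenever N % R != 0, so the per-rank
# tensors fed to a collective gather no longer share the same shape.

def split_across_ranks(n_items, n_ranks):
    """Return how many items each rank receives under a simple even split."""
    base, rem = divmod(n_items, n_ranks)
    return [base + (1 if r < rem else 0) for r in range(n_ranks)]

counts = split_across_ranks(100, 8)  # 100 docs over an assumed 8 GPUs
mismatched = len(set(counts)) > 1    # True: some ranks hold 13, others 12
```

If that is the failure mode, the pad-and-gather step mentioned above would be exactly the fix, since it equalizes shapes before the collective call.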