How to run a 30B+ model with lighteval when accelerate launch fails with OOM?

huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

MIT License


How to run a 30B+ model with lighteval when accelerate launch fails with OOM? #155

Closed xiechengmude closed 2 months ago

xiechengmude commented 2 months ago

I get a CUDA out-of-memory (OOM) error when I launch an evaluation of a 30B model using lighteval.

What's the correct config for it?

clefourrier commented 2 months ago

Hi! Could you specify the precision and hardware you are using?

xiechengmude commented 2 months ago

8x A100 for a 70B model.

Are there any examples for evaluating a very large model on open_llm_leaderboard_tasks quickly?

clefourrier commented 2 months ago

Hi!

In which precision are you running the 70B model?
Are your A100s the 80 GB or the 40 GB variant?

The right model parallelism / data parallelism ratio depends on both answers.
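
To make the dependency concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of lighteval: the function names and the bytes-per-parameter table are assumptions made for illustration, and it counts the model weights only. Activations, the KV cache, and framework overhead come on top, so the results are lower bounds on the number of GPUs each model copy must be sharded across.

```python
import math

# Bytes needed to store one parameter at each precision (weights only).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params_billion, precision):
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params_billion * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(n_params_billion, precision, gpu_memory_gb):
    """Lower bound on the GPUs needed to hold one copy of the weights."""
    return math.ceil(weight_memory_gb(n_params_billion, precision) / gpu_memory_gb)


def parallelism_split(n_gpus, n_params_billion, precision, gpu_memory_gb):
    """Return (GPUs per model copy, number of model copies) for a node.

    Hypothetical helper: the first number is the minimum model-parallel
    degree, the second is how many data-parallel replicas then fit.
    """
    shards = min_gpus_for_weights(n_params_billion, precision, gpu_memory_gb)
    if shards > n_gpus:
        raise ValueError("weights alone do not fit on the available GPUs")
    return shards, n_gpus // shards


if __name__ == "__main__":
    for precision in ("float32", "float16"):
        for gpu_gb in (80, 40):
            split = parallelism_split(8, 70, precision, gpu_gb)
            print(f"70B {precision} on 8x A100 {gpu_gb}GB -> {split}")
```

For a 70B model in float16 the weights alone take about 140 GB, so each copy has to span at least two 80 GB cards or four 40 GB cards. In practice more headroom is needed, which is why the same 8-GPU node can call for quite different launch settings.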

clefourrier commented 2 months ago

Hi, closing this issue for inactivity, but feel free to re-open if you need help.