Open · whatdhack opened this issue 1 month ago
Describe the bug
Out of memory. Tried to allocate X.XX GiB .....

Minimal reproducible example
python example_chat_completion.py

Runtime Environment
I guess any A100 system with 8+ GPUs.

Additional context
Is there a way to reduce the memory requirement? The most obvious trick, reducing the batch size, did not prevent the OOM; the relevant knobs are sketched below.
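For context, a minimal sketch of those knobs: the llama3 example scripts build the generator via `Llama.build`, whose `max_seq_len` and `max_batch_size` parameters size the per-GPU KV cache. The paths below are placeholders, not from the original report:

```python
# Minimal sketch: lower the parameters that size the KV cache before
# resorting to resharding. Llama.build is from the meta-llama/llama3 repo;
# the checkpoint/tokenizer paths are placeholders.
from llama import Llama

generator = Llama.build(
    ckpt_dir="Meta-Llama-3-70B-Instruct/",                       # placeholder
    tokenizer_path="Meta-Llama-3-70B-Instruct/tokenizer.model",  # placeholder
    max_seq_len=512,    # KV cache grows linearly with this
    max_batch_size=1,   # ... and with this
)
```

Note that for the 70B model the weights alone are roughly 140 GB in bf16, i.e. about 17.5 GB per GPU at model-parallel size 8, so if shrinking the KV cache doesn't help, the shards themselves have to be re-cut.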
What is the best way to adapt the 8 checkpoints of the 70B model (sharded for 8x A100-80GB/H100) to, say, 16 A100-40GB GPUs?
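One possible approach, as a hedged sketch only: the consolidated checkpoints are plain state dicts, each parameter sharded along at most one dimension, so they can in principle be merged and re-split. The `shard_dim` map, file names, and divisibility caveat below are all assumptions to verify against the parallel layer definitions in the repo's `model.py`, not a confirmed recipe:

```python
# Hedged sketch (assumptions, not a confirmed recipe): merge 8 model-parallel
# shards and re-split them into 16.
import os
import torch

def shard_dim(name: str):
    """Guess which dim a parameter is sharded on; None = replicated."""
    # Column-parallel layers split output features (dim 0) across ranks.
    if any(k in name for k in ("wq.", "wk.", "wv.", "w1.", "w3.", "output.")):
        return 0
    # Row-parallel layers split input features (dim 1) across ranks.
    if any(k in name for k in ("wo.", "w2.")):
        return 1
    if "tok_embeddings" in name:
        return 0  # vocab-parallel embedding (assumption)
    return None   # norms etc.: replicated on every rank

def reshard(shards, out_mp):
    out = [{} for _ in range(out_mp)]
    for name, first in shards[0].items():
        dim = shard_dim(name)
        if dim is None:
            for d in out:
                d[name] = first.clone()
            continue
        full = torch.cat([s[name] for s in shards], dim=dim)
        # Caveat: the sharded dim must divide evenly by out_mp. For
        # llama3-70B the GQA wk/wv projections have only 8 KV heads, so a
        # clean 16-way split may not be possible for those tensors.
        assert full.size(dim) % out_mp == 0, f"{name} not divisible"
        for rank, piece in enumerate(full.chunk(out_mp, dim=dim)):
            out[rank][name] = piece.clone()
    return out

# Loads all shards into host RAM (~140 GB for the 70B model in bf16).
shards = [torch.load(f"consolidated.{i:02d}.pth", map_location="cpu")
          for i in range(8)]
os.makedirs("resharded", exist_ok=True)
for rank, sd in enumerate(reshard(shards, out_mp=16)):
    torch.save(sd, f"resharded/consolidated.{rank:02d}.pth")
```

If that works, the 16-shard run would then be launched with `--nproc_per_node 16`, since the loader appears to expect one consolidated file per model-parallel rank.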
Please see this thread: https://github.com/meta-llama/llama3/issues/157#issuecomment-2110497041