Open · whatdhack opened this issue 1 month ago
Describe the bug
Out of memory. Tried to allocate X.XX GiB .....

Minimal reproducible example
python example_chat_completion.py

Runtime Environment
I guess any A100 system with 8+ GPUs.

Additional context
Is there a way to reduce the memory requirement? The most obvious trick, reducing the batch size, did not prevent the OOM; the relevant knobs are sketched below.
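For context, a minimal sketch of those knobs: the llama3 example scripts build the generator via `Llama.build`, whose `max_seq_len` and `max_batch_size` parameters size the per-GPU KV cache. The paths below are placeholders, not from the original report:

```python
# Minimal sketch: lower the parameters that size the KV cache before
# resorting to resharding. Llama.build is from the meta-llama/llama3 repo;
# the checkpoint/tokenizer paths are placeholders.
from llama import Llama

generator = Llama.build(
    ckpt_dir="Meta-Llama-3-70B-Instruct/",                       # placeholder
    tokenizer_path="Meta-Llama-3-70B-Instruct/tokenizer.model",  # placeholder
    max_seq_len=512,    # KV cache grows linearly with this
    max_batch_size=1,   # ... and with this
)
```

Note that for the 70B model the weights alone are roughly 140 GB in bf16, i.e. about 17.5 GB per GPU at model-parallel size 8, so if shrinking the KV cache doesn't help, the shards themselves have to be re-cut.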
What is the best way to adapt the 8 checkpoints of the 70B model (sharded for 8x A100-80GB/H100) to, say, 16 A100-40GB GPUs?
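One possible approach, as a hedged sketch only: the consolidated checkpoints are plain state dicts, each parameter sharded along at most one dimension, so they can in principle be merged and re-split. The `shard_dim` map, file names, and divisibility caveat below are all assumptions to verify against the parallel layer definitions in the repo's `model.py`, not a confirmed recipe:

```python
# Hedged sketch (assumptions, not a confirmed recipe): merge 8 model-parallel
# shards and re-split them into 16.
import os
import torch

def shard_dim(name: str):
    """Guess which dim a parameter is sharded on; None = replicated."""
    # Column-parallel layers split output features (dim 0) across ranks.
    if any(k in name for k in ("wq.", "wk.", "wv.", "w1.", "w3.", "output.")):
        return 0
    # Row-parallel layers split input features (dim 1) across ranks.
    if any(k in name for k in ("wo.", "w2.")):
        return 1
    if "tok_embeddings" in name:
        return 0  # vocab-parallel embedding (assumption)
    return None   # norms etc.: replicated on every rank

def reshard(shards, out_mp):
    out = [{} for _ in range(out_mp)]
    for name, first in shards[0].items():
        dim = shard_dim(name)
        if dim is None:
            for d in out:
                d[name] = first.clone()
            continue
        full = torch.cat([s[name] for s in shards], dim=dim)
        # Caveat: the sharded dim must divide evenly by out_mp. For
        # llama3-70B the GQA wk/wv projections have only 8 KV heads, so a
        # clean 16-way split may not be possible for those tensors.
        assert full.size(dim) % out_mp == 0, f"{name} not divisible"
        for rank, piece in enumerate(full.chunk(out_mp, dim=dim)):
            out[rank][name] = piece.clone()
    return out

# Loads all shards into host RAM (~140 GB for the 70B model in bf16).
shards = [torch.load(f"consolidated.{i:02d}.pth", map_location="cpu")
          for i in range(8)]
os.makedirs("resharded", exist_ok=True)
for rank, sd in enumerate(reshard(shards, out_mp=16)):
    torch.save(sd, f"resharded/consolidated.{rank:02d}.pth")
```

If that works, the 16-shard run would then be launched with `--nproc_per_node 16`, since the loader appears to expect one consolidated file per model-parallel rank.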
Please see this thread: https://github.com/meta-llama/llama3/issues/157#issuecomment-2110497041