microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

[UPRISE] CUDA out of memory. Tried to allocate 3.25 GiB. GPU #237

Closed · zhouchang123 closed 3 days ago

zhouchang123 commented 3 days ago

When I run `bash inference.sh`, it fails with a CUDA out-of-memory error ("Tried to allocate 3.25 GiB"). Error screenshots were attached in the original issue.

zhouchang123 commented 3 days ago

I used `gpustat -i 1` to watch the GPU state. When the error occurred, GPUs 2 and 3 were not at their limit; only GPU 0 ran out of memory (my friend is using GPU 1). How can I solve this?
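(For reference, the same per-GPU numbers that `gpustat -i 1` reports can also be polled from inside Python via PyTorch; a minimal sketch:)

```python
# Sketch: print used/total memory for each visible GPU, the same figures
# that `gpustat -i 1` shows, using PyTorch's CUDA API (PyTorch >= 1.10).
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # both values are in bytes
    print(f"GPU {i}: {(total - free) / 2**30:.2f} / {total / 2**30:.2f} GiB used")
```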

cdxeve commented 3 days ago
  1. Try reducing the inference batch_size to 4 or even lower.
  2. Add CUDA_VISIBLE_DEVICES='0,2' if necessary; a minimal sketch follows the list.
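For concreteness, a hedged sketch of point 2: on the command line this is just `CUDA_VISIBLE_DEVICES='0,2' bash inference.sh`; if set from inside Python instead, it must happen before `torch` is imported:

```python
# Sketch: expose only GPUs 0 and 2 to the process, so GPU 0 (which ran out
# of memory) shares the load with GPU 2, and the occupied GPU 1 is skipped.
# This must run before `import torch` (or any CUDA initialization), because
# device visibility is fixed once the CUDA context is created.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
```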
zhouchang123 commented 3 days ago

Yes, it works. But it still shows that the memory used on GPU 0 is double that of GPU 2.

cdxeve commented 3 days ago

This is okay; sometimes the model loads unevenly across different GPUs.
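If the imbalance ever becomes a problem, and assuming the checkpoint is loaded through Hugging Face transformers with `device_map="auto"` (UPRISE's actual loading code may differ), per-device `max_memory` caps can flatten the placement. A minimal sketch with a placeholder checkpoint:

```python
# Sketch, assuming a Hugging Face transformers load path (requires the
# `accelerate` package). device_map="auto" shards layers across visible
# GPUs, and the first device often carries extra buffers, which would
# match the roughly 2x usage seen on GPU 0. Note that after setting
# CUDA_VISIBLE_DEVICES='0,2', the visible devices are renumbered 0 and 1,
# so those are the keys used in the (hypothetical) caps below.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                               # placeholder checkpoint, not UPRISE's
    device_map="auto",                    # let accelerate place layers on GPUs
    max_memory={0: "10GiB", 1: "10GiB"},  # hypothetical per-device caps
)
```

A lower cap on the first device is a common way to offset the extra buffers it tends to hold.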