Closed: zhouchang123 closed this issue 3 days ago
I used `gpustat -i 1` to watch the GPU state (a programmatic equivalent is sketched below). When it went wrong, GPUs 2 and 3 were not at their memory limit; only GPU 0 ran out of memory. (My friend is using GPU 1.) How can I solve this?
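For reference, here is a rough programmatic equivalent of the `gpustat -i 1` loop, in case you want to log the numbers over time (a minimal sketch, assuming PyTorch with CUDA is installed; `torch.cuda.mem_get_info` returns free/total memory in bytes):

```python
import time

import torch

# Print free/total memory for every visible GPU once per second,
# similar to `gpustat -i 1` but easy to redirect into a log file.
while True:
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)  # bytes
        print(f"GPU {i}: {free / 1e9:.1f} / {total / 1e9:.1f} GB free")
    print("-" * 40)
    time.sleep(1)
```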
Try reducing the inference batch_size to 4 or even lower.
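For illustration, if the batch size feeds into a standard PyTorch `DataLoader`, the change would look something like this (a sketch only; `eval_dataset` is a placeholder, not the repo's actual data pipeline, and the real value is likely set via `inference.sh`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the repo's real eval data.
eval_dataset = TensorDataset(torch.randn(64, 16))

# batch_size reduced to 4 to lower peak GPU memory during inference.
eval_loader = DataLoader(eval_dataset, batch_size=4, shuffle=False)

for (batch,) in eval_loader:
    pass  # run the model on each smaller batch here
```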
Yes, it works. But it still shows that the memory usage on GPU 0 is double that of GPU 2.
This is okay; sometimes the model can load unevenly across different GPUs. You can confirm the split with the sketch below.
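If you want to measure the imbalance precisely rather than eyeballing `gpustat`, you can query PyTorch's allocator per device (a minimal sketch; these counters are per-process, so it has to run inside the inference process itself after the model is loaded):

```python
import torch

# Report PyTorch-allocated and reserved memory on each visible GPU;
# call this after the model has been loaded and sharded.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1e9
    reserved = torch.cuda.memory_reserved(i) / 1e9
    print(f"GPU {i}: {allocated:.1f} GB allocated, {reserved:.1f} GB reserved")
```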
When I run the code with `bash inference.sh`, it raises an error.
![image](https://github.com/microsoft/LMOps/assets/61148892/b2ab0790-2844-48a6-8ab9-7740cf962df0)