lmstudio-ai / lmstudio-bug-tracker

Bug tracking for the LM Studio desktop application

Bug/Feature Request: GPU Memory Management for Dual-GPU Systems #119

Open SeriousPaul1270 opened 2 months ago

SeriousPaul1270 commented 2 months ago

I've got a bit of an issue here. I'm running two GPUs: an Nvidia card with 8GB of VRAM and an AMD card with 4GB of VRAM. When loading a model, I'm limited to a total of 8GB of VRAM, which isn't too bad. However, if there's any load at all on the AMD card, the model simply won't load.

I've also noticed that even when a model is small enough for the Nvidia card to hold on its own (it has more than enough memory to handle it), about 3.5GB of it still ends up on the AMD card and the rest stays on the Nvidia card. This seems like a waste of resources to me: why not just let the model use all the VRAM available on the Nvidia card?

I'm wondering if this is a bug or if there's a specific reason for how LM Studio handles GPU memory. If it's intentional, maybe the memory allocation strategy could be revisited? I'd love to get your thoughts on this.

Thanks in advance!
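For context, LM Studio's GGUF engine is built on llama.cpp, which by default splits a model's layers across every GPU it can see; the behavior described above is consistent with that default. As a rough illustration of the kind of control being asked for, here is a minimal sketch using the llama-cpp-python bindings rather than LM Studio itself. The model path, and the assumption that the Nvidia card is device 0 and the AMD card device 1, are hypothetical.

```python
from llama_cpp import Llama

# Minimal sketch (not LM Studio's own API): bias 100% of the weights onto
# the first visible device (assumed to be the 8GB Nvidia card) and 0% onto
# the second (assumed to be the 4GB AMD card).
llm = Llama(
    model_path="/path/to/model.gguf",  # hypothetical path
    n_gpu_layers=-1,                   # offload all layers to GPU
    tensor_split=[1.0, 0.0],           # per-device proportion of the weights
)

print(llm("Q: 2 + 2 = ", max_tokens=4)["choices"][0]["text"])
```

An equivalent setting exposed in LM Studio would let a model that fits in the Nvidia card's 8GB stay there entirely instead of spilling ~3.5GB onto the AMD card.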

TashaSkyUp commented 1 month ago

Yes, please add the ability to control which GPU the model is loaded onto!
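At the llama.cpp level this request maps onto the existing main-GPU and split-mode options. A hypothetical sketch via recent llama-cpp-python builds (again, not LM Studio's API; the device index and path are assumptions) that pins the whole model to one card:

```python
import llama_cpp
from llama_cpp import Llama

# Sketch under the assumption that device 0 is the GPU the user wants to target.
llm = Llama(
    model_path="/path/to/model.gguf",           # hypothetical path
    n_gpu_layers=-1,                            # offload every layer
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_NONE, # do not split across GPUs
    main_gpu=0,                                 # load everything onto this device
)
```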