turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

v0.1.1 multi-gpu issue (fine in v0.0.21) #483

Closed — surenchl closed this issue 3 months ago

surenchl commented 3 months ago

v0.1.1 fails with an out-of-memory error once the first GPU's VRAM is full, instead of spilling over onto the other GPUs. v0.0.21 works fine, utilizing all of the GPUs.

turboderp commented 3 months ago

Could you elaborate a little? What GPUs and operating system, what software and settings are you using, and what is the specific exception you're getting?