Closed: UncleFB closed this issue 5 months ago
I downloaded a model named Yi-34Bx2-MoE-60B-4.0bpw-h6-exl2, but my GPU memory is not enough. Can I use multiple GPUs to load the model for inference?

Yes, you can easily split a model across multiple GPUs. The inference example does this by default, automatically splitting across multiple devices if necessary. For scripts using model_init.py, the command-line argument is `-gs x,y,z` to use x GB of VRAM on the first GPU, y GB on the second, and so on, or `-gs auto` to split automatically.
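For example, with one of the repo's example scripts that uses model_init.py (the script name, model path, and memory figures here are illustrative, not prescriptive), the invocation would look something like `python examples/chat.py -m /path/to/Yi-34Bx2-MoE-60B-4.0bpw-h6-exl2 -gs 20,24` for a manual split, or the same command with `-gs auto` for an automatic one.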
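If you are loading the model from your own Python code rather than an example script, here is a minimal sketch of the same two options using exllamav2's typical API. The model path is a placeholder and the split figures (GB per GPU) are illustrative; adjust them to your hardware.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache

# Point the config at the downloaded EXL2 model directory (placeholder path)
config = ExLlamaV2Config()
config.model_dir = "/path/to/Yi-34Bx2-MoE-60B-4.0bpw-h6-exl2"
config.prepare()

model = ExLlamaV2(config)

# Manual split: reserve up to 20 GB on GPU 0 and 24 GB on GPU 1,
# equivalent to passing -gs 20,24 on the command line
model.load([20, 24])

# Or, automatic split: create the cache lazily and let load_autosplit
# fill the GPUs in order, equivalent to -gs auto
# cache = ExLlamaV2Cache(model, lazy=True)
# model.load_autosplit(cache)
```

Note that only one of the two load paths should be used; the auto-split variant is left commented out above.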