juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0

Any way to run inference with a quantized model on multiple GPUs? #61

Open Imagium719 opened 1 year ago

cheebeez commented 1 year ago

Same question...
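No answer was posted in the thread. For reference, one common approach (not specific to pyllama, and assuming a standard PyTorch model) is a naive pipeline split: assign consecutive layer groups to different GPUs and move the activations between devices during the forward pass. Below is a minimal sketch under those assumptions; the stack of `nn.Linear` layers is a hypothetical stand-in for the quantized model's decoder blocks, and the code falls back to CPU so it runs on any machine:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a quantized model's decoder blocks.
layers = [nn.Linear(16, 16) for _ in range(4)]

# Use every visible GPU; fall back to CPU so the sketch runs anywhere.
devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())] or ["cpu"]

# Naive pipeline split: consecutive layer groups go to consecutive devices.
per_dev = (len(layers) + len(devices) - 1) // len(devices)
placement = [devices[min(i // per_dev, len(devices) - 1)] for i in range(len(layers))]
layers = [layer.to(dev) for layer, dev in zip(layers, placement)]

@torch.no_grad()
def forward(x):
    # Move the activations onto each layer's device before applying it.
    for layer, dev in zip(layers, placement):
        x = layer(x.to(dev))
    return x

out = forward(torch.randn(2, 16))
print(tuple(out.shape))
```

In practice, libraries such as Hugging Face Accelerate automate this kind of placement (e.g. loading a model with `device_map="auto"`), which is usually simpler than hand-rolling the split as above.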