turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Can't assign model to multi gpu #205

Closed: nivibilla closed this issue 11 months ago

nivibilla commented 11 months ago

I'm trying to shard a 13B model over 4 GPUs so I can use a bigger batch size.

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

config = ExLlamaConfig(model_config_path)   # config.json from the model directory
config.model_path = model_path              # path to the quantized model weights
config.max_seq_len = 1024 + 15
config.auto_map = [4, 4, 4, 4]              # This line doesn't work

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)

BATCH_SIZE = 16

cache = ExLlamaCache(model, batch_size=BATCH_SIZE)
generator = ExLlamaGenerator(model, tokenizer, cache)
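
For context: in exllama, auto_map is a list of per-GPU VRAM budgets in GB (the same format as the README's --gpu_split flag), not a count of layers per device, so [4, 4, 4, 4] asks for up to roughly 4 GB of weights on each GPU. A minimal sketch of two equivalent ways to set it, with illustrative values:

# auto_map holds per-GPU VRAM budgets in GB, not layer counts.
# These two lines are equivalent; 4 GB per device is illustrative.
config.auto_map = [4.0, 4.0, 4.0, 4.0]   # up to ~4 GB of weights on each of 4 GPUs
config.set_auto_map("4,4,4,4")           # same split, parsed from a comma-separated string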
nivibilla commented 11 months ago

Actually, ignore this. It works; I just didn't have a large enough batch size to push the model over onto the other GPUs.
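
One way to confirm the split is actually happening is to check per-device memory after the model and cache are built. This is a generic PyTorch check, not part of the exllama API; a successful split shows nonzero usage on more than one device:

import torch

# Print allocated memory on each visible GPU after loading the model.
for i in range(torch.cuda.device_count()):
    gib = torch.cuda.memory_allocated(i) / 1024**3
    print(f"cuda:{i}: {gib:.2f} GiB allocated")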