Closed nivibilla closed 11 months ago
I'm trying to shard a 13B model over 4 GPUs so I can use a bigger batch size.
```python
config = ExLlamaConfig(model_config_path)
config.model_path = model_path
config.max_seq_len = 1024 + 15
config.auto_map = [4, 4, 4, 4]  # VRAM allocation per GPU, in GB -- this line doesn't seem to work
model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)

BATCH_SIZE = 16
cache = ExLlamaCache(model, batch_size=BATCH_SIZE)
generator = ExLlamaGenerator(model, tokenizer, cache)
```
Actually, ignore this: it works. My batch size just wasn't large enough to push the model over onto the other GPUs.