lucasjinreal opened 1 year ago
No, 7B uses 8600 MB of VRAM
I was able to run 7B with max_batch_size=1 in 7.12 GiB. With max_batch_size=4, it used 7.87 GiB. Decreasing max_seq_len may allow a higher batch size.
@ubik2 Are there any inference results with int8?
I was able to generate text responses based on the example prompt. The quality may not be that great, though.
Can 8 GB of VRAM run the smallest LLaMA model?
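As a rough back-of-envelope check (this only counts the model weights and ignores activations, the KV cache, and framework overhead, so real usage will be higher):

```python
# Approximate VRAM needed just to hold LLaMA-7B weights at various precisions.
# The 7B parameter count is approximate; actual usage includes extra overhead.
PARAMS = 7_000_000_000  # ~7B parameters

def weight_gib(bytes_per_param: float) -> float:
    """GiB required to store all parameters at the given precision."""
    return PARAMS * bytes_per_param / 2**30

print(f"fp16: {weight_gib(2):.2f} GiB")   # ~13.04 GiB -> weights alone exceed 8 GB
print(f"int8: {weight_gib(1):.2f} GiB")   # ~6.52 GiB -> weights fit, little headroom
print(f"int4: {weight_gib(0.5):.2f} GiB") # ~3.26 GiB
```

So at fp16 the smallest model does not fit in 8 GB, which is consistent with the int8 question above: quantization (or offloading part of the model) is what makes 8 GB cards viable.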