Closed Jacsarge closed 3 months ago
I am trying to use a quantized mistral 7b model that is 3.9G large.
Why do you need to use a smaller value?
Hi @merrymercy, should I increase or decrease `--mem-fraction-static`? I'm running the Qwen 72B model on 4 × A100 80GB GPUs and I'm getting this error: `RuntimeError: Not enought memory. Please try to increase --mem-fraction-static.`
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
With the default value, I run out of memory (A100 w/ 24GB). Setting it to anything other than the default causes the following error:
For reference, I am attempting to use `gen` with a very large set of items (thousands) to select from, in order to limit inference tokens.
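For anyone else hitting this: a rough way to reason about the flag. This is a back-of-envelope sketch, not sglang's actual allocator; it assumes `--mem-fraction-static` is the fraction of GPU memory reserved for the static pool (model weights plus KV cache), and the sizes used are illustrative.

```python
# Rough sketch (NOT the sglang implementation): estimate how much GPU memory
# is left for the KV cache pool after model weights, given --mem-fraction-static.
# All numbers below are illustrative assumptions, not measured values.

def kv_cache_budget_gb(gpu_mem_gb: float, mem_fraction_static: float,
                       weights_gb: float) -> float:
    """Memory available for the KV cache pool, in GB (negative means OOM)."""
    return gpu_mem_gb * mem_fraction_static - weights_gb

# 24 GB GPU, 3.9 GB quantized Mistral 7B weights, fraction of 0.8:
print(f"{kv_cache_budget_gb(24, 0.8, 3.9):.1f} GB for KV cache")

# Lowering the fraction shrinks the static pool, leaving more headroom for
# activations and other runtime buffers; raising it grows the KV cache pool
# but risks running out of memory elsewhere.
print(f"{kv_cache_budget_gb(24, 0.7, 3.9):.1f} GB for KV cache")
```

So the direction to move the flag depends on which side is failing: if the KV cache pool itself is too small for your context length, increase it; if the GPU OOMs outside the static pool, decrease it.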