sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

setting mem-fraction-static to a lower value causes error #165

Closed Jacsarge closed 3 months ago

Jacsarge commented 9 months ago

With the default settings, I run out of memory (A100 w/ 24 GB). Setting `--mem-fraction-static` to anything other than the default causes the following error:

Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/workspace/sglang/python/sglang/srt/managers/router/model_rpc.py", line 170, in exposed_step
    self.forward_step()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/sglang/python/sglang/srt/managers/router/model_rpc.py", line 185, in forward_step
    self.forward_fill_batch(new_batch)
  File "/workspace/sglang/python/sglang/srt/managers/router/model_rpc.py", line 387, in forward_fill_batch
    batch.prepare_for_extend(
  File "/workspace/sglang/python/sglang/srt/managers/router/infer_batch.py", line 203, in prepare_for_extend
    req_pool_indices_cpu = req_pool_indices.cpu().numpy()
AttributeError: 'NoneType' object has no attribute 'cpu'
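The traceback shows that `req_pool_indices` is `None` when `prepare_for_extend` tries to move it to the CPU, which suggests the request-pool allocator returned `None` instead of raising once the pool was exhausted. A minimal sketch of that failure pattern (the `alloc` helper and `Tensor` stand-in below are hypothetical, not SGLang's actual code):

```python
class Tensor:
    """Minimal stand-in for a torch tensor, so the sketch is self-contained."""
    def __init__(self, data):
        self.data = data
    def cpu(self):
        return self
    def numpy(self):
        return self.data

def alloc(free_slots, n):
    # Hypothetical allocator mimicking the pattern the traceback suggests:
    # it returns None (rather than raising an error) when the request pool
    # has no room for n more requests.
    if n > free_slots:
        return None
    return Tensor(list(range(n)))

# With a shrunken memory pool, allocation fails silently...
req_pool_indices = alloc(free_slots=0, n=4)
# ...and the later call blows up exactly like the traceback above:
# req_pool_indices.cpu().numpy()
# AttributeError: 'NoneType' object has no attribute 'cpu'
```

This is why lowering `--mem-fraction-static` can surface as an `AttributeError` rather than a clear out-of-memory message.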

For reference, I am attempting to use `gen` with a very large set of items to select from (thousands), in order to limit inference tokens.

Jacsarge commented 9 months ago

I am trying to use a quantized Mistral 7B model that is 3.9 GB in size.

merrymercy commented 9 months ago

Why do you need to use a smaller value?

Iven2132 commented 6 months ago

> Why do you need to use a smaller value?

Hi @merrymercy, should I increase or decrease `--mem-fraction-static`? I'm running the Qwen 72B model on 4× A100 80 GB GPUs, and I'm getting this error: "RuntimeError: Not enought memory. Please try to increase --mem-fraction-static."
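For that error, the usual first step is the opposite of this issue's title: *increase* `--mem-fraction-static` (or free GPU memory held by other processes) so the static pool is large enough for the 72B weights across the four GPUs. A sketch of a launch command (the model path and values are assumptions; check `python -m sglang.launch_server --help` for the exact flag names in your SGLang version):

```shell
# Serve a 72B model sharded across 4 GPUs with tensor parallelism,
# with a larger static memory pool for weights + KV cache.
python -m sglang.launch_server \
  --model-path Qwen/Qwen-72B-Chat \
  --tp-size 4 \
  --mem-fraction-static 0.85
```

If raising the fraction causes CUDA out-of-memory errors at startup instead, step it back down until the server launches.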

github-actions[bot] commented 3 months ago

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.