../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

python -m sglang.launch_server --model-path liuhaotian/llava-v1.6-34b --tokenizer-path liuhaotian/llava-v1.6-34b-tokenizer --port=30020 --host="0.0.0.0" --tp-size=1 --random-seed=1234 --context-length=4096 &> 34b.log &

client:

pload = {'text': '<|im_start|>system\nAnswer the questions.<|im_end|>user<image>\nGive detailed information.<|im_end|>', 'sampling_params': {'max_new_tokens': 1024, 'temperature': 0.0, 'top_p': 1.0, 'presence_penalty': 0.14000000000000012, 'frequency_penalty': 2, 'stop': ['<|im_end|>']}, 'stream': False}

The pload also has the image in bytes form, e.g.:

data:image/png;base64,iVBORw0KGgoAAAANSU...

For this image:

bigben

client code:

        response = requests.post(
            url,
            json=pload,
            stream=False,
        )

stream False or True doesn't help. url is just 'http://xxx.xxx.xxx.xxx:80/generate'.

The conv_chatml_direct was used to construct the above.

I get the same problem if I remove the sampling parameters.

This model works perfectly well on original llava worker-server-gradio setup, but has tons of issues with sglang. This includes no response or total failure on the server. This isn't just random issue, it happens repeatedly always and constantly, rare that things work.

Other models like llama3 work perfectly fine with same code.

The error:

INFO:     172.16.0.20:22926 - "GET /health HTTP/1.1" 200 OK
INFO:     172.16.0.20:18448 - "GET /health HTTP/1.1" 200 OK
INFO:     172.16.0.20:42702 - "GET /health HTTP/1.1" 200 OK
INFO:     172.16.0.20:1276 - "GET /health HTTP/1.1" 200 OK
INFO:     172.16.0.20:26522 - "GET /health HTTP/1.1" 200 OK
INFO:     172.16.0.20:19358 - "GET /health HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 9. #new_token: 8. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 49.92%.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [9,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [10,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [12,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [13,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [14,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [15,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [16,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [17,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [19,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [19,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
... etc.
Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_rpc.py", line 175, in exposed_step
    self.forward_step()
  File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_rpc.py", line 190, in forward_step
    self.forward_fill_batch(new_batch)
  File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_rpc.py", line 418, in forward_fill_batch
    ) = self.model_runner.forward(batch, ForwardMode.EXTEND)
  File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_runner.py", line 404, in forward
    return self.forward_extend_multi_modal(batch)
  File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/sglang/python/sglang/srt/managers/router/model_runner.py", line 393, in forward_extend_multi_modal
    return self.model.forward(
  File "/home/ubuntu/sglang/python/sglang/srt/models/llava.py", line 106, in forward
    .cpu()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

etc.  this repeats forever and server is dead.

sgl-project / sglang

../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [40,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed. #473