SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
CUDA error: device-side assert triggered in self.forward_extend_multi_modal(batch) #528
I have successfully started the backend for llava-v1.6-34b using code similar to https://github.com/sgl-project/sglang/blob/f6dbd24043b8c18d87a14b3c6fe5c4f567f6c1ba/examples/quick_start/srt_example_llava.py. Everything works when I run a single request or a batch in one process. However, when I run in parallel, for example calling the batch function twice simultaneously, the backend sometimes fails with an error. It does not happen every time. The full error message is shown below:

![image](https://github.com/sgl-project/sglang/assets/167419371/aa010f24-013c-4acb-a0b5-32c33d53412b)
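
The parallel call pattern can be sketched roughly as below. This is only an illustration, not the exact code that triggered the error: `run_batch` is a hypothetical placeholder for the real batch call against the llava-v1.6-34b backend, and the use of threads (rather than separate processes) is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(prompts):
    # Hypothetical placeholder: in the real setup this would send one batch
    # of image + question pairs to the running backend and return the answers.
    return [f"answer to: {p}" for p in prompts]


def run_in_parallel(batches, max_workers=2):
    # Submit every batch at once so that the calls overlap in time,
    # then collect the results in submission order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_batch, batch) for batch in batches]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Two batches issued simultaneously, as in the failing scenario.
    batches = [["q1", "q2"], ["q3", "q4"]]
    print(run_in_parallel(batches))
```

With the placeholder swapped for the real batch call, a single call to `run_batch` corresponds to the case that works, and `run_in_parallel` with two batches corresponds to the case that fails intermittently.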
I found some similar errors in other issues, but they seem to be different from mine. Has anyone run into the same problem and knows how to fix it? Thanks~