sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0
5.36k stars, 388 forks

MoE model (DBRX/Mixtral) NaN when using flashinfer #547

Closed Ying1123 closed 2 months ago

Ying1123 commented 3 months ago

To reproduce, launch the server:

python -m sglang.launch_server --model-path databricks/dbrx-instruct --tp 8 --port 30000 --mem-frac 0.8 --enable-flashinfer

and then run the benchmark:

python3 bench_sglang.py --num-questions 10

yzh119 commented 3 months ago

DBRX uses a GQA group size of 6, which should already be supported as of https://github.com/flashinfer-ai/flashinfer/pull/301 (included in release v0.0.5).
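
For context, the GQA group size is the number of query heads that share one key/value head. Below is a minimal sketch of that arithmetic. The head counts are assumptions taken from the published model configs (48 query heads and 8 KV heads for DBRX, 32 and 8 for Mixtral), not from this thread, and `gqa_group_size` is a hypothetical helper, not part of sglang or flashinfer.

```python
def gqa_group_size(num_query_heads: int, num_kv_heads: int) -> int:
    """Return how many query heads share each key/value head."""
    if num_kv_heads <= 0 or num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a positive multiple of KV heads")
    return num_query_heads // num_kv_heads


# Assumed head counts (from the public model configs, not from this thread).
DBRX_HEADS = (48, 8)      # (query heads, KV heads)
MIXTRAL_HEADS = (32, 8)

print(gqa_group_size(*DBRX_HEADS))     # 6
print(gqa_group_size(*MIXTRAL_HEADS))  # 4
```

Tensor parallelism divides query and KV heads by the same factor, so under these assumptions the group size seen by each GPU with --tp 8 is still 6.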