Ying1123 closed this issue 2 months ago.
Reproduce by launching the server:
python -m sglang.launch_server --model-path databricks/dbrx-instruct --tp 8 --port 30000 --mem-frac 0.8 --enable-flashinfer
and then running the benchmark:
python3 bench_sglang.py --num-questions 10
DBRX uses a GQA group size of 6, which should already be supported since https://github.com/flashinfer-ai/flashinfer/pull/301 (included in release v0.0.5).
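The GQA group size is the number of query heads that share one KV head, that is, the ratio of query heads to KV heads. The sketch below only illustrates that arithmetic and the resulting query-to-KV head mapping. The function names are hypothetical and are not part of the flashinfer or sglang API. The head counts (48 query heads, 8 KV heads) are an assumption chosen to match the group size of 6 stated above.

```python
def gqa_group_size(num_qo_heads: int, num_kv_heads: int) -> int:
    """Number of query heads sharing each KV head (hypothetical helper)."""
    if num_kv_heads <= 0 or num_qo_heads % num_kv_heads != 0:
        raise ValueError("num_qo_heads must be a positive multiple of num_kv_heads")
    return num_qo_heads // num_kv_heads


def kv_head_for_query_head(q_head: int, num_qo_heads: int, num_kv_heads: int) -> int:
    """Index of the KV head that query head `q_head` reads from (hypothetical helper)."""
    group = gqa_group_size(num_qo_heads, num_kv_heads)
    # Consecutive blocks of `group` query heads map to the same KV head.
    return q_head // group


if __name__ == "__main__":
    # Assumed head counts for illustration only: 48 query heads, 8 KV heads.
    print(gqa_group_size(48, 8))
    print([kv_head_for_query_head(h, 48, 8) for h in range(12)])
```

If tensor parallelism divides both head counts by the same factor (for example, --tp 8 giving 6 query heads and 1 KV head per GPU under these assumed counts), the ratio is unchanged. Each GPU's attention kernel therefore still sees a group size of 6.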