Checklist

[X] 1. I have searched related issues but could not find the expected help.
[X] 2. The bug has not been fixed in the latest version.
[X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug
MLA models like DeepSeek-V2 and MiniCPM3 cannot use `--enable-torch-compile`. However, simply adding two lines of code to sglang resolves this and improves single-batch decoding speed.
I request adding this error suppression until torch compilation of these models becomes stable.
For example, fixing this bug raises MiniCPM3 single-batch decoding throughput from 66 token/s to 103 token/s without affecting output quality.
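The two-line change referred to above is presumably TorchDynamo's built-in error suppression; where exactly it belongs inside sglang is not stated in this report, so the following is only a minimal sketch of the toggle itself, applied before the model is compiled:

```python
import torch._dynamo

# Fall back to eager execution instead of raising when TorchDynamo fails
# to compile a graph (e.g. for the MLA attention path in DeepSeek-V2 or
# MiniCPM3). Graphs that do compile still run compiled, so the rest of
# the model keeps the torch.compile speedup.
torch._dynamo.config.suppress_errors = True
```

With this flag set, `--enable-torch-compile` no longer aborts on the uncompilable MLA ops; they simply run in eager mode.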
Reproduction
To reproduce:
python3 -m sglang.bench_latency --model openbmb/MiniCPM3-4B --trust-remote-code --input-len 1024 --output-len 1024 --batch 1 --enable-torch-compile
Environment