[Closed] pseudotensor closed this issue 2 months ago
Maybe you can change your speculative model, or set spec_decoding_acceptance_method to typical_acceptance_sampler. When using '[ngram]', there is a bug in the RejectionSampler source code: it cannot handle draft_probs with the shape (0, k).
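For illustration, here is a minimal sketch of how the (0, k) shape can arise. This uses NumPy as a stand-in (in vLLM, draft_probs is a torch tensor), and the batch/step values are illustrative assumptions, not from the vLLM source:

```python
import numpy as np

# Sketch of the edge case: when the [ngram] proposer finds no match in the
# prompt, it proposes zero draft tokens, so the draft probability tensor
# arrives with shape (0, k) instead of (num_draft_tokens, k).
k = 5
draft_probs = np.empty((0, k))   # zero proposed tokens this step
target_probs = np.empty((0, k))  # target model probs for the same tokens

# Rejection sampling compares per-token target/draft probability ratios.
# Elementwise ops on zero-row arrays are valid and simply yield zero rows,
# but code that assumes at least one row (e.g. indexing row 0, or reshaping
# with a fixed nonzero batch size) raises, which matches the reported crash.
ratios = target_probs / np.maximum(draft_probs, 1e-10)
print(ratios.shape)  # (0, 5)
```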
Is anyone fixing this bug? cc @cadedaniel
I'm happy to try other options. It was working well for someone else, but not for me on the phi-3-mini-128k model; it failed instantly. I'll probably wait until this bug is fixed before trying again.
The hope is that for structured output, others are getting quite good speed-ups: for guided_json and JSON output, about a 5x improvement on a 7B model. That sounds great, but it just crashes for me.
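For context, the kind of structured-output request being discussed might look like the sketch below. The model name and schema are placeholder assumptions (not from this thread); guided_json is vLLM's request field for JSON-schema-guided decoding on its OpenAI-compatible server:

```python
import json

# Placeholder schema for illustration only.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# Raw request body for a vLLM OpenAI-compatible server; vLLM accepts the
# extra `guided_json` field to constrain decoding to this JSON schema.
payload = {
    "model": "microsoft/Phi-3-mini-128k-instruct",
    "messages": [{"role": "user", "content": "Describe a person as JSON."}],
    "guided_json": schema,
}
print(json.dumps(payload, indent=2))
```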
Did you try adding --spec-decoding-acceptance-method='typical_acceptance_sampler'? It avoids the crash for me.
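Putting the suggestion together, a server launch might look like this sketch. Flag names are as of vLLM v0.5.x and worth double-checking against --help; the model and numeric values are illustrative assumptions:

```shell
# Sketch of an ngram speculative-decoding launch with the suggested
# acceptance method; model and numbers below are placeholder assumptions.
python -m vllm.entrypoints.openai.api_server \
    --model microsoft/Phi-3-mini-128k-instruct \
    --speculative-model '[ngram]' \
    --num-speculative-tokens 5 \
    --ngram-prompt-lookup-max 4 \
    --use-v2-block-manager \
    --spec-decoding-acceptance-method typical_acceptance_sampler
```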
FYI, you can build from the source code of the main branch. I guess the container you are using was built with vLLM v0.5.3 or v0.5.3.post1; #6698 has fixed this bug. Alternatively, you can wait for the release of v0.5.4, which should no longer crash.
0.5.4 seems to fix the issue.
Your current environment
🐛 Describe the bug
With the very first message to the model, "Who are you?", I got "I" back and then it died.