Open mitchellstern opened 3 months ago
Seeing the same error! Were you able to get around this @mitchellstern ?
I'm afraid not, sorry.
Unfortunately vLLM speculative decoding does not yet support LoRA inference.
@cadedaniel Is this also unsupported for Medusa-adapter-based speculative decoding? Any plans to add support for this? I'm happy to take up the work to add it!
Can you share your use case? We'd love to see this supported, but I don't have the bandwidth to take it on myself.
Happy to chat more about this! Could we hop on a Zoom call to discuss it further? I should have the bandwidth to take this on.
Send me an email at cade @ anyscale.com
Just emailed you regarding this @cadedaniel. Thanks!
I'm also interested in the status of this. What would be necessary to support LoRA inference with speculative decoding?
There is no work planned on my end. I created an issue with more details if you want to work on it, @kevmo314: https://github.com/vllm-project/vllm/issues/6912
Your current environment
🐛 Describe the bug
I'd like to try out the recently added speculative decoding features. However, I'm encountering a shape error at the following line of code when my model has `enable_lora=True`, even if I'm not using a LoRA adapter in my request: https://github.com/vllm-project/vllm/blob/c7f2cf2b7f67bce5842fedfdba508440fe257375/vllm/spec_decode/batch_expansion.py#L172-L173

Minimal reproduction with `enable_lora=True` (encounters a shape error):
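A minimal sketch along these lines triggers the error; the target model (`meta-llama/Llama-2-7b-hf`) and draft model (`JackFram/llama-68m`) are illustrative stand-ins, and any supported target/draft pairing should behave the same way:

```python
from vllm import LLM, SamplingParams

# Target model with speculative decoding configured. The checkpoints below
# are placeholders, not the exact ones from my environment.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    speculative_model="JackFram/llama-68m",
    num_speculative_tokens=5,
    use_v2_block_manager=True,  # required for speculative decoding at this commit
    enable_lora=True,  # merely enabling LoRA triggers the shape error
)

# Note: no LoRARequest is passed, yet the error is still raised during
# scoring in batch_expansion.py.
outputs = llm.generate(
    ["The future of AI is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```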
Minimal reproduction without `enable_lora=True` (runs without error):
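The same sketch with only the `enable_lora=True` flag removed runs cleanly:

```python
from vllm import LLM, SamplingParams

# Identical configuration, minus enable_lora=True.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",
    speculative_model="JackFram/llama-68m",
    num_speculative_tokens=5,
    use_v2_block_manager=True,
)

outputs = llm.generate(
    ["The future of AI is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```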