jqueguiner opened this issue 2 months ago
Currently, Medusa produces hallucinations because its speculative multi-head decoding does not support the Outlines decoding grammar.
Request: support speculative decoding, for performance reasons.
Note: for now only TGI supports Medusa; vLLM support is planned but not yet available.
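To make the problem concrete, here is a minimal sketch, assuming greedy decoding and a toy token-level FSM standing in for a compiled grammar (`Fsm`, `masked_argmax`, and `verify_draft` are hypothetical names, not Outlines, TGI, or vLLM APIs). If the grammar mask is applied while the target model verifies the speculative tokens, a draft token that violates the grammar can never equal the masked argmax, so it is rejected instead of emitted.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy FSM: state -> {allowed_token_id: next_state}.
Fsm = Dict[int, Dict[int, int]]


def masked_argmax(logits: Sequence[float], allowed: Dict[int, int]) -> int:
    # Only tokens with an outgoing FSM transition are eligible.
    return max(allowed, key=lambda t: logits[t])


def verify_draft(
    fsm: Fsm,
    state: int,
    draft: List[int],
    target_logits: Sequence[Sequence[float]],
) -> Tuple[List[int], int]:
    """Greedy speculative verification under a grammar mask.

    target_logits[i] holds the target model's scores for position i given the
    prefix plus draft[:i] (normally len(draft) + 1 rows). Returns the emitted
    tokens and the resulting FSM state.
    """
    out: List[int] = []
    for i, logits in enumerate(target_logits):
        allowed = fsm.get(state, {})
        if not allowed:
            break  # no valid continuation from this state
        best = masked_argmax(logits, allowed)
        out.append(best)
        state = allowed[best]
        if i >= len(draft) or draft[i] != best:
            # Draft diverged or is exhausted: keep the target's token and stop.
            break
    return out, state
```

With the mask in place, a grammar-violating draft token only costs the rest of that draft; the emitted sequence stays valid.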
Do you know whether n-gram speculation works? I think it would be even more impactful, and simpler to handle, since many structured-generation tasks are rewrites.
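For reference, n-gram (prompt-lookup) speculation needs no extra heads. It finds an earlier occurrence of the most recent n tokens and proposes the tokens that followed it, which suits rewrites where the output largely copies the input. A minimal sketch, assuming token-ID lists; `ngram_draft` and its `n`/`k` parameters are hypothetical, not a TGI or vLLM option. The drafts must still be verified by the target model and, for structured generation, checked against the grammar.

```python
from typing import List, Sequence


def ngram_draft(context: Sequence[int], n: int = 3, k: int = 4) -> List[int]:
    """Propose up to k draft tokens by matching the last n tokens of context
    against an earlier occurrence in the same context (prompt-lookup style)."""
    if n <= 0 or len(context) <= n:
        return []
    tail = list(context[-n:])
    # Scan backwards so the most recent earlier match wins; the tail itself
    # (start == len(context) - n) is excluded.
    for start in range(len(context) - n - 1, -1, -1):
        if list(context[start:start + n]) == tail:
            return list(context[start + n:start + n + k])
    return []
```

For example, with context `[5, 6, 7, 8, 9, 5, 6, 7]`, `n=3`, and `k=2`, the tail `[5, 6, 7]` matches at the start, so the draft is `[8, 9]`.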