triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

speculative decoding #224

[Open] MrD005 opened this issue 6 months ago

MrD005 commented 6 months ago

How do I use speculative decoding? Is there any documentation to understand it better?

Support was added in a recent update for both TensorRT-LLM and the TensorRT-LLM backend.

ncomly-nvidia commented 6 months ago

We're working on an example with docs now. In the meantime, there is an implementation you can reference here.
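
While the official example is pending, here is a minimal client-side sketch of the draft-then-verify loop that speculative decoding performs: a small draft model proposes a few candidate tokens cheaply, and the large target model verifies them all in a single forward pass, accepting the longest matching prefix. The model names (`tensorrt_llm_draft`, `tensorrt_llm`), the tensor names (`draft_input_ids`, `request_output_len`), the simplified `[batch, seq_len]` shapes, and the assumption that `output_ids` echoes the prompt are all illustrative assumptions; check the referenced implementation for the exact tensors your deployment expects.

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

def infer_ids(model_name, tensors, output_name="output_ids"):
    """Send one inference request and return the named output as a numpy array."""
    inputs = []
    for name, value in tensors.items():
        t = grpcclient.InferInput(name, list(value.shape), "INT32")
        t.set_data_from_numpy(value)
        inputs.append(t)
    result = client.infer(
        model_name, inputs,
        outputs=[grpcclient.InferRequestedOutput(output_name)])
    return result.as_numpy(output_name)

prompt = np.array([[1, 2, 3]], dtype=np.int32)  # already-tokenized prompt
num_draft = 4                                   # candidate tokens per step
seq = prompt

while seq.shape[1] < 64:  # until max length (or an end token) is reached
    # 1. The small draft model cheaply proposes a few candidate tokens.
    drafted = infer_ids(
        "tensorrt_llm_draft",  # hypothetical draft-model name
        {"input_ids": seq,
         "request_output_len": np.array([[num_draft]], dtype=np.int32)})
    # Keep only the newly drafted tokens (assumes the output echoes the input).
    draft_tokens = drafted[:, seq.shape[1]:]

    # 2. The large target model verifies all candidates in one forward pass
    #    and accepts the longest prefix that matches its own predictions,
    #    so each iteration can emit up to num_draft + 1 tokens.
    seq = infer_ids(
        "tensorrt_llm",  # hypothetical target-model name
        {"input_ids": seq,
         "draft_input_ids": draft_tokens,
         "request_output_len": np.array([[num_draft + 1]], dtype=np.int32)})
```

The speedup comes from step 2: verifying several drafted tokens costs roughly one target-model forward pass, so whenever the draft model guesses well you get multiple tokens for the price of one.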