MrD005 opened this issue 6 months ago
In progress! We're working on an example w/ docs now - there is an implementation you can reference here
@ncomly-nvidia thanks for this reference. I just need some help with Triton server: how do I deploy 2 LLMs so they can be used together for speculative decoding?
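In case it helps while the official example lands, the control loop that coordinates the two models can be sketched in plain Python. The `draft_model` and `target_model_check` functions below are toy stand-ins, not the Triton or TensorRT-LLM API; in a real deployment each call would be an inference request to a separately deployed model (the two-model setup and any endpoint names are assumptions here):

```python
# Minimal sketch of the speculative-decoding loop: a cheap draft model
# proposes k tokens, an expensive target model verifies them in one pass.
# Both "models" below are deterministic toy rules standing in for real
# inference calls (e.g. requests to a draft LLM and a target LLM served
# by Triton) -- hypothetical, for illustrating the control flow only.

def _toy_next_token(ctx):
    """Placeholder next-token rule shared by both toy models."""
    return (sum(ctx) * 31 + 7) % 100

def draft_model(prefix, k):
    """Draft model: cheaply propose k candidate tokens autoregressively."""
    ctx = list(prefix)
    proposed = []
    for _ in range(k):
        tok = _toy_next_token(ctx)   # real code: sample from the draft LLM
        proposed.append(tok)
        ctx.append(tok)
    return proposed

def target_model_check(prefix, proposed):
    """Target model: verify the proposed tokens in a single pass.

    Returns the accepted prefix of `proposed`; on the first mismatch the
    target's own token replaces the rejected one. If everything is
    accepted, the target contributes one bonus token for free.
    """
    ctx = list(prefix)
    out = []
    for tok in proposed:
        true_tok = _toy_next_token(ctx)  # real code: target LLM's choice
        if tok != true_tok:
            out.append(true_tok)         # first rejection: correct and stop
            return out
        out.append(tok)
        ctx.append(tok)
    out.append(_toy_next_token(ctx))     # all accepted: add bonus token
    return out

def speculative_decode(prompt, max_new, k=4):
    """Generate max_new tokens, batching draft proposals of size k."""
    seq = list(prompt)
    target_len = len(prompt) + max_new
    while len(seq) < target_len:
        proposed = draft_model(seq, k)
        seq.extend(target_model_check(seq, proposed))
    return seq[:target_len]
```

Because the two toy rules agree, every proposal is accepted and each round yields k+1 tokens; with a real draft/target pair, the loop degrades gracefully to one corrected token per round when the models disagree.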
A full example is in progress now. We appreciate your patience around the holidays!
Happy New Year!
https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md
If possible, please also add a speculative decoding example to the llama docs.