MrD005 opened this issue 6 months ago
In progress! We're working on an example w/ docs now - there is an implementation you can reference here
@ncomly-nvidia thanks for this reference. I just need some help with Triton server: how do I deploy 2 LLMs so they can be used together for speculative decoding?
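In case it helps while the official example lands, the control loop that coordinates the two models can be sketched in plain Python. The `draft_model` and `target_model_check` functions below are toy stand-ins, not the Triton or TensorRT-LLM API; in a real deployment each call would be an inference request to a separately deployed model (the two-model setup and any endpoint names are assumptions here):

```python
# Minimal sketch of the speculative-decoding loop: a cheap draft model
# proposes k tokens, an expensive target model verifies them in one pass.
# Both "models" below are deterministic toy rules standing in for real
# inference calls (e.g. requests to a draft LLM and a target LLM served
# by Triton) -- hypothetical, for illustrating the control flow only.

def _toy_next_token(ctx):
    """Placeholder next-token rule shared by both toy models."""
    return (sum(ctx) * 31 + 7) % 100

def draft_model(prefix, k):
    """Draft model: cheaply propose k candidate tokens autoregressively."""
    ctx = list(prefix)
    proposed = []
    for _ in range(k):
        tok = _toy_next_token(ctx)   # real code: sample from the draft LLM
        proposed.append(tok)
        ctx.append(tok)
    return proposed

def target_model_check(prefix, proposed):
    """Target model: verify the proposed tokens in a single pass.

    Returns the accepted prefix of `proposed`; on the first mismatch the
    target's own token replaces the rejected one. If everything is
    accepted, the target contributes one bonus token for free.
    """
    ctx = list(prefix)
    out = []
    for tok in proposed:
        true_tok = _toy_next_token(ctx)  # real code: target LLM's choice
        if tok != true_tok:
            out.append(true_tok)         # first rejection: correct and stop
            return out
        out.append(tok)
        ctx.append(tok)
    out.append(_toy_next_token(ctx))     # all accepted: add bonus token
    return out

def speculative_decode(prompt, max_new, k=4):
    """Generate max_new tokens, batching draft proposals of size k."""
    seq = list(prompt)
    target_len = len(prompt) + max_new
    while len(seq) < target_len:
        proposed = draft_model(seq, k)
        seq.extend(target_model_check(seq, proposed))
    return seq[:target_len]
```

Because the two toy rules agree, every proposal is accepted and each round yields k+1 tokens; with a real draft/target pair, the loop degrades gracefully to one corrected token per round when the models disagree.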
A full example is in progress now. We appreciate your patience around the holidays!
Happy New Year!
https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md
If possible, please also add a speculative decoding example to the llama docs.