triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

llama docs #226

Open MrD005 opened 6 months ago

MrD005 commented 6 months ago

https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md

If possible, please add a speculative decoding example to the llama docs.

ncomly-nvidia commented 6 months ago

In progress! We're working on an example with docs now; there is an implementation you can reference here.

MrD005 commented 6 months ago

@ncomly-nvidia thanks for this reference. I just need some help with Triton Server: how do I deploy the two LLMs that are used together for speculative decoding?
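While waiting on the official example, the core idea behind the two-model setup can be illustrated with a toy sketch: a small, fast draft model proposes a batch of tokens, and the larger target model verifies them, accepting the longest matching prefix. This is only a conceptual illustration with greedy, exact-match acceptance; it is not the TensorRT-LLM or Triton implementation (which handles batching, KV caches, and probabilistic acceptance), and the function names and the `gamma` parameter here are made up for the example.

```python
def speculative_decode(draft_model, target_model, prompt, max_new_tokens=16, gamma=4):
    """Greedy speculative-decoding sketch (illustrative only).

    `draft_model` and `target_model` are callables mapping a token list to
    the next token. Each round, the draft model proposes `gamma` tokens;
    the target model verifies them one by one, keeping the matching prefix
    and substituting its own token at the first mismatch.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # Draft model proposes gamma tokens autoregressively (cheap calls).
        proposal, ctx = [], list(tokens)
        for _ in range(gamma):
            t = draft_model(ctx)
            proposal.append(t)
            ctx.append(t)

        # Target model verifies the proposal (one expensive pass per token here;
        # real implementations verify all positions in a single forward pass).
        accepted, ctx = [], list(tokens)
        for t in proposal:
            expected = target_model(ctx)
            if expected == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # Mismatch: keep the target model's own token and stop this round.
                accepted.append(expected)
                break
        else:
            # Every draft token accepted: the target contributes one bonus token.
            accepted.append(target_model(ctx))

        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new_tokens]
```

When the draft model agrees with the target, each round emits up to `gamma + 1` tokens for one round of target-model verification, which is where the speedup comes from; a mismatching draft degrades gracefully to one target token per round.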

ncomly-nvidia commented 6 months ago

A full example is in progress now. We appreciate your patience around the holidays!

Happy New Year!