triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

add speculative decoding example #432

Closed XiaobingSuper closed 3 weeks ago

XiaobingSuper commented 5 months ago

This PR adds a speculative decoding example.
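For context on what such an example demonstrates, here is a minimal toy sketch of the draft-and-verify idea behind speculative decoding (the model functions are hypothetical stand-ins, not the actual TensorRT-LLM implementation): a cheap draft model proposes several tokens, and the target model verifies them, keeping the longest accepted prefix.

```python
# Toy sketch of speculative decoding. draft_model and target_model are
# hypothetical deterministic stand-ins for real language models; the
# control flow (propose k tokens, verify, accept prefix) is the point.

def draft_model(prefix, k):
    # Hypothetical cheap model: proposes the next k tokens.
    return [(prefix[-1] + i + 1) % 10 for i in range(k)]

def target_model(prefix):
    # Hypothetical expensive model: returns its single next token.
    return (prefix[-1] + 1) % 10

def speculative_decode(prompt, num_tokens, k=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < num_tokens:
        proposal = draft_model(tokens, k)
        accepted = []
        ctx = list(tokens)
        for tok in proposal:
            expected = target_model(ctx)
            if tok == expected:
                # Draft token matches the target model: accept it.
                accepted.append(tok)
                ctx.append(tok)
            else:
                # First mismatch: take the target model's token and stop.
                accepted.append(expected)
                break
        else:
            # Every draft token was accepted; target emits one bonus token.
            accepted.append(target_model(ctx))
        tokens.extend(accepted)
    return tokens[len(prompt):len(prompt) + num_tokens]
```

Because the target model checks all k draft tokens in one verification step, each loop iteration can emit up to k+1 tokens instead of one, which is where the speed-up comes from in the real backend.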

XiaobingSuper commented 5 months ago

@Shixiaowei02, could you help review this PR?

avianion commented 4 months ago

@XiaobingSuper did you test that this actually works?

XiaobingSuper commented 3 months ago

> @XiaobingSuper did you test that this actually works?

Yes, I checked and it works.