replicate / cog-triton

A cog implementation of Nvidia's Triton server
Apache License 2.0

handle triton 0.10.0 not returning the entire sequence #42

Closed technillogue closed 4 weeks ago

technillogue commented 2 months ago

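triton 0.10.0 has a breaking change: per the TensorRT-LLM v0.10.0 release notes, the input prompt was removed from the generation output, so the server no longer returns the entire sequence. this PR tries to support it.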

https://github.com/NVIDIA/TensorRT-LLM/releases/tag/v0.10.0#:~:text=The%20input%20prompt%20was%20removed%20from%20the%20generation%20output
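for illustration only, here is a minimal sketch of how a caller could tolerate both behaviors. it is not the code from this PR: `strip_prompt` is a hypothetical helper, and it assumes the prompt and the output are both available as plain strings. if the output still begins with the prompt (older behavior), the prompt is removed; otherwise the output is taken to be the completion already (0.10.0 behavior).

```python
def strip_prompt(prompt: str, output: str) -> str:
    """Return only the newly generated text.

    Hypothetical helper, not part of cog-triton. Assumes older servers
    echo the prompt at the start of the output and newer ones do not.
    """
    # Older behavior: output is prompt + completion, so drop the prompt.
    if prompt and output.startswith(prompt):
        return output[len(prompt):]
    # Newer behavior: output is already just the completion.
    return output


if __name__ == "__main__":
    print(repr(strip_prompt("Hello", "Hello world")))
    print(repr(strip_prompt("Hello", "world")))
```

one caveat with prefix detection: a completion that happens to start by repeating the prompt verbatim would be trimmed by mistake, so branching on a known server version is likely more robust when that information is available.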