triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Track input and output token count for every request #490

Closed. brarj413 closed this issue 3 weeks ago.

brarj413 commented 3 weeks ago

Hello, can anyone help me with tracking the input and output token counts for every request? I have not been able to add a new response output or find any other way to expose them.

nv-guomingz commented 3 weeks ago

Thanks @brarj413. I saw there is an identical feature request at https://github.com/NVIDIA/TensorRT-LLM/issues/1718, so I'm closing this one; let's use that issue for tracking.
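
Until that feature request is implemented, one common client-side workaround is to count tokens yourself with the same tokenizer the TensorRT-LLM engine was built from. Below is a minimal sketch assuming a Hugging Face tokenizer is available; the `gpt2` model name and the example strings are placeholders, not anything the backend itself provides.

```python
from transformers import AutoTokenizer

# Assumption: substitute the tokenizer your engine was actually built from;
# "gpt2" is only a stand-in so this sketch runs without gated-model access.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def count_tokens(text: str) -> int:
    # encode() returns the list of token ids produced for the text.
    return len(tokenizer.encode(text, add_special_tokens=False))

prompt = "What is the capital of France?"       # text sent in the request
completion = "The capital of France is Paris."  # text returned by the server

print(f"input tokens:  {count_tokens(prompt)}")
print(f"output tokens: {count_tokens(completion)}")
```

Note that this only approximates the server-side count: if the serving pipeline adds special tokens or applies a chat template, the numbers will differ unless you reproduce those steps on the client.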