Is there a way to get the first-token latency? `benchmarks/benchmark_latency.py` provides the latency of processing a single batch of requests, but I am interested in measuring the latency until the first token is generated.
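As a generic illustration (not a vLLM API), first-token latency is usually measured by streaming the response and timing how long the first chunk takes to arrive. The helper `measure_ttft` and the stand-in generator `fake_stream` below are hypothetical names for the sketch:

```python
import time
from typing import Iterable, Iterator


def measure_ttft(stream: Iterable[str]) -> tuple[float, str]:
    """Return (time-to-first-token in seconds, full generated text)."""
    start = time.perf_counter()
    it = iter(stream)
    first = next(it)  # blocks until the first token/chunk arrives
    ttft = time.perf_counter() - start
    return ttft, first + "".join(it)


# Stand-in for a real token stream (e.g. chunks from a streaming endpoint).
def fake_stream() -> Iterator[str]:
    for tok in ["Hello", ",", " world"]:
        time.sleep(0.01)  # simulated per-token generation delay
        yield tok


ttft, text = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, output: {text!r}")
```

The same pattern applies to any streaming client: start a timer before sending the request and stop it on the first received chunk.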
Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.