feat: added time_to_first_token and inter_token metrics for both stream and non-stream requests

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

https://docs.vectorch.com/

Apache License 2.0

316 stars 23 forks

Closed by guocuimi 4 weeks ago