A high-performance inference system for large language models, designed for production environments.
316 stars, 23 forks
feat: added time_to_first_token and inter_token metrics for both stream and non-stream requests #227
Closed
guocuimi closed 4 weeks ago