A high-performance inference system for large language models, designed for production environments.
317 stars, 24 forks
[refactor] Consolidate handlers to share llm_handler between the Python REST API server and the gRPC server #174
Closed
guocuimi closed 2 months ago