A high-performance inference system for large language models, designed for production environments.
[python] added LLM for offline inference and stream examples for chat and complete #190
Closed
guocuimi closed 1 month ago