A high-performance inference system for large language models, designed for production environments.
317 stars, 24 forks
feat: moved scheduler wait logic from Python into scheduler run_until_complete function #200
Closed by guocuimi 1 month ago