zhengzangw / Sequence-Scheduling

PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
76 stars, 15 forks