hpcaitech / EnergonAI

Large-scale model inference.
Apache License 2.0

[opt] executor update making batch policy #133

Closed · ver217 closed this 2 years ago

ver217 commented 2 years ago
  1. Ensure requests are served FIFO: the request at the head of the queue is always scheduled first.
  2. Requests whose decode steps are less than or equal to those of the queue head can be grouped into the same batch, so every request in the batch finishes no later than the head. A sketch of this policy follows below.
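
To make the policy concrete, here is a minimal Python sketch of how such a batch could be assembled. Everything below is illustrative rather than EnergonAI's actual executor code: the `Request` class, its `decode_steps` field, `make_batch`, and `max_batch_size` are hypothetical names, and whether a non-matching request blocks the scan or is skipped over is not specified in this PR.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Request:
    uid: int           # hypothetical request identifier
    decode_steps: int  # decode steps this request needs

def make_batch(queue: Deque[Request], max_batch_size: int = 8) -> List[Request]:
    """Pop one batch from a FIFO queue.

    The queue head always leads the batch; later requests join only if
    their decode steps do not exceed the head's, so every member
    finishes no later than the head.
    """
    if not queue:
        return []
    head = queue.popleft()  # FIFO: the oldest request anchors the batch
    batch = [head]
    remaining: List[Request] = []
    for req in list(queue):  # scan the rest of the queue in arrival order
        if len(batch) < max_batch_size and req.decode_steps <= head.decode_steps:
            batch.append(req)      # short enough to ride along with the head
        else:
            remaining.append(req)  # stays queued for a later batch
    queue.clear()
    queue.extend(remaining)  # leftovers keep their arrival order
    return batch

if __name__ == "__main__":
    q: Deque[Request] = deque(
        [Request(0, 32), Request(1, 16), Request(2, 64), Request(3, 32)]
    )
    print([r.uid for r in make_batch(q)])  # [0, 1, 3]; request 2 needs more steps than the head
```

Under this reading, batching only requests that finish no later than the head means the batch never delays the head's completion, which is how point 2 stays consistent with the FIFO guarantee in point 1.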