hpcaitech / EnergonAI

Large-scale model inference.
Apache License 2.0
630 stars 90 forks

[opt] add cache and modify api #135

Closed ver217 closed 2 years ago

ver217 commented 2 years ago

Add cache

For example, ListCache(50, 2, fixed_keys=fixed_cache_keys) will cache up to 50 keys (not counting the fixed keys), with each key mapped to up to 2 different values. LRU eviction is applied when the cache is full. Fixed keys always stay in the cache and are never evicted.
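The behavior described above can be sketched as follows. This is an illustrative re-implementation, not the actual EnergonAI `ListCache`; the constructor signature mirrors the example in this PR, but the internals are assumptions.

```python
from collections import OrderedDict

class ListCache:
    """Sketch of an LRU cache where each key maps to a bounded list of
    values, and a set of fixed keys is pinned (never evicted).
    Illustrative only; the real EnergonAI ListCache may differ."""

    def __init__(self, max_keys, values_per_key, fixed_keys=()):
        self.max_keys = max_keys              # capacity, excluding fixed keys
        self.values_per_key = values_per_key  # max values stored per key
        self.fixed = {k: [] for k in fixed_keys}  # pinned, never evicted
        self.lru = OrderedDict()              # evictable keys in LRU order

    def put(self, key, value):
        bucket = self.fixed.get(key)
        if bucket is None:
            if key not in self.lru and len(self.lru) >= self.max_keys:
                self.lru.popitem(last=False)  # evict least recently used key
            bucket = self.lru.setdefault(key, [])
            self.lru.move_to_end(key)         # mark key as recently used
        if len(bucket) >= self.values_per_key:
            bucket.pop(0)                     # drop oldest value for this key
        bucket.append(value)

    def get(self, key):
        if key in self.fixed:
            return self.fixed[key]
        if key in self.lru:
            self.lru.move_to_end(key)         # a hit refreshes recency
            return self.lru[key]
        return None
```

With `ListCache(50, 2, fixed_keys=fixed_cache_keys)`, up to 50 non-fixed keys are tracked, each holding at most 2 values, and the fixed keys sit outside the LRU bookkeeping entirely.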

Modify api

"max_tokens" in request is the same as the number of decode steps now.