Add cache
For example, ListCache(50, 2, fixed_keys=fixed_cache_keys) caches up to 50 keys, not counting the fixed keys. Each key is linked to 2 different values. When the cache is full, keys are evicted in LRU (least recently used) order. The fixed keys always stay in the cache and are never removed.
Modify api
"max_tokens" in the request now equals the number of decode steps.
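The cache behaviour described under "Add cache" can be illustrated with a minimal sketch. This is not the actual ListCache implementation: the class name ListCacheSketch, its put/get methods, the treatment of fixed_keys as a mapping from key to values, and the rule that the oldest value is dropped when a key exceeds its value limit are all assumptions made for illustration.

```python
from collections import OrderedDict


class ListCacheSketch:
    """Hypothetical stand-in for ListCache; not the project's implementation."""

    def __init__(self, max_keys, values_per_key, fixed_keys=None):
        self.max_keys = max_keys
        self.values_per_key = values_per_key
        # Assumption: fixed_keys maps each pinned key to its values.
        # Pinned entries live outside the LRU store, so they do not count
        # toward max_keys and are never evicted.
        self.fixed = {k: list(v) for k, v in (fixed_keys or {}).items()}
        self.store = OrderedDict()  # least recently used key first

    def put(self, key, value):
        if key in self.fixed:
            return  # assumption: pinned entries are read-only
        values = self.store.setdefault(key, [])
        if value in values:
            values.remove(value)  # keep the values distinct
        values.append(value)
        # Assumption: drop the oldest values beyond the per-key limit.
        while len(values) > self.values_per_key:
            values.pop(0)
        self.store.move_to_end(key)  # mark key as most recently used
        if len(self.store) > self.max_keys:
            self.store.popitem(last=False)  # evict the least recently used key

    def get(self, key):
        if key in self.fixed:
            return list(self.fixed[key])
        if key not in self.store:
            return []
        self.store.move_to_end(key)  # a read also counts as a use
        return list(self.store[key])


# Small capacity so the eviction is easy to follow.
cache = ListCacheSketch(2, 2, fixed_keys={"sys": ["pinned"]})
cache.put("a", 1)
cache.put("b", 2)
cache.put("a", 3)  # "a" is now most recently used and holds [1, 3]
cache.put("c", 4)  # cache is full, so "b" (least recently used) is evicted
```

After these calls, "b" is gone, "a" still holds its two values, and the pinned "sys" entry is untouched even though the two non-fixed slots are taken.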