hpcaitech / EnergonAI

Large-scale model inference.

improve the cache implementation #134

Closed · dujiangsu closed this issue 2 years ago

dujiangsu commented 2 years ago

By the way, I noticed something interesting. [screenshot: GPU memory usage] When executing inference with a large batch size, memory usage can grow to 56 GB (30B model). However, if I fix 48 GB of memory in advance, as shown in the screenshot, the actual memory usage never gets that large.
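For reference, here is a minimal sketch of one way to cap GPU memory in advance with a PyTorch CUDA backend. This is only an illustration of the "fix 48 GB" idea, not necessarily how the experiment above was set up; the `budget_gb` value and variable names are my own.

```python
import torch

# Sketch: cap the per-process CUDA memory pool so the caching allocator
# cannot grow past a fixed budget (assumes device 0 and a CUDA build).
budget_gb = 48  # e.g. 48 GB, matching the experiment above
total = torch.cuda.get_device_properties(0).total_memory
torch.cuda.set_per_process_memory_fraction(budget_gb * 1024**3 / total, device=0)

# ... run inference here; allocations beyond the budget now raise OOM
# instead of letting the reserved pool keep growing.
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**3:.1f} GB")
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**3:.1f} GB")
```

Comparing `memory_reserved` with `memory_allocated` under a fixed cap would show whether the 56 GB figure comes from memory the allocator actually uses or just from the pool it has reserved.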