vegetableysm closed 4 months ago
Before this optimization:
With kv_state_cache_benchmark_test.cc (in https://github.com/v6d-io/v6d/pull/1816) and the following config:
constexpr int TENSORBYTES = 80;
constexpr int CAPACITY = 20000;
constexpr int LAYER = 64;
constexpr int BLOCK_SIZE = 100;
The token list length is 1900.
So the LLM cache object contains c * 128 = 370 * 128 = 47360 members, where c is the number of block objects contained in the cache object (370 in this run).
After this optimization, the cache object no longer holds the blocks as its members. The object with the largest number of members is therefore a single cache block, which has 128 members.
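The member counts quoted above can be checked with a short sketch. It is only an illustration of the arithmetic, not code from this PR: the function names are hypothetical, the block count of 370 is taken from the numbers above, and reading 128 as 2 * LAYER (one key tensor and one value tensor per layer) is an assumption.

```cpp
#include <cstddef>

// Value quoted in the benchmark config above.
constexpr int LAYER = 64;

// Assumption: the 128 members per block are one key tensor and one value
// tensor for each of the 64 layers (2 * LAYER).
constexpr std::size_t MEMBERS_PER_BLOCK = 2 * LAYER;

// Number of block objects reported for this benchmark run.
constexpr std::size_t NUM_BLOCKS = 370;

// Before: every block is a member of the cache object, so the cache object
// carries all members of all blocks (hypothetical helper).
constexpr std::size_t members_before(std::size_t blocks, std::size_t per_block) {
  return blocks * per_block;
}

// After: blocks are separate objects, so the largest object is a single
// block (hypothetical helper).
constexpr std::size_t largest_object_after(std::size_t per_block) {
  return per_block;
}

// Compile-time checks against the figures in the description.
static_assert(members_before(NUM_BLOCKS, MEMBERS_PER_BLOCK) == 47360,
              "cache object member count before the change");
static_assert(largest_object_after(MEMBERS_PER_BLOCK) == 128,
              "largest object member count after the change");
```

Under these assumptions, the largest object shrinks from 47360 members to 128, and its size no longer grows with the number of blocks.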
What do these changes do?
As titled.
Related issue number
Fixes #1733