[KV Cache] Overwrite Cache - SW Attention

mlc-ai / relax

Apache License 2.0


[KV Cache] Overwrite Cache - SW Attention #297

Closed davidpissarra closed 8 months ago

davidpissarra commented 8 months ago

Part of the effort on Sliding Window Attention (SWA): https://github.com/mlc-ai/mlc-llm/issues/1003. Overwriting the cache is useful when computing SWA because the cache then only has to hold the keys and values of the current window, which makes it more efficient. Once the cache is full, new entries start overwriting the older ones.
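
To illustrate the overwrite behaviour described above, here is a minimal pure-Python sketch of a fixed-capacity cache that wraps around once full. It is not the implementation in this PR: the class `SlidingWindowCache` and its methods are hypothetical names, and plain tuples stand in for the key and value tensors.

```python
class SlidingWindowCache:
    """Hypothetical fixed-capacity cache that overwrites old entries once full."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.slots = [None] * window_size  # one slot per position in the window
        self.num_appended = 0              # total entries ever appended

    def append(self, entry):
        # Before the cache is full this fills slots in order; afterwards the
        # modulo wraps around and overwrites the slot holding the oldest entry.
        self.slots[self.num_appended % self.window_size] = entry
        self.num_appended += 1

    def window(self):
        """Return the cached entries ordered from oldest to newest."""
        if self.num_appended <= self.window_size:
            return self.slots[:self.num_appended]
        start = self.num_appended % self.window_size  # slot of the oldest entry
        return self.slots[start:] + self.slots[:start]


cache = SlidingWindowCache(window_size=4)
for token_position in range(6):
    # Stand-in for the (key, value) pair of one token (illustrative only)
    cache.append(("k%d" % token_position, "v%d" % token_position))
```

With a window of 4 and 6 appended tokens, the entries for tokens 0 and 1 have been overwritten, so memory stays bounded by the window size regardless of sequence length.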

tqchen commented 8 months ago

Thanks @davidpissarra. The main comment is that, given the implementation is specialized to the sliding window, let us make sure the API naming highlights that fact.

tqchen commented 8 months ago

Please also send the PR to the unity branch of https://github.com/apache/tvm.