ml-explore / mlx-examples

Examples in the MLX framework
MIT License
6.3k stars, 898 forks

Fix rotating KV cache for chat use case #1014

Closed (awni closed this 1 month ago)

awni commented 1 month ago

The rotating KV cache didn't work when cache filling alternates with generation, as it does in the chat use case. This fixes that and adds some tests.
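To illustrate the access pattern described above, here is a minimal pure-Python sketch of a fixed-size rotating cache that accepts both multi-token fills (a new chat turn) and single-token updates (generation). It is not the mlx-examples implementation: the class `ToyRotatingCache`, its `max_size` and `keep` parameters, and the use of integers in place of key/value tensors are all assumptions made for illustration. The point it tries to show is that a multi-token fill has to put the buffer back in temporal order before appending, otherwise the rotation left over from generation scrambles the history.

```python
class ToyRotatingCache:
    """Toy fixed-size cache (hypothetical, for illustration only).

    Keeps the first `keep` entries forever and rotates over the rest.
    """

    def __init__(self, max_size, keep=0):
        assert 0 <= keep < max_size
        self.max_size = max_size
        self.keep = keep
        self.buf = []  # physical storage, at most max_size entries
        self.idx = 0   # one past the most recently written physical slot

    def contents(self):
        """Return entries oldest to newest, undoing any rotation."""
        b, k, i = self.buf, self.keep, self.idx
        # Kept prefix, then the oldest rotated entries, then the newest ones.
        return b[:k] + b[i:] + b[k:i]

    def _update_one(self, token):
        """Generation step: write a single entry, overwriting in place when full."""
        if len(self.buf) < self.max_size:
            self.buf.append(token)
            self.idx = len(self.buf)
            return
        if self.idx == self.max_size:
            self.idx = self.keep  # wrap around, skipping the kept prefix
        self.buf[self.idx] = token
        self.idx += 1

    def _update_many(self, tokens):
        """Fill step: restore temporal order, append, then trim the oldest entries."""
        ordered = self.contents() + list(tokens)
        overflow = len(ordered) - self.max_size
        if overflow > 0:
            # Drop the oldest entries that are not part of the kept prefix.
            ordered = ordered[: self.keep] + ordered[self.keep + overflow :]
        self.buf = ordered
        self.idx = len(ordered)

    def update(self, tokens):
        """Dispatch on the number of new entries, then return the ordered view."""
        if len(tokens) == 1:
            self._update_one(tokens[0])
        else:
            self._update_many(tokens)
        return self.contents()


# Alternate filling and generation, as a chat loop would.
cache = ToyRotatingCache(max_size=4, keep=1)
cache.update([0, 1, 2])  # first prompt
cache.update([3])        # generate
cache.update([4])        # generate (buffer rotates)
cache.update([5, 6])     # second prompt after rotation
print(cache.update([7])) # generate again
```

In this sketch the cache always ends up holding the kept prefix plus the most recent entries, regardless of how fills and single-token updates are interleaved.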

Closes #1000

awni commented 1 month ago

Closing in favor of #1015, which I think has a better fix.