huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Added HHCache class implementing H2O Cache #31623

Open belericant opened 2 days ago

belericant commented 2 days ago

What does this PR do?

This PR adds the feature requested in #30758. The HHCache class is adapted almost directly from the original H2O paper authors' code found here. Currently the PR only adds the changes required to the Llama model class. As of now I have taken @gante's suggestion of adding Cache.post_process() and calling it within LlamaAttention.forward.
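For reviewers unfamiliar with H2O: the core idea is to bound the KV cache by always keeping the most recent tokens plus a small set of "heavy hitter" tokens whose accumulated attention scores are largest, evicting the rest. Below is a minimal, hypothetical sketch of that selection step in plain Python; the names (`h2o_keep_indices`, `acc_scores`, `recent`, `heavy`) are illustrative and are not the actual HHCache API.

```python
def h2o_keep_indices(acc_scores, recent, heavy):
    """Return sorted cache positions to keep under an H2O-style policy.

    acc_scores: per-position accumulated attention scores (oldest first).
    recent:     number of most recent positions that are always kept.
    heavy:      number of non-recent "heavy hitter" positions kept by score.
    (Hypothetical helper for illustration, not the HHCache implementation.)
    """
    n = len(acc_scores)
    if n <= recent + heavy:
        # Cache still under budget: nothing is evicted.
        return list(range(n))
    # The recent window is kept unconditionally.
    recent_idx = list(range(n - recent, n))
    # Among the remaining positions, keep the highest-scoring ones.
    candidates = range(n - recent)
    hh_idx = sorted(candidates, key=lambda i: acc_scores[i], reverse=True)[:heavy]
    return sorted(hh_idx) + recent_idx
```

For example, with scores `[0.9, 0.1, 0.5, 0.2, 0.3]`, `recent=2`, and `heavy=1`, positions 3 and 4 are kept as the recent window and position 0 is kept as the single heavy hitter.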

To-Do

  1. I'm not sure the logic for RoPE re-rotation is 100% correct. I think the recent tokens are handled correctly, but not the heavy-hitter (HH) tokens after eviction. Would love another set of eyes on that.
  2. Write tests to ensure that this HHCache class has the same behavior compared to the original code by paper authors.
  3. Benchmarking(?)
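On to-do 1: the subtlety is that cached keys were rotated for their original positions, so after eviction compacts the cache, a surviving key at old position `p` must be re-rotated by the angle for `new_p - p` to behave as if it had been encoded at its new position. A minimal sketch for a single 2-D RoPE pair, using hypothetical names (`rerotate_pair`, `inv_freq`) purely for illustration:

```python
import math

def rerotate_pair(x, y, old_pos, new_pos, inv_freq):
    """Re-rotate one RoPE (x, y) pair from old_pos to new_pos.

    Because RoPE rotations compose additively, rotating by the angle for
    (new_pos - old_pos) maps a key encoded at old_pos to one encoded at
    new_pos. (Hypothetical helper, not the PR's implementation.)
    """
    theta = (new_pos - old_pos) * inv_freq
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c
```

As a sanity check, re-rotating an unrotated pair from position 0 to position 2 should give the same result as rotating it directly for position 2, which is one property a test against the authors' reference code could assert.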

Feedback and/or help would be appreciated. Thanks!

amyeroberts commented 2 days ago

cc @gante @ArthurZucker