huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Caching past key values of any length for Vision LLMs #31096

Open saikoneru opened 1 month ago

saikoneru commented 1 month ago

Feature request

Allow passing past key values during the forward pass for more than one new token, similar to text-only large language models.

Motivation

According to the documentation here, one could in theory pass the past key values of the prompt to speed up the forward pass. However, I think the cached forward pass only happens when input_ids contains a single new token, as described in the condition here. Changing this to be consistent with the text LLMs (where it already works, thanks for that) would be highly appreciated.
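
For reference, this is the behaviour that text-only LLMs already support and that this request asks to extend to VLMs: cache the prompt once, then run a single forward pass over several new tokens at once. A minimal sketch using GPT-2 (the model and strings are only illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# 1) Forward the prompt once and keep its KV cache.
prompt = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    prompt_out = model(**prompt, use_cache=True)
past_kv = prompt_out.past_key_values

# 2) Forward SEVERAL new tokens at once against the cached prompt.
new_ids = tokenizer(" jumps over the lazy dog", return_tensors="pt").input_ids
# The attention mask must cover the cached positions plus the new tokens.
attention_mask = torch.ones(
    1, prompt.input_ids.shape[1] + new_ids.shape[1], dtype=torch.long
)
with torch.no_grad():
    out = model(
        input_ids=new_ids,
        attention_mask=attention_mask,
        past_key_values=past_kv,
        use_cache=True,
    )
# out.logits covers every new token: shape (1, num_new_tokens, vocab_size)
```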

Your contribution

If you can give me any pointers on how to realign the cache, create the attention masks, etc., that would also be very helpful.

zucchini-nlp commented 1 month ago

@saikoneru hey!

If I understand you correctly, you want to use past_kv to continue generation in VLMs. Since VLMs consist of a vision tower and an LLM backbone, passing past_kv should work out-of-the-box for them. But currently there are some issues because of the way VLMs are implemented in transformers, and simply passing one new token will not work.

If you're trying to use generate(), have a look at #30809, which has some working examples on NanoLLaVa. If you want to pass inputs directly to the forward pass, then you have to expand and align the inputs (attn_mask, position_ids, and optionally labels) to match the length of the previous past_kv, i.e. take the special vision tokens into account (see the sketch below).
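
A hedged sketch of that alignment, assuming a cache `past_kv` and new tokens `new_ids` obtained from an earlier prompt forward pass, and the legacy tuple cache format (a Cache object would expose the length via `past_kv.get_seq_length()` instead). How the image tokens are expanded depends on the specific VLM:

```python
import torch

# Legacy tuple cache: each entry is (batch, heads, seq_len, head_dim).
past_len = past_kv[0][0].shape[2]
batch_size, new_len = new_ids.shape

# The mask must cover every cached position, including the expanded
# image tokens, plus the new tokens.
attention_mask = torch.ones(batch_size, past_len + new_len, dtype=torch.long)

# Positions of the new tokens continue from the end of the cache.
position_ids = torch.arange(past_len, past_len + new_len).unsqueeze(0)

with torch.no_grad():
    out = model(
        input_ids=new_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        past_key_values=past_kv,
        use_cache=True,
    )
```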

saikoneru commented 1 month ago

@zucchini-nlp Thank you for your reply. I need to pass it directly to the forward function (for reranking scenarios), where I can cache the prompt and rank the candidates faster. I managed to make the forward pass work for one new token and will see how to adapt it for more than one. Thanks again.
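
A sketch of that reranking pattern, using a text-only model for illustration (the task string and candidates are made up; a VLM would additionally need the mask/position alignment from the previous comment). Candidates are scored by summed log-probability given the shared cached prompt:

```python
import copy
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Encode the shared prompt once and keep its KV cache.
prompt_ids = tokenizer("Translate to German: Hello ->", return_tensors="pt").input_ids
with torch.no_grad():
    prompt_out = model(input_ids=prompt_ids, use_cache=True)

def score_candidate(text: str) -> float:
    """Sum of candidate-token log-probs given the cached prompt."""
    cand_ids = tokenizer(text, return_tensors="pt").input_ids
    attention_mask = torch.ones(
        1, prompt_ids.shape[1] + cand_ids.shape[1], dtype=torch.long
    )
    # Copy the cache per candidate: Cache objects are updated in place,
    # so reusing the original across candidates would corrupt it.
    past_kv = copy.deepcopy(prompt_out.past_key_values)
    with torch.no_grad():
        out = model(
            input_ids=cand_ids,
            attention_mask=attention_mask,
            past_key_values=past_kv,
            use_cache=True,
        )
    # The last prompt logit predicts the first candidate token; logit t of
    # the candidate pass predicts candidate token t+1.
    logits = torch.cat([prompt_out.logits[:, -1:], out.logits[:, :-1]], dim=1)
    logprobs = F.log_softmax(logits, dim=-1)
    return logprobs.gather(-1, cand_ids.unsqueeze(-1)).sum().item()

candidates = [" Hallo", " Guten Tag"]
best = max(candidates, key=score_candidate)
```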