huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

Deprecation: Transformers will no longer support `past_key_values` to be tuples #1962

Open BenjaminBossan opened 1 month ago

BenjaminBossan commented 1 month ago

As reported by @ArthurZucker:

Quick question, I am seeing this in peft: https://github.com/huggingface/peft/blob/f2b6d13f1dbc971c7653aa65e82822ea2d84bb38/src/peft/peft_model.py#L1665 where there is a reliance on get_seq_length(), which we are deprecating, and we will no longer convert the cache to a tuple automatically in 2 releases.

This will presumably affect all prompt learning methods and thus needs to be fixed soon.

For Llama, I identified the following tests, which would result in past_key_values being tuples and can serve as a starting point to work on this issue:

tests/test_decoder_models.py::PeftDecoderModelTester::test_inference_safetensors_70_test_trl_internal_testing_tiny_random_LlamaForCausalLM_prefix_tuning
tests/test_decoder_models.py::PeftDecoderModelTester::test_passing_input_embeds_works_70_test_trl_internal_testing_tiny_random_LlamaForCausalLM_prefix_tuning
tests/test_decoder_models.py::PeftDecoderModelTester::test_training_prompt_learning_tasks_72_test_trl_internal_testing_tiny_random_LlamaForCausalLM_prefix_tuning

(note that there are more tests that would fail, this is just a selection)

I haven't really worked on the prompt learning methods in PEFT and know little about the inner workings of the transformers cache, so any support would be welcome.

piyush-llm commented 1 month ago

@BenjaminBossan @ArthurZucker Can you provide more information regarding the deprecation of get_seq_length()? I have some familiarity with prompt learning methods and the transformers KV cache.

Current implementation of cache utilities: https://github.com/huggingface/transformers/blob/f0bc49e7f61f74f055c47ad40e6010f57eed0b0b/src/transformers/cache_utils.py#L60

Can you point me to the correct branch?

gante commented 1 month ago

Hi @piyush-llm 👋

Alongside a Cache, the prepare_inputs_for_generation method also returns cache_position on models that have received the update. This new tensor contains the indices of the cache entries that will be updated, so its last value can be used as the sequence length :)
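
A minimal sketch of that idea (the cache_position name and shapes below are assumptions based on the current transformers generation code, not a stable API contract):

```python
import torch

def seq_length_from_cache_position(cache_position: torch.Tensor) -> int:
    # cache_position is assumed to hold the indices of the cache slots being
    # written in the current forward pass, so the last index + 1 is the total
    # sequence length seen so far (what get_seq_length() used to report).
    return int(cache_position[-1].item()) + 1

# Prefill step: cache_position would look like torch.arange(0, prompt_len).
# Decoding step: a single-element tensor such as torch.tensor([prompt_len + step]).
```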

BenjaminBossan commented 1 month ago

Thanks for the details. Also check out #869 for discussion and code snippets.

YOUSEFNANIS commented 1 month ago

Is it ok to just convert past_key_values from a tuple to a list?

BenjaminBossan commented 1 month ago

Is it ok to just convert past_key_values from a tuple to a list?

Unfortunately not; past_key_values would be expected to be a Cache object.
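
For anyone picking this up, a rough sketch of such a conversion, assuming the DynamicCache helpers that current transformers releases provide (the actual fix may look different):

```python
from transformers import DynamicCache

# Legacy format: a tuple of per-layer (key, value) pairs, each tensor shaped
# (batch_size, num_heads, seq_len, head_dim).
cache = DynamicCache.from_legacy_cache(past_key_values)

# While the transition is ongoing, the reverse conversion is also available:
legacy = cache.to_legacy_cache()
```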

BenjaminBossan commented 1 month ago

An update from the internal discussion: Cache shouldn't be used at all during training. At the same time, the role of past_key_values will be migrated to mean "cache". This probably means that, going forward, we cannot use past_key_values at all for the purpose of injecting "virtual tokens" (or rather, embeddings), as we do right now for prompt learning.

I had hoped that this could be avoided, but it looks like this whole approach needs a rewrite. One idea would be to calculate the virtual embeddings as we do right now (PeftModel.get_prompt) but, instead of handing them off to past_key_values, inject them via a pre-forward hook, as sketched below (or, if that's not feasible, we would need to monkey patch forward :see_no_evil:).
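
To make the hook idea a bit more concrete, here is a rough, untested sketch; apart from the standard PyTorch hook signature, everything in it is an assumption about how the injection could work, not an existing PEFT code path:

```python
import torch

def make_prompt_injection_hook(prompt_embeddings: torch.Tensor):
    """Build a pre-forward hook that prepends learned prompt embeddings.

    prompt_embeddings is assumed to have shape (num_virtual_tokens, hidden_dim).
    """

    def hook(module, args, kwargs):
        inputs_embeds = kwargs.get("inputs_embeds")
        if inputs_embeds is None:
            # Embed the input_ids ourselves so we can concatenate along the
            # sequence dimension, then drop input_ids from the call.
            inputs_embeds = module.get_input_embeddings()(kwargs["input_ids"])
            kwargs["input_ids"] = None
        batch_size = inputs_embeds.shape[0]
        prompts = prompt_embeddings.unsqueeze(0).expand(batch_size, -1, -1)
        kwargs["inputs_embeds"] = torch.cat(
            [prompts.to(inputs_embeds.dtype), inputs_embeds], dim=1
        )
        # The attention mask (and labels, during training) would also need to
        # be extended by num_virtual_tokens; omitted here for brevity.
        return args, kwargs

    return hook

# Usage sketch (with_kwargs=True makes PyTorch pass kwargs to the hook):
# handle = base_model.register_forward_pre_hook(
#     make_prompt_injection_hook(prompt_embeddings), with_kwargs=True
# )
```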

For generation, we'll probably still need a separate code path, since we do want to use caching there.