The prepare_inputs_for_generation() method is used at each decoding step for auto-regressive generation. The default name for the keyword argument of key-value caches is past_key_values instead of past. Renaming past to past_key_values fits the Huggingface Transformer interface and enables the key-value cache for the generation.
The
prepare_inputs_for_generation()
method is used at each decoding step for auto-regressive generation. The default name for the keyword argument of key-value caches ispast_key_values
instead ofpast
. Renamingpast
topast_key_values
fits the Huggingface Transformer interface and enables the key-value cache for the generation.Please refer to https://github.com/huggingface/transformers/blob/main/src/transformers/generation/utils.py#L751.