huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

ValueError: Invalid `cache_implementation` (offloaded). #34718

Closed leigao97 closed 1 day ago

leigao97 commented 2 weeks ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

I am following the official example to enable KV cache offloading. https://huggingface.co/docs/transformers/en/kv_cache#offloaded-cache
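For reference, a minimal sketch of what I am running, following that doc page (the checkpoint name is a stand-in; any causal LM should behave the same):

```python
# Minimal reproduction sketch (checkpoint name is a stand-in).
# On affected transformers versions, generate() raises:
#   ValueError: Invalid `cache_implementation` (offloaded). ...
from transformers import AutoModelForCausalLM, AutoTokenizer


def reproduce(checkpoint: str = "openai-community/gpt2") -> None:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    inputs = tokenizer("Hello", return_tensors="pt")
    # `cache_implementation="offloaded"` is the documented way to enable
    # KV cache offloading, but GenerationConfig validation rejects it here.
    model.generate(**inputs, cache_implementation="offloaded", max_new_tokens=8)


if __name__ == "__main__":
    reproduce()
```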

And I got the error message:

  File "/transformers/generation/configuration_utils.py", line 726, in validate
    raise ValueError(
ValueError: Invalid `cache_implementation` (offloaded). Choose one of: ['static', 'offloaded_static', 'sliding_window', 'hybrid', 'mamba', 'quantized', 'static']

Expected behavior

I expected `cache_implementation="offloaded"` to be a valid option for `model.generate()`. With KV cache offloading enabled, peak GPU memory usage should go down, at the cost of slower inference.

zucchini-nlp commented 2 weeks ago

hmm, indeed the latest changes that validate `cache_implementation` in the config broke the offloaded cache. As a workaround, you can pass the cache object directly to `generate()` until it is fixed:

from transformers import OffloadedCache

# Passing the cache object directly skips the `cache_implementation` string validation
model.generate(**inputs, past_key_values=OffloadedCache())

cc @gante in case this is already on your radar; otherwise I can fix it next week, since you'll be off until December

LysandreJik commented 2 weeks ago

cc @ArthurZucker as well

ArthurZucker commented 4 days ago

Since @gante is not here, @zucchini-nlp can you have a look?