Closed: leigao97 closed this issue 1 day ago.
Hmm, indeed, the latest changes that verify the cache implementation in the config broke the offloaded cache. Until it is fixed, you can pass the cache object directly to generate():
from transformers import OffloadedCache
# Pass the cache object explicitly instead of cache_implementation="offloaded"
model.generate(**inputs, past_key_values=OffloadedCache())
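For reference, a self-contained sketch of that workaround; the checkpoint and prompt below are placeholders, not taken from this thread:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, OffloadedCache

ckpt = "microsoft/Phi-3-mini-4k-instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16).to("cuda")
inputs = tokenizer("Fun fact:", return_tensors="pt").to(model.device)

# Build the cache object yourself; a fresh OffloadedCache is needed for each generate() call
out = model.generate(**inputs, max_new_tokens=32, past_key_values=OffloadedCache())
print(tokenizer.decode(out[0], skip_special_tokens=True))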
cc @gante in case you already have this on your radar; otherwise I can fix it next week, since you'll be off until December
cc @ArthurZucker as well
Since @gante is not here, @zucchini-nlp can you have a look?
System Info
transformers version: 4.46.2
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I am following the official example to enable KV cache offloading: https://huggingface.co/docs/transformers/en/kv_cache#offloaded-cache
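Roughly the call from that page (the exact checkpoint and prompt may differ from what I used):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "microsoft/Phi-3-mini-4k-instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16).to("cuda")
inputs = tokenizer("Fun fact: The shortest", return_tensors="pt").to(model.device)

# Select the offloaded KV cache via the string flag, as shown in the docs
out = model.generate(**inputs, do_sample=False, max_new_tokens=23, cache_implementation="offloaded")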
And I got the error message:
Expected behavior
I expected that cache_implementation="offloaded" is a valid option accepted by model.generate(). After enabling KV cache offloading, peak memory usage should go down and inference time should go up.
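As a rough check of that expectation, one could compare peak GPU memory with and without offloading. This is only a sketch that reuses model, inputs, and OffloadedCache from the snippets above; the helper name is made up:

import torch

def peak_mem_gib(model, inputs, **gen_kwargs):
    # Reset the allocator's peak counter, run generation, and report the new peak in GiB
    torch.cuda.reset_peak_memory_stats(model.device)
    model.generate(**inputs, **gen_kwargs)
    return torch.cuda.max_memory_allocated(model.device) / 2**30

default_peak = peak_mem_gib(model, inputs, max_new_tokens=256)
offloaded_peak = peak_mem_gib(model, inputs, max_new_tokens=256, past_key_values=OffloadedCache())
print(f"default cache: {default_peak:.2f} GiB, offloaded cache: {offloaded_peak:.2f} GiB")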