huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Make Cache statically configurable at model construction time #32500

Closed · guangy10 closed this issue 2 days ago

guangy10 commented 1 month ago

Feature request

Be able to construct and load a model with a statically configured cache, like this (placeholder values are used for the names the request leaves undefined):

from transformers import AutoModelForCausalLM, GenerationConfig

# Placeholder values for illustration; any decoder-only repo that supports a
# static cache, and any fixed batch size / cache length, would work here.
hf_model_repo = "google/gemma-2b"
cache_implementation = "static"
batch_size = 1
max_cache_len = 128

model = AutoModelForCausalLM.from_pretrained(
    hf_model_repo,
    attn_implementation="sdpa",
    generation_config=GenerationConfig(
        use_cache=True,
        cache_implementation=cache_implementation,
        max_length=max_cache_len,
        cache_config={
            "batch_size": batch_size,
            "max_cache_len": max_cache_len,
        },
    ),
)
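
Once the model is constructed this way, generation should be able to rely on the statically sized cache without any per-call cache arguments; a hypothetical usage sketch (the tokenizer and prompt are placeholders, not part of this issue):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(hf_model_repo)
inputs = tokenizer("The quick brown fox", return_tensors="pt")

# Assumption: the static cache settings come from the GenerationConfig passed at
# construction time, so no cache arguments need to be passed here.
output_ids = model.generate(**inputs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))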

See additional context in #32253

Motivation

This feature request is to support torch.export(): the model should be exportable in a way that can be further lowered and run in ExecuTorch with good performance out of the box.
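
A minimal sketch of the export step this would enable, assuming the construction call above succeeds and that the model's forward accepts input_ids and cache_position; the example inputs, shapes, and lowering path are illustrative assumptions, not part of this issue:

import torch

# Hypothetical export step: trace the statically configured model with torch.export.
# The (1, 1)-shaped decode-style inputs below are placeholders.
example_input_ids = torch.zeros((1, 1), dtype=torch.long)
example_cache_position = torch.tensor([0], dtype=torch.long)

with torch.no_grad():
    exported_program = torch.export.export(
        model,
        args=(),
        kwargs={
            "input_ids": example_input_ids,
            "cache_position": example_cache_position,
        },
    )

# The resulting ExportedProgram could then be lowered further
# (e.g., via ExecuTorch's to_edge / to_executorch pipeline).
print(exported_program.graph_module)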

Your contribution

TBD

guangy10 commented 4 weeks ago

PR is published: https://github.com/huggingface/transformers/pull/32830