huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers

[DOCS] Model outputs of RecurrentGemmaForCausalLM don't align with the documentation #30736

Closed godjw closed 4 months ago

godjw commented 5 months ago

System Info

Latest docs on https://huggingface.co/docs/transformers/main/model_doc/recurrent_gemma#transformers.RecurrentGemmaForCausalLM

Who can help?

@ArthurZucker, @younesbelkada, and @stevhliu

Reproduction

The documentation for RecurrentGemmaForCausalLM's forward method states:

Returns transformers.modeling_outputs.CausalLMOutput or tuple(torch.FloatTensor) with attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

However, examining the source code for RecurrentGemma and RecurrentGemmaForCausalLM reveals that the model does not accept output_attentions=True and, contrary to the documentation, does not return any attention values. The relevant sections of the source code can be found here: https://github.com/huggingface/transformers/blob/47735f5f0f2752500d115d2f6bd57816032599b6/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L744-L747

https://github.com/huggingface/transformers/blob/47735f5f0f2752500d115d2f6bd57816032599b6/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L895-L899
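A minimal sketch of the mismatch, for illustration. The google/recurrentgemma-2b checkpoint is used as an example (access to the gated weights is assumed), and the exact failure mode when passing output_attentions is an assumption based on the forward signature linked above:

```python
# Minimal sketch of the documentation/implementation mismatch described above.
# Assumes the "google/recurrentgemma-2b" checkpoint is accessible; any
# RecurrentGemma checkpoint should behave the same way.
import torch
from transformers import AutoTokenizer, RecurrentGemmaForCausalLM

model_id = "google/recurrentgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = RecurrentGemmaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The `attentions` field documented for CausalLMOutput is never populated.
print(outputs.attentions)  # expected: None

# Per the linked source, `output_attentions` is not part of the forward
# signature, so passing it is expected to fail rather than return weights.
try:
    model(**inputs, output_attentions=True)
except TypeError as err:
    print(f"TypeError: {err}")
```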

Expected behavior

The documentation should match the implementation: either the attentions entry is removed from the documented return value of RecurrentGemmaForCausalLM.forward, or the model gains support for output_attentions.

ArthurZucker commented 5 months ago

Hey! Would you like to open a PR for this? 🤗 The documentation should be modified, IMO, unless people ask for output-attentions logic, which would be a feature request.
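For context, the misleading text likely comes from the generic output class rather than from hand-written docs: the Returns section appears to be rendered from transformers.modeling_outputs.CausalLMOutput, whose docstring always describes an attentions field, whether or not a given model can populate it. A quick way to check, assuming only an installed transformers:

```python
# The Returns section of the model page is rendered from the declared output
# class. CausalLMOutput always documents an `attentions` field, even for
# models that never compute attention weights.
from transformers.modeling_outputs import CausalLMOutput

print(CausalLMOutput.__doc__)  # expected to include the `attentions` description quoted above
print(list(CausalLMOutput.__dataclass_fields__))  # expected to include 'attentions'
```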

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.