Hey! Would you like to open a PR for this? 🤗 The documentation should be modified IMO, unless people ask for the output-attentions logic, which would be a feature request.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Latest docs on https://huggingface.co/docs/transformers/main/model_doc/recurrent_gemma#transformers.RecurrentGemmaForCausalLM
Who can help?
@ArthurZucker, @younesbelkada, and @stevhliu
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
The documentation for RecurrentGemmaForCausalLM's forward method lists an output_attentions argument and states that attention tensors are returned when it is set.
However, examining the source code for RecurrentGemma and RecurrentGemmaForCausalLM reveals that the model does not accept output_attentions=True and, contrary to the documentation, does not return any attention values. The relevant sections of the source code can be found here:
https://github.com/huggingface/transformers/blob/47735f5f0f2752500d115d2f6bd57816032599b6/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L744-L747
https://github.com/huggingface/transformers/blob/47735f5f0f2752500d115d2f6bd57816032599b6/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L895-L899
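For reference, a minimal reproduction sketch along these lines (the checkpoint id google/recurrentgemma-2b is an assumption; any RecurrentGemma checkpoint should show the same behavior):

```python
import torch
from transformers import AutoTokenizer, RecurrentGemmaForCausalLM

model_id = "google/recurrentgemma-2b"  # assumed checkpoint; any RecurrentGemma model applies

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = RecurrentGemmaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, RecurrentGemma!", return_tensors="pt")

with torch.no_grad():
    try:
        # The documentation implies this returns per-layer attention tensors.
        outputs = model(**inputs, output_attentions=True)
        # Even if the call goes through, no attentions are populated in the output.
        print(getattr(outputs, "attentions", None))
    except TypeError as err:
        # The forward signatures linked above have no output_attentions parameter,
        # so the keyword argument is not accepted.
        print(err)
```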
Expected behavior
Passing output_attentions=True should return attention values as described in the documentation, or the documentation should be corrected to no longer mention output_attentions.