Bump transformers from 4.36.0 to 4.38.2

Bumps transformers from 4.36.0 to 4.38.2.

Release notes

v4.38.2

Fix backward compatibility issues with Llama and Gemma:

We mostly made sure that performance is not affected by the new RoPE paradigm. We fixed the RoPE computation (it should always be done in float32) and set the causal_mask dtype to bool so that it takes less RAM.

YOLOS had a regression, and the Llama / T5 tokenizers emitted a spurious warning.
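To illustrate why position-dependent RoPE quantities benefit from float32, the sketch below shows that half precision cannot represent every integer position above 2048, so neighbouring positions can collapse onto the same value. It uses only the standard-library `struct` module to emulate the rounding. The helper names `to_float16` and `to_float32` are hypothetical, and this is an illustration of the numeric effect under those assumptions, not the transformers implementation.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical token positions around the point where float16 runs out
# of exactly representable integers.
positions = [2047, 2048, 2049, 4097]

half = [to_float16(float(p)) for p in positions]
single = [to_float32(float(p)) for p in positions]

for p, h, s in zip(positions, half, single):
    print(f"position={p}  float16={h}  float32={s}")
```

In this sketch positions 2048 and 2049 map to the same float16 value, which would give two different tokens the same rotation angle, while float32 keeps every position distinct.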

FIX [Gemma] Fix bad rebase with transformers main (#29170)

Improve _update_causal_mask performance (#29210)

[T5 and Llama Tokenizer] remove warning (#29346)

[Llama ROPE] Fix torch export but also slow downs in forward (#29198)

RoPE loses precision for Llama / Gemma + Gemma logits.float() (#29285)

Patch YOLOS and others (#29353)

Use torch.bool instead of torch.int64 for non-persistant causal mask buffer (#29241)
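The memory saving from the bool causal mask can be estimated with simple arithmetic. The sketch below assumes a square mask of shape (seq_len, seq_len), 8 bytes per int64 element, and 1 byte per bool element. The sequence length of 4096 and the helper `causal_mask_bytes` are illustrative assumptions, not values taken from the release notes.

```python
def causal_mask_bytes(seq_len: int, bytes_per_element: int) -> int:
    """Size in bytes of a dense (seq_len x seq_len) mask buffer."""
    return seq_len * seq_len * bytes_per_element


SEQ_LEN = 4096  # assumed maximum sequence length, for illustration only

int64_bytes = causal_mask_bytes(SEQ_LEN, 8)  # int64 stores 8 bytes per element
bool_bytes = causal_mask_bytes(SEQ_LEN, 1)   # bool stores 1 byte per element

print(f"int64 mask: {int64_bytes // 2**20} MiB")
print(f"bool mask:  {bool_bytes // 2**20} MiB")
print(f"reduction:  {int64_bytes // bool_bytes}x")
```

Under these assumptions the buffer shrinks from 128 MiB to 16 MiB, an 8x reduction that grows quadratically in absolute terms with the sequence length.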

v4.38.1

Fix eager attention in Gemma!

[Gemma] Fix eager attention #29187 by @sanchit-gandhi

TLDR:
-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+        attn_output = attn_output.view(bsz, q_len, -1)
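The release notes do not spell out why this one-line change matters. A plausible reading is that in Gemma the attention output width (num_heads * head_dim) need not equal hidden_size, so reshaping to hidden_size fails, while `-1` lets the last dimension be inferred. The sketch below checks that arithmetic in plain Python. The config values are assumed for illustration (believed to match the public Gemma 7B configuration), and `inferred_last_dim` is a hypothetical helper that mimics how a `-1` dimension is resolved.

```python
# Assumed illustrative config values, not taken from the release notes.
hidden_size = 3072
num_heads = 16
head_dim = 256


def inferred_last_dim(total_elements: int, bsz: int, q_len: int) -> int:
    """Resolve a trailing -1 dimension for a (bsz, q_len, -1) view."""
    last, remainder = divmod(total_elements, bsz * q_len)
    if remainder:
        raise ValueError("element count is not divisible by bsz * q_len")
    return last


bsz, q_len = 2, 5
# Attention output has bsz * q_len * num_heads * head_dim elements.
total = bsz * q_len * num_heads * head_dim

# Reshaping to (bsz, q_len, hidden_size) only works if the counts match.
fits_hidden_size = total == bsz * q_len * hidden_size
last_dim = inferred_last_dim(total, bsz, q_len)

print("fits hidden_size:", fits_hidden_size)
print("inferred last dim:", last_dim)
```

With these assumed values the element counts do not match hidden_size, and the inferred last dimension is 4096 rather than 3072.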
v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM

New model additions

💎 Gemma 💎

Gemma is a new open-source language model series from Google AI that comes in 2B and 7B variants. The release includes pre-trained and instruction fine-tuned versions, which you can use via AutoModelForCausalLM, GemmaForCausalLM, or the pipeline interface!

Read more about it in the Gemma release blogpost: https://hf.co/blog/gemma

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the 2B checkpoint in half precision.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", torch_dtype=torch.float16)

input_text = "Write me a poem about Machine Learning."
# Move the tokenized inputs to the GPU (assumes a CUDA device is available).
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)

You can use the model with Flash Attention, SDPA, static cache, and the quantization API for further optimizations!

Flash Attention 2

... (truncated)

Commits

092f1fd Release 4.38.2
bf5163f fix merge conflicts between llama and gemma
6c45f0f Use torch.bool instead of torch.int64 for non-persistant causal mask buff...
bfefb8e Patch YOLOS and others (#29353)
20164cc RoPE loses precision for Llama / Gemma + Gemma logits.float() (#29285)
d5ec194 [Llama ROPE] Fix torch export but also slow downs in forward (#29198)
bf99e86 [T5 and Llama Tokenizer] remove warning (#29346)
6d02350 Improve _update_causal_mask performance (#29210)
4f8689e FIX [Gemma] Fix bad rebase with transformers main (#29170)
a085774 Release: v4.38.1
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

kyegomez / zeta

Bump transformers from 4.36.0 to 4.38.2 #169
