kyegomez / zeta

Build high-performance AI models with modular building blocks
https://zeta.apac.ai
Apache License 2.0
333 stars 28 forks

Bump transformers from 4.36.0 to 4.38.2 #169

Closed dependabot[bot] closed 4 months ago

dependabot[bot] commented 4 months ago

Bumps transformers from 4.36.0 to 4.38.2.

Release notes

Sourced from transformers's releases.

v4.38.2

Fix backward compatibility issues with Llama and Gemma:

We mostly made sure that performance is not affected by the new change of paradigm with ROPE. Fixed the ROPE computation (it should always be in float32), and the causal_mask dtype was set to bool to use less RAM.
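One way to see why the precision of the ROPE computation matters is that a 16-bit float cannot represent every token position exactly. The sketch below uses only the standard library and assumes IEEE 754 binary16 as the reduced-precision format; the release notes do not name the dtype involved, and the position value is a hypothetical example.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given struct float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Hypothetical token position, chosen just above 2048, where
# binary16 can no longer represent every integer.
position = 2049.0

as_half = round_trip(position, "<e")    # IEEE 754 binary16 (16-bit)
as_single = round_trip(position, "<f")  # IEEE 754 binary32 (float32)

print("binary16:", as_half)
print("binary32:", as_single)
```

In this sketch the float32 round trip returns the position unchanged, while the 16-bit round trip lands on a neighbouring even integer, so any rotation angle derived from it would be slightly off.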

YOLOS had a regression, and the Llama / T5Tokenizer emitted a spurious warning.

  • FIX [Gemma] Fix bad rebase with transformers main (#29170)
  • Improve _update_causal_mask performance (#29210)
  • [T5 and Llama Tokenizer] remove warning (#29346)
  • [Llama ROPE] Fix torch export but also slow downs in forward (#29198)
  • RoPE loses precision for Llama / Gemma + Gemma logits.float() (#29285)
  • Patch YOLOS and others (#29353)
  • Use torch.bool instead of torch.int64 for non-persistant causal mask buffer (#29241)
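The RAM saving from switching the causal mask buffer from torch.int64 to torch.bool can be estimated with simple arithmetic, since an int64 element takes 8 bytes and a bool element takes 1 byte. The following is a rough sketch, assuming a square mask of shape (seq_len, seq_len); the sequence length of 4096 and the helper name `mask_bytes` are illustrative assumptions, not values from the release notes.

```python
def mask_bytes(seq_len: int, bytes_per_element: int) -> int:
    """Memory needed for a square (seq_len x seq_len) mask buffer."""
    return seq_len * seq_len * bytes_per_element


SEQ_LEN = 4096   # hypothetical maximum sequence length
INT64_BYTES = 8  # size of one torch.int64 element
BOOL_BYTES = 1   # size of one torch.bool element

int64_size = mask_bytes(SEQ_LEN, INT64_BYTES)
bool_size = mask_bytes(SEQ_LEN, BOOL_BYTES)

MIB = 1024 * 1024
print(f"int64 mask: {int64_size / MIB:.0f} MiB")
print(f"bool mask:  {bool_size / MIB:.0f} MiB")
```

Under these assumptions the bool buffer is one eighth the size of the int64 buffer.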

v4.38.1

Fix eager attention in Gemma!

TLDR:

-        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+        attn_output = attn_output.view(bsz, q_len, -1)
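The diff above replaces a hard-coded last dimension with -1, which lets the framework infer it from the number of elements. The sketch below imitates that inference in plain Python to show when the two forms differ: they disagree whenever num_heads * head_dim is not equal to hidden_size. The helper `resolve_view_shape` and all of the sizes are hypothetical and chosen for illustration; they are not taken from the release notes.

```python
from math import prod


def resolve_view_shape(numel: int, shape: tuple) -> tuple:
    """Resolve a target shape that may contain a single -1, in the
    same spirit as a tensor view: the -1 entry is inferred from numel."""
    unknown = [i for i, dim in enumerate(shape) if dim == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(dim for dim in shape if dim != -1)
    if unknown:
        if known == 0 or numel % known != 0:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        resolved = list(shape)
        resolved[unknown[0]] = numel // known
        return tuple(resolved)
    if known != numel:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    return tuple(shape)


# Hypothetical sizes where num_heads * head_dim != hidden_size.
bsz, q_len, num_heads, head_dim, hidden_size = 2, 5, 16, 256, 3072
numel = bsz * q_len * num_heads * head_dim

# Inferring the last dimension always matches the data.
inferred = resolve_view_shape(numel, (bsz, q_len, -1))
print(inferred)

# Hard-coding hidden_size fails for this configuration.
try:
    resolve_view_shape(numel, (bsz, q_len, hidden_size))
    hard_coded_ok = True
except ValueError as err:
    hard_coded_ok = False
    print("hard-coded shape rejected:", err)
```

When num_heads * head_dim happens to equal hidden_size, both forms resolve to the same shape, which is why the hard-coded version can work for some configurations and fail for others.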

v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM

New model additions

💎 Gemma 💎

Gemma is a new open-source language model series from Google AI that comes in 2B and 7B variants. The release includes pre-trained and instruction fine-tuned versions, and you can use them via AutoModelForCausalLM, GemmaForCausalLM, or the pipeline interface.

Read more about it in the Gemma release blogpost: https://hf.co/blog/gemma

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the 2B checkpoint in half precision
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", torch_dtype=torch.float16)

# Tokenize the prompt and move the tensors to the GPU
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)

You can use the model with Flash Attention, SDPA, static cache, and the quantization API for further optimizations!

  • Flash Attention 2

... (truncated)

Commits
  • 092f1fd Release 4.38.2
  • bf5163f fix merge conflicts between llama and gemma
  • 6c45f0f Use torch.bool instead of torch.int64 for non-persistant causal mask buff...
  • bfefb8e Patch YOLOS and others (#29353)
  • 20164cc RoPE loses precision for Llama / Gemma + Gemma logits.float() (#29285)
  • d5ec194 [Llama ROPE] Fix torch export but also slow downs in forward (#29198)
  • bf99e86 [T5 and Llama Tokenizer] remove warning (#29346)
  • 6d02350 Improve _update_causal_mask performance (#29210)
  • 4f8689e FIX [Gemma] Fix bad rebase with transformers main (#29170)
  • a085774 Release: v4.38.1
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

evelynmitchell commented 4 months ago

This can be merged.