Bump transformers from 4.36.0 to 4.41.2

Bumps transformers from 4.36.0 to 4.41.2.

Release notes

Release v4.41.2

Mostly fixes issues related to `trust_remote_code=True` and `from_pretrained`.

The `local_files_only` option did not work correctly when no .safetensors file existed. That is not the intended behavior: instead of trying to convert the weights, `from_pretrained` should simply fall back to loading the .bin files.

This is fixed by "Do not trigger autoconversion if local_files_only" (#31004) by @Wauplin.
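The fallback described above can be sketched as a small decision function. This is a hypothetical illustration, not the transformers implementation: the helper name `plan_weights_load` is made up, and the file names `model.safetensors` and `pytorch_model.bin` are assumed from the usual Hugging Face Hub conventions.

```python
from typing import Iterable, Tuple

# Assumed file names, following common Hugging Face Hub conventions.
SAFE_WEIGHTS_NAME = "model.safetensors"
BIN_WEIGHTS_NAME = "pytorch_model.bin"


def plan_weights_load(available: Iterable[str], local_files_only: bool) -> Tuple[str, bool]:
    """Hypothetical helper: return (file_to_load, request_conversion).

    Prefers safetensors weights; otherwise falls back to the .bin file.
    A conversion is only requested when the Hub may be contacted, so an
    offline (local_files_only=True) load never tries to convert.
    """
    files = set(available)
    if SAFE_WEIGHTS_NAME in files:
        return SAFE_WEIGHTS_NAME, False
    if BIN_WEIGHTS_NAME in files:
        return BIN_WEIGHTS_NAME, not local_files_only
    raise FileNotFoundError("no supported weights file found")


# Offline with only .bin weights cached: load them directly, no conversion.
print(plan_weights_load(["config.json", BIN_WEIGHTS_NAME], local_files_only=True))
```

Keeping the "should we convert?" decision separate from "which file do we load?" makes the offline case easy to reason about: the answer to the first question is always no when `local_files_only=True`.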

Paligemma: Fix devices and dtype assignments (#31008) by @molbap

Redirect transformers_agents doc to agents (#31054) by @aymeric-roucher

Fix from_pretrained in offline mode when model is preloaded in cache (#31010) by @oOraph

Fix faulty rstrip in module loading (#31108) by @Rocketknight1
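The release notes do not describe this fix in detail. Assuming from the title that it concerns the classic `rstrip` pitfall, the snippet below illustrates it: `str.rstrip(".py")` strips any trailing run of the characters ".", "p" and "y", not the ".py" suffix. The module name is hypothetical, and `str.removesuffix` requires Python 3.9+.

```python
# str.rstrip(".py") treats ".py" as a set of characters {".", "p", "y"}
# and strips any run of them from the right end, not the suffix ".py".
module_file = "happy.py"  # hypothetical module file name

wrong = module_file.rstrip(".py")        # strips "y", "p", ".", "y", "p", "p"
right = module_file.removesuffix(".py")  # Python 3.9+: removes the exact suffix once

print(wrong)  # ha
print(right)  # happy


def strip_suffix(name: str, suffix: str) -> str:
    """Version-agnostic suffix removal for Pythons older than 3.9."""
    return name[: -len(suffix)] if suffix and name.endswith(suffix) else name


print(strip_suffix(module_file, ".py"))  # happy
```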

Release v4.41.1: Fix PaliGemma finetuning, and some small bugs


Fix PaliGemma finetuning:

The causal mask and label creation were causing label leakage during training. Kudos to @probicheaux for finding and reporting this!

https://github.com/huggingface/transformers/commit/a755745546779ae5c42510bc02a859bdac82b3b7: PaliGemma - fix processor with no input text (huggingface/transformers#30916) @hiyouga

https://github.com/huggingface/transformers/commit/a25f7d3c12975fe21eab437dda7363e9024de7c0: Paligemma causal attention mask (huggingface/transformers#30967) @molbap and @probicheaux
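PaliGemma is trained as a prefix-LM: a prefix (image tokens plus the text prompt) followed by a suffix (the target text). The attention mask must let prefix tokens see each other bidirectionally while keeping the suffix strictly causal; otherwise a target token is visible to the position that is supposed to predict it. The labels usually also ignore the prefix, so only the target text is scored. This is a conceptual sketch in plain Python, not the code from #30967; `prefix_lm_mask` and `make_labels` are hypothetical helpers, and -100 is the ignore index that PyTorch's `CrossEntropyLoss` uses by default.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Prefix rows see the whole prefix (and nothing after it); suffix rows
    see the prefix plus earlier suffix positions only (causal).
    """
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


def make_labels(input_ids: List[int], prefix_len: int) -> List[int]:
    """Only suffix tokens contribute to the loss; prefix positions are ignored.

    Labels are left unshifted here, as causal-LM heads typically shift them
    internally when computing the loss.
    """
    return [IGNORE_INDEX] * prefix_len + input_ids[prefix_len:]


mask = prefix_lm_mask(prefix_len=3, total_len=5)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# xxx..
# xxx..
# xxx..
# xxxx.
# xxxxx
```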

Other fixes:

https://github.com/huggingface/transformers/commit/bb48e921868ac750417956de941606f7e2fa02ca: tokenizer_class = "AutoTokenizer" Llava Family (huggingface/transformers#30912)

https://github.com/huggingface/transformers/commit/1d568dfab262f76079eb4f3d05b606d51a0c9e4b: legacy to init the slow tokenizer when converting from slow was wrong (huggingface/transformers#30972)

https://github.com/huggingface/transformers/commit/b1065aa08ac0da11fcb9e3827cd7eafabe4beebd: Generation: get special tokens from model config (huggingface/transformers#30899) @zucchini-nlp

Reverted https://github.com/huggingface/transformers/commit/4ab7a28216211571fdddba414d4edd8426ab6489

v4.41.0: Phi3, JetMoE, PaliGemma, VideoLlava, Falcon2, FalconVLM & GGUF support

New models

Phi3

The Phi-3 model was proposed by Microsoft in "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone".

TL;DR: Phi-3 introduces new RoPE scaling methods, which appear to scale fairly well. Phi-3-mini, a 3.8B-parameter model, is available in two context-length variants: 4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.
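In standard RoPE, each pair i of the d query/key dimensions is rotated by an angle pos * theta_i, with theta_i = base^(-2i/d). Long-context scaling schemes change these frequencies so that positions beyond the original training length still map to familiar angles. The sketch below adds optional per-dimension rescaling factors, in the spirit of the LongRoPE-style scaling used for the 128K variant; it is an illustration under that assumption, not the exact Phi-3 formula, and the factor values in the usage example are made up.

```python
import math
from typing import List, Optional, Tuple


def rope_inv_freq(dim: int, base: float = 10000.0,
                  rescale: Optional[List[float]] = None) -> List[float]:
    """RoPE inverse frequencies, one per pair of dimensions.

    If `rescale` is given (one factor per pair), each frequency is divided
    by its factor, so a factor > 1 slows that pair's rotation down.
    """
    half = dim // 2
    factors = rescale if rescale is not None else [1.0] * half
    if len(factors) != half:
        raise ValueError("need one rescale factor per dimension pair")
    return [1.0 / (f * base ** (2 * i / dim)) for i, f in enumerate(factors)]


def rotate_pair(x0: float, x1: float, pos: int, inv_freq: float) -> Tuple[float, float]:
    """Rotate one (x0, x1) pair by the angle pos * inv_freq."""
    angle = pos * inv_freq
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


plain = rope_inv_freq(8)
# Hypothetical factors: leave the fast pairs alone, stretch the slow ones 4x.
stretched = rope_inv_freq(8, rescale=[1.0, 1.0, 4.0, 4.0])
print(plain)
print(stretched)
```

Dividing only the low-frequency pairs by a factor greater than 1 is what lets long positions be "compressed" into the angle range seen during training while the high-frequency pairs keep their local resolution.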

Phi-3 by @gugarosa in huggingface/transformers#30423

JetMoE

JetMoe-8B is an 8B Mixture-of-Experts (MoE) language model developed by Yikang Shen and MyShell. The JetMoe project aims to provide an efficient language model with LLaMA2-level performance on a limited budget. To achieve this, JetMoe uses a sparsely activated architecture inspired by ModuleFormer. Each JetMoe block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. For each input token, only a subset of the experts is activated to process it. This sparse activation gives JetMoe much better training throughput than dense models of similar size: JetMoe-8B trains on around 100B tokens per day on a cluster of 96 H100 GPUs, using a straightforward 3-way pipeline parallelism strategy.
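The sparse activation described above is usually implemented with a router that scores every expert for a token and runs only the top-k of them. Below is a minimal, framework-free sketch of top-k gating for a single scalar token. The toy experts, the fixed router logits, and k=2 are assumptions for illustration; in a real layer the logits come from a learned projection of the token, and JetMoe's actual router and its Mixture of Attention Heads layer are not reproduced here.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Run only the k highest-scoring experts and mix their outputs."""
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(experts)), key=lambda e: router_logits[e], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    gates = softmax([router_logits[e] for e in top])
    # Unselected experts are never called: this is the "sparse" part.
    return sum(g * experts[e](x) for g, e in zip(gates, top))


# Toy experts and router scores (hypothetical values).
experts = [lambda v: v + 1, lambda v: 2 * v, lambda v: -v, lambda v: 10 * v]
logits = [0.1, 2.0, -1.0, 2.0]
print(moe_forward(1.0, logits, experts, k=2))  # 6.0 (experts 1 and 3, weights 0.5 each)
```

Renormalizing the softmax over the selected experts is one common design choice; some MoE implementations instead apply the softmax over all experts before taking the top-k.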

... (truncated)

Commits

ab0f050 Release: v4.41.2
57f5553 Fix faulty rstrip in module loading (#31108)
73b180c fix from_pretrained in offline mode when model is preloaded in cache (#31010)
a6325a7 Redirect transformers_agents doc to agents (#31054)
9ccdc84 Paligemma- fix devices and dtype assignments (#31008)
12aa316 Do not trigger autoconversion if local_files_only (#31004)
75f15f3 Release: v4.41.1
8282db5 Paligemma causal attention mask (#30967)
e5b788a Revert "feat: Upgrade Weights & Biases callback (#30135)"
9d05459 Generation: get special tokens from model config (#30899)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

kyegomez / zeta

Bump transformers from 4.36.0 to 4.41.2 #227
