This PR updates transformers from 4.39.3 to 4.42.3.
Changelog
### 4.42.3
```
Make sure we have attention softcapping for "eager" GEMMA2 model
After experimenting, we noticed that softcapping is a must, mostly for the 27b model. So we are adding it back (it should have been there, but an error on my side made it disappear). Sorry all! 😭
- Gemma capping is a must for big models (31698)
```
### 4.42.2
```
Patch release
Thanks to our 2 contributors for their prompt fixes; these mostly apply to training and FA2!
- Fix Gemma2 4d attention mask (31674) by hiyouga
- don't zero out the attention_mask when using sliding window with flash attention (31670) by winglian
```
### 4.42.1
```
Patch release for commit:
- [HybridCache] Fix get_seq_length method (31661)
```
### 4.42.0
```
New model additions
Gemma-2
The Gemma2 model was proposed in [Gemma2: Open Models Based on Gemini Technology and Research](https://blog.google/technology/developers/Gemma2-open-models/) by Gemma2 Team, Google.
Gemma2 models are trained on 6T tokens and released in 2 versions, 2b and 7b.
The abstract from the paper is the following:
*This work introduces Gemma2, a new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma2 outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of our model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations*
![image](https://github.com/huggingface/transformers/assets/30755778/798b25f4-485a-4b60-abe5-af612def209b)
* Add gemma 2 by ArthurZucker in 31659
RTDETR
The RT-DETR model was proposed in [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu.
RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them.
![image](https://github.com/huggingface/transformers/assets/30755778/78b096d4-2686-41cb-9fdd-1cd517722fd3)
* New model support RTDETR by SangbumChoi in 29077
InstructBlip
The InstructBLIP model was proposed in [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip2) architecture for visual instruction tuning.
InstructBLIP uses the same architecture as [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip2) with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former.
![image](https://github.com/huggingface/transformers/assets/30755778/fd6997aa-d299-4d14-9eab-c3f16309bae9)
* Add video modality for InstrucBLIP by zucchini-nlp in 30182
LlaVa NeXT Video
The LLaVa-NeXT-Video model was proposed in [LLaVA-NeXT: A Strong Zero-shot Video Understanding Model](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/) by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon [LLaVa-NeXT](https://huggingface.co/docs/transformers/main/en/model_doc/llava_next) by fine-tuning on a mix of video and image data, thus increasing the model’s performance on videos.
[LLaVA-NeXT](https://huggingface.co/docs/transformers/main/en/model_doc/llava_next) surprisingly has strong performance in understanding video content in a zero-shot fashion thanks to the AnyRes technique that it uses. The AnyRes technique naturally represents a high-resolution image as multiple images. This technique is naturally generalizable to videos because videos can be considered as a set of frames (similar to a set of images in LLaVa-NeXT). The current version of LLaVA-NeXT makes use of AnyRes and trains with supervised fine-tuning (SFT) on top of LLaVA-NeXT on video data to achieve better video understanding capabilities. The model is the current SOTA among open-source models on the [VideoMME bench](https://arxiv.org/abs/2405.21075).
* Add LLaVa NeXT Video by zucchini-nlp in 31252
New model adder
A very significant change makes its way into the `transformers` codebase, introducing a new way to add models to `transformers`. We recommend reading the description of the PR below, but here is the gist of it:
> The diff_converter tool is here to replace our old Copied from statements, while keeping our core transformers philosophy:
>
> - single model single file
> - explicit code
> - standardization of modeling code
> - readable and educative code
> - simple code
> - least amount of modularity
>
> This additionally unlocks the ability to very quickly see the differences between new architectures that get developed. While many architectures are similar, the "single model, single file" policy can obfuscate the changes. With this diff converter, we want to make the changes between architectures very explicit.
* Diff converter v2 by ArthurZucker in 30868
Tool-use and RAG model support
We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the **Nous-Hermes**, **Command-R** and **Mistral/Mixtral** model families for support in the very near future. Please see the updated [chat template docs](https://huggingface.co/docs/transformers/main/chat_templating) for more information.
If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the [Hugging Face Discord server](https://hf.co/join/discord). Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved.
* Chat Template support for function calling and RAG by Rocketknight1 in 30621
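Below is a minimal sketch of the new tool-use API, assuming a chat model whose template already implements it — the checkpoint name and the `get_current_temperature` helper are placeholders, not part of the release:
python
from transformers import AutoTokenizer

# Placeholder checkpoint: any chat model whose template supports the new `tools` API should work.
checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    """
    return 22.0  # dummy value; a real tool would query a weather API

messages = [{"role": "user", "content": "What is the temperature in Paris, France right now?"}]

# The type hints and docstring above are converted to a JSON schema and passed to the chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)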
GGUF support
We extend support for GGUF files so they can be fine-tuned within the Python/HF ecosystem before being converted back to the GGUF/GGML/llama.cpp libraries.
* Add Qwen2 GGUF loading support by Isotr0py in 31175
* GGUF: Fix llama 3 GGUF by younesbelkada in 31358
* Fix llama gguf converter by SunMarc in 31575
Trainer improvements
A new optimizer (LOMO) is added to the `Trainer`; a minimal configuration sketch follows the PR link below.
* FEAT / Trainer: LOMO optimizer support by younesbelkada in 30178
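A minimal sketch of what enabling it could look like, assuming the `lomo-optim` package is installed (the model and dataset are left as the usual placeholders):
python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="lomo",        # the new optimizer is selected through the usual `optim` flag
    learning_rate=1e-3,
)
# `args` is then passed to `Trainer(model=..., args=args, train_dataset=...)` as usual.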
Quantization improvements
Several quantization-related improvements are included: a new cache (the quantized KV cache) is added, offering the ability to quantize the cache of generative models and further reduce memory requirements (see the sketch after the PR links below).
Additionally, the documentation related to quantization is entirely redone with the aim of helping users choose the quantization method best suited to their use case.
* Quantized KV Cache by zucchini-nlp in 30483
* Docs / Quantization: refactor quantization documentation by younesbelkada in 30942
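A minimal sketch of using the quantized KV cache during generation, assuming the `quanto` backend is installed and a GPU is available (the checkpoint is just an example):
python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("The quantized KV cache reduces memory by", return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=32,
    cache_implementation="quantized",                # switch the generation cache to the quantized variant
    cache_config={"backend": "quanto", "nbits": 4},  # quantize cached keys/values to 4 bits
)
print(tokenizer.decode(out[0], skip_special_tokens=True))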
Examples
New instance segmentation examples are added by qubvel
* Instance segmentation examples by qubvel in 31084
Notable improvements
As a notable improvement to the HF vision models that leverage backbones, we enable loading HF pretrained model weights as backbones, with the following API:
py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation
config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)  # use an HF Hub checkpoint as the backbone and load its pretrained weights
model = MaskFormerForInstanceSegmentation(config)
* Enable HF pretrained backbones by amyeroberts in 31145
Additionally, we thank Cyrilvallez for diving into our `generate` method and greatly reducing the memory requirements.
* Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 by Cyrilvallez in 30536
Breaking changes
Remove ConversationalPipeline and Conversation object
Both the ConversationalPipeline and the Conversation object have been deprecated for a while, and are removed in 4.42, this release.
The `TextGenerationPipeline` is recommended for this use case, and it now accepts inputs in the OpenAI chat format (see the sketch after the PR link below).
* 🚨 Remove ConversationalPipeline and Conversation object by Rocketknight1 in 31165
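A minimal sketch of the replacement workflow, passing OpenAI-style chat messages straight to the text-generation pipeline (the checkpoint is just an example of a chat model):
python
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What replaced ConversationalPipeline?"},
]
result = pipe(messages, max_new_tokens=64)
print(result[0]["generated_text"])  # the full chat, including the newly generated assistant turn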
Remove an accidental duplicate softmax application in FLAVA's attention
Removes a duplicate softmax application in FLAVA's attention. This is likely to cause a small change in the outputs, hence the 🚨 flag.
* 🚨 FLAVA: Remove double softmax by amyeroberts in 31322
Idefics2's `ignore_index` attribute of the loss is updated to `-100`
* 🚨 [Idefics2] Update ignore index by NielsRogge in 30898
out_indices from `timm` being updated
Recent updates to timm changed the type of the attribute `model.feature_info.out_indices`. Previously, `out_indices` would reflect the input type of `out_indices` on the `create_model` call i.e. either `tuple` or `list`. Now, this value is always a tuple.
As lists are more useful and consistent for us -- we cannot save tuples in configs, they must be converted to lists first -- we instead choose to cast `out_indices` to always be a list.
This has the possibility of being a slight breaking change if users are creating models and relying on `out_indices` being a tuple. As this only happens when a new model is created, and not when it is saved and reloaded (because of the config), it has a low chance of having much of an impact.
* 🚨 out_indices always a list by amyeroberts in 30941
The datasets referenced in the quantization config are updated to remove references to datasets with restrictive licenses.
* 🚨 Remove dataset with restrictive license by echarlaix in 31452
Bugfixes and improvements
* Add fixed resize and pad strategy for object detection by qubvel in 30742
* Enable dynamic resolution input for Swin Transformer and variants by the-neural-networker in 30656
* Add TokenClassification for Mistral, Mixtral and Qwen2 by josephenguehard in 29878
* FIX / Quantization: Fix Dockerfile build by younesbelkada in 30890
* Add support for torch.compile dynamic shapes by warner-benjamin in 30560
* LLaVa-Next: Update docs with batched inference by zucchini-nlp in 30857
* DeformableDETR two stage support bfloat16 by DonggeunYu in 30907
* add return_token_timestamps to WhisperProcessor by kamilakesbi in 30812
* Fix num_hidden_layers in initialization of new model in Mamba by SrGonao in 30403
* separate kwargs in processor (similar to 30193) by Eric2i in 30905
* fix for custom pipeline configuration by not-lain in 29004
* Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM by ylacombe in 28706
* Fix a shape annotation and typos in `mamba` slow forward by vasqu in 30691
* `tokenizer_class = "AutoTokenizer"` Llava Family by ArthurZucker in 30912
* Introduce configured_state arg for accelerator_config by muellerzr in 29781
* Add torch.compile for Mistral by zhenglongjiepheonix in 30642
* [docs] Spanish translation of model_memory_anatomy.md by aaronjimv in 30885
* FIX / TST: Fix expected results on Mistral slow test (A10) by younesbelkada in 30909
* PaliGemma - fix processor with no input text by hiyouga in 30916
* CI: AMD MI300 tests fix by mht-sharma in 30797
* Enforce saving at end of training if saving option chosen by muellerzr in 30160
* fix: center_crop occasionally outputs off-by-one dimension matrix by mattlbeck in 30934
* [Benchmark] Reuse `optimum-benchmark` by ydshieh in 30615
* TST / Workflows: Get slack notifications for docker image build by younesbelkada in 30891
* Fix swin embeddings interpolation by amyeroberts in 30936
* Fix inhomogeneous shape error in example by Zantares in 30434
* update ruff version by ArthurZucker in 30932
* Update build ci image [push-ci-image] by ArthurZucker in 30933
* Update video-llava docs by zucchini-nlp in 30935
* Fix low cpu mem usage tests by SunMarc in 30808
* [doc] Add references to the fine-tuning blog and distil-whisper to Whisper. by Vaibhavs10 in 30938
* Avoid extra chunk in speech recognition by jonatanklosko in 29539
* [whisper] only trigger forced ids warning once by sanchit-gandhi in 30966
* Paligemma - fix slow tests, add bf16 and f16 slow tests by molbap in 30851
* Finally fix the missing new model failure CI report by ydshieh in 30968
* legacy to init the slow tokenizer when converting from slow was wrong by ArthurZucker in 30972
* Generation: get special tokens from model config by zucchini-nlp in 30899
* [Whisper] Strip prompt before finding common subsequence by sanchit-gandhi in 27836
* Fix link in Pipeline documentation by junhl in 30948
* [Mistral and friends] Update MLP by NielsRogge in 31057
* Paligemma causal attention mask by molbap in 30967
* Update object detection with latest resize and pad strategies by qubvel in 30955
* Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size by kamilakesbi in 30637
* Push ci image by ArthurZucker in 30982
* test_custom_4d_attention_mask skip with sliding window attn by poedator in 30833
* Finish adding support for torch.compile dynamic shapes by warner-benjamin in 30919
* FIX / Docs: Minor changes in quantization docs by younesbelkada in 30985
* Fix accelerate failing tests by SunMarc in 30836
* [tests] add `torch.use_deterministic_algorithms` for XPU by faaany in 30774
* Add a check that warmup_setps is either 0 or >= 1 by ymoslem in 30764
* Update 4 `MptIntegrationTests` expected outputs by ydshieh in 30989
* [Port] TensorFlow implementation of Mistral by ariG23498 in 29708
* Remove deprecated properties in tokenization_nllb.py and tokenization_nllb_fast.py by ymoslem in 29834
* Bugfix: WandbCallback uploads initial model checkpoint by mgerstgrasser in 30897
* add prefix space ignored in llama 29625 by itazap in 30964
* Fix training speed regression introduced by "optimize VRAM for calculating pos_bias in LayoutLM v2, v3 by kkoehncke in 26139"
* Do not trigger autoconversion if local_files_only by Wauplin in 31004
* pin `uv==0.1.45` by ydshieh in 31006
* Perceiver interpolate position embedding by g1y5x3 in 30979
* [tests] make `test_model_parallelism` device-agnostic by faaany in 30844
* FIX / TST: Fix expected results on Mistral AWQ test by SunMarc in 30971
* allow multi-gpu by ydshieh in 31011
* Fix resume_download future warning by Wauplin in 31007
* Quantization / TST: Fix remaining quantization tests by younesbelkada in 31000
* save the list of new model failures by ydshieh in 31013
* added interpolation for vitmae model in pytorch as well as tf. by bhuvanmdev in 30732
* Add split special tokens by itazap in 30772
* Paligemma- fix devices and dtype assignments by molbap in 31008
* Redirect transformers_agents doc to agents by aymeric-roucher in 31054
* unpin uv by ydshieh in 31055
* Follow up: Fix link in dbrx.md by eitanturok in 30514
* Update feature request label in template by amyeroberts in 30940
* Fix quanto tests by SunMarc in 31062
* Fix pad_to_max_length Whisper by ylacombe in 30787
* skip `test_model_parallelism` for 2 model test classes by ydshieh in 31067
* use `main` by ydshieh in 31065
* Remove `ninja` from docker image build by ydshieh in 31080
* fix "piano" typo by clinty in 31027
* Update quicktour.md to fix broken link to Glossary by apalkk in 31072
* Remove redundant backend checks in training_args.py by kevint324 in 30999
* fix from_pretrained in offline mode when model is preloaded in cache by oOraph in 31010
* Remove float64 cast for OwlVit and OwlV2 to support MPS device by qubvel in 31071
* Fix OWLv2 post_process_object_detection for multiple images by qubvel in 31082
* Fix typo in trainer.py by taslimisina in 31048
* [SuperPoint, PaliGemma] Update docs by NielsRogge in 31025
* Fix failing tokenizer tests by LysandreJik in 31083
* Watermark: fix tests by zucchini-nlp in 30961
* Docs / PEFT: Add PEFT API documentation by younesbelkada in 31078
* Render chat template tojson filter as unicode by CISC in 31041
* FIX: Add `accelerate` as a hard requirement by younesbelkada in 31090
* FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` by younesbelkada in 31092
* skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` by ydshieh in 31086
* Fix PretrainedConfig docstring with deprecated resume_download by albertvillanova in 31014
* Fix DeepSpeed compatibility with weight_norm by jonnyli1125 in 30881
* TST: Fix instruct-blip tests by younesbelkada in 31088
* Docs / Quantization: Redirect deleted page by younesbelkada in 31063
* Deprecate low use models by amyeroberts in 30781
* Quantized KV cache: update quanto by zucchini-nlp in 31052
* FEAT: Add mistral v3 conversion script by younesbelkada in 30981
* Use `HF_HUB_OFFLINE` + fix has_file in offline mode by Wauplin in 31016
* Improve `transformers-cli env` reporting by statelesshz in 31003
* Fix env.py in cases where torch is not present by Rocketknight1 in 31113
* Fix faulty rstrip in module loading by Rocketknight1 in 31108
* Rm maintainer + migrate by muellerzr in 31089
* Fix nightly circleci by ydshieh in 31114
* FIX / Docs: Fix GPTQ expected number of bits by younesbelkada in 31111
* Add VLM generation default contributor by gante in 31115
* Add on_optimizer_step to callback options by dhruvbpai in 31095
* Cleanup docker build by ydshieh in 31119
* FIX / Quantization: Add extra validation for bnb config by younesbelkada in 31135
* fix get_scheduler when name is warmup_stable_decay by zspo in 31128
* Docs / Quantization: Replace all occurences of `load_in_8bit` with bnb config by younesbelkada in 31136
* Workflow: Remove `IS_GITHUB_CI` by younesbelkada in 31147
* helper by ArthurZucker in 31152
* pytest -rsfE by ydshieh in 31140
* Fix quantized cache output by SunMarc in 31143
* Update sam.md by asifajrof in 31130
* Quantization: Enhance bnb error message by younesbelkada in 31160
* [trainer] add sanity evaluation option by SunMarc in 31146
* Add streaming, various fixes by aymeric-roucher in 30838
* Added description of quantization_config by vamsivallepu in 31133
* Fix typo: use_safetenstors to use_safetensors by CharlesCNorton in 31184
* Remove copied froms for deprecated models by amyeroberts in 31153
* Token healing by ahmed-moubtahij in 30081
* [`GemmaModel`] fix small typo by ArthurZucker in 31202
* Fix Cannot convert [array()] to EagerTensor of dtype int64 by pavi-ninjaac in 31109
* Ignore non-causal mask in more cases with SDPA by fxmarty in 30138
* SlidingWindowCache: reduce differences to other Cache classes by gante in 30970
* Fix `test_compile_static_cache` by ydshieh in 30991
* fix the get_size_with_aspect_ratio in max_size situation by SangbumChoi in 30902
* Fix typo in utils by Bojun-Feng in 31169
* Rename sanity_evaluation to eval_on_start by Qubitium in 31192
* Wrong translation FR : Contents = Contenu by jadechoghari in 31186
* Cohere: Fix copied from by younesbelkada in 31213
* Set greater_is_better to False if metric_for_best_model ends with "loss" by miivanov90 in 31142
* Fix GPU OOM for `mistral.py::Mask4DTestHard` by ydshieh in 31212
* [docs] Spanish translation of tokenizer_summary.md by aaronjimv in 31154
* Pass device in Logits Processor's init by zucchini-nlp in 29804
* Fix sentence fragment within test comments by DomHudson in 31218
* fix(PatchTST): Wrong dropout used for PretainHead by maxstrobel in 31117
* Video-LLaVa: handle any number of frames by zucchini-nlp in 31221
* Add dynamic resolution input/interpolate position embedding to deit by p-kris10 in 31131
* fix bf16 issue in text classification pipeline by chujiezheng in 30996
* Fix pipeline tests - torch imports by amyeroberts in 31227
* Add new line switch before logging ***** Running {description} ***** by jacklanda in 31225
* add no split modules for xlmrobertaxl by ManuelFay in 31223
* Fix `MistralIntegrationTest` by ydshieh in 31231
* Blip: Deprecate `BlipModel` by younesbelkada in 31235
* Move out common backbone config param validation by amyeroberts in 31144
* Upload (daily) CI results to Hub by ydshieh in 31168
* Specify dtype=torch.bool to avoid xla error by ysulsky in 31191
* Fixing `name 'torch' is not defined` in `bitsandbytes` integration by jamesbraza in 31243
* Benchmark GitHub Actions workflow by ydshieh in 31163
* Early labels validation by amyeroberts in 31240
* doc: add info about wav2vec2 bert in older wav2vec2 models. by Vaibhavs10 in 31120
* enable deterministic mode for npu by statelesshz in 31253
* Add missing Flaubert tokenizer tests by bastrob in 30492
* Fix circular reference issue in CLIPTokenizerFast by dhaivat1729 in 31075
* Add condition to `benchmark` job in `push-important-models.yml` by ydshieh in 31259
* Skip failing JetMOE generation tests by amyeroberts in 31266
* no need for explicit EXTRA_TOKENS in processing_paligemma.py by grahamannett in 31022
* [`SwitchTransformer`] Significant performance improvement on MoE blocks by ranggihwang in 31173
* fix loading special_tokens_map_file by ZhiyuanChen in 31012
* Make mamba use cache by zucchini-nlp in 31116
* Generation: fix handling of special tokens by zucchini-nlp in 31254
* Switch from `cached_download` to `hf_hub_download` in remaining occurrences by Wauplin in 31284
* fix: `str` should be used not `int` when setting env variables by statelesshz in 31272
* Fix _save_tpu: use _maybe_convert_to_cpu instead of to cpu. by baoleai in 31264
* fix accelerate tests for roberta xl by SunMarc in 31288
* Enable dynamic resolution input for Beit by OmarManzoor in 31053
* Mark MobileNetV1ModelTest::test_batching_equivalence as flaky by amyeroberts in 31258
* Pipeline VQA: Add support for list of images and questions as pipeline input by BlacCod in 31217
* Fix SwinLayer / DonutSwinLayer / ClapAudioLayer attention mask device by gorodnitskiy in 31295
* Update text-to-speech.md by jaguaryang in 31269
* Fixed Wav2Vec2ProcessorWithLM decoding error by karicotiza in 31188
* Fix jetmoe model by Cyrilvallez in 31279
* Extend save_pretrained to offloaded models by blbadger in 27412
* Implement JSON dump conversion for torch_dtype in TrainingArguments by junrae6454 in 31224
* interpolation added for TVP. by bhuvanmdev in 30863
* Rename test_model_common_attributes -> test_model_get_set_embeddings by amyeroberts in 31321
* Use unused prepare_img() function in dinov2 conversion script by IbrahimAmin1 in 31335
* docs: fix style by imba-tjd in 31340
* Fix paligemma inverted mask by molbap in 31207
* docs/zh: fix style by imba-tjd in 31334
* Decorators for deprecation and named arguments validation by qubvel in 30799
* Improve error msg when using bitsandbytes by SunMarc in 31350
* Fix Cohere CI by ydshieh in 31263
* Fix gradio tool demos by aymeric-roucher in 31230
* Fast image processor by amyeroberts in 28847
* Add french translation of AutoBackbone by jadechoghari in 31300
* Add support to declare imports for code agent by JasonZhu1313 in 31355
* Fix idefics cache by zucchini-nlp in 31377
* [Bug Fix] Renamed loss to losses to suppress UnboundLocalError by her0e1c1 in 31365
* docs: fix broken link by imba-tjd in 31370
* backbone_utils - fix relative import by amyeroberts in 31382
* README underline between badges fix by novialriptide in 31376
* Update comment in modeling_utils.py by inf3rnus in 31299
* Use huggingface_hub helper function to split state dict by SunMarc in 31091
* Change JSON serialization to custom json.dumps by junrae6454 in 31100
* feat(ci): add trufflehog secrets detection by McPatate in 31344
* [QoL fix] [Image processing] Add warning on assumption of channel dim and avoid infering when inputs are PIL.Image by aliencaocao in 31364
* Make chat templates part of ProcessorMixin by Rocketknight1 in 30744
* add initial design for uniform processors + align model by molbap in 31197
* Add missing French translation of tutoriel_pipeline.md by jadechoghari in 31396
* Temporarily pin datasets upper version to fix CI by albertvillanova in 31407
* Support Clip QKV for MPT by akakakakakaa in 31307
* Pin datasets<2.20.0 for examples by amyeroberts in 31417
* Fix MusicGen SDPA by ylacombe in 31208
* Set seed for M4T retain grad test by ylacombe in 31419
* Fix SpeechT5 `decoder_attention_mask` shape by ylacombe in 28071
* Change potential `inputs_embeds` padding `logger.warning` to `logger.warning_once` by naimenz in 31411
* Remove duplicate image processor in auto map by amyeroberts in 31383
* Install the tensorflow example requirements in docker by amyeroberts in 31428
* Remove empty create_and_test_config_common_properties tests by amyeroberts in 31359
* xpu: support xpu backend from stock pytorch (>=2.4) by dvrogozh in 31238
* Musicgen special tokens in tensors by zucchini-nlp in 31420
* Fix Bark logits processors device misplacement by ylacombe in 31416
* Rename misnamed image processor test files by amyeroberts in 31430
* Generate: fix `tokenizer` being popped twice by gante in 31427
* [tests] make `TestDeepSpeedModelZoo` device-agnostic by faaany in 31402
* Support multiple validation datasets when `dataloader_persistent_workers=True` by bastienlc in 30627
* Pass datasets trust_remote_code by albertvillanova in 31406
* simple fix by tokenizer-decode in 31456
* Fix typing errors in `Qwen2ForTokenClassification` by kevinhu in 31440
* Agents: Improve python interpreter by aymeric-roucher in 31409
* Donut: fix `generate` call from local path by gante in 31470
* Make "tool_use" the default chat template key when tools are passed by Rocketknight1 in 31429
* Fix single letter stop strings by Rocketknight1 in 31448
* Update chat template docs and bump Jinja version by Rocketknight1 in 31455
* Improve `PreTrainedTokenizerFast` loading time when there are many added tokens by ydshieh in 31404
* Fix documentation typos by qgallouedec in 31476
* Give more useful `metric_for_best_model` errors by tomaarsen in 31450
* Update perf_train_gpu_many.md by remyleone in 31451
* [`GPT2`] Add SDPA support by vasqu in 31172
* Fix autocast incompatibility in RecurrentGemma by xplip in 30832
* Use self.config_tester.run_common_tests() by amyeroberts in 31431
* [tests] rename `test_config_object` to `test_ds_config_object` by faaany in 31403
* Docs / AQLM: Clarify `torch.compile` support for AQLM by younesbelkada in 31473
* Mamba: add generative tests by gante in 31478
* Update object_detection.md by jajupmochi in 31488
* Add docs on zeroshot image classification prompt templates by aliencaocao in 31343
* auto-detect device when no device is passed to pipeline by faaany in 31398
* Fix typo: pas_token_id by ftnext in 30894
* Fix `wandb` integration with `SetFit` model by timothepearce in 30021
* Consider inheritance in type checking for tensors by daemyung in 31378
* Add valid columns check in _remove_unused_columns method by arthasking123 in 31466
* Fix a teeny-tiny typo in `tokenization_utils_base.py`'s docstring by sadra-barikbin in 31510
* Fix mismatched ` in doc & other common typos by jhwei in 31516
* RWKV: enable generation tests by gante in 31490
* unskip 2 tests in cohere by ydshieh in 31517
* Revive Nightly/Past CI by ydshieh in 31159
* Deprecate legacy cache + use cache position by zucchini-nlp in 31491
* SPLIT PR: add user defined symbols and control symbols by itazap in 31305
* Removed torch.cuda.empty_cache from train loop. by FoamoftheSea in 31530
* Update mask_generation.md by nicholicaron in 31543
* Correct is_flaky test decoration by qubvel in 31480
* Add implementation of `spectrogram_batch` by ravenouse in 27159
* chore: fix typos by xiaoxianBoy in 31559
* Update git templates by ArthurZucker in 31539
* Fix the error caused by incorrect use of logger in pipeline by lanyun1103 in 31565
* Fix bug about add_special_tokens and so on by hiroshi-matsuda-rit in 31496
* Add Jinja as a requirement with the right version cutoff by Rocketknight1 in 31536
* Fix doc typo in `TrainingArguments` by qgallouedec in 31503
* Fix is_torch_xpu_available for torch < 2.3 by amyeroberts in 31573
* Added version constraint on numpy for version <2.0 by Resteklicken in 31569
* Siglip: add `_no_split_module` by zucchini-nlp in 31566
* fix output data type of image classification by jiqing-feng in 31444
* add preprocessing_num_workers to run_classification.py by jiahuanluo in 31586
* Improve error message for mismatched copies in code blocks by molbap in 31535
* Add ViTImageProcessorFast to tests by amyeroberts in 31424
* docs: move translations to `i18n` by SauravMaheshkar in 31584
* Removed unnecessary `self.projection` call in `VivitTubeletEmbeddings` by v-iashin in 31632
* [`GPT-NeoX`] Add SDPA support by vasqu in 31031
* Update RT-DETR code snippet by qubvel in 31631
* Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP by younesbelkada in 31161
* Fix RT-DETR inference with float16 and bfloat16 by qubvel in 31639
* Fix paligemma detection inference by molbap in 31587
* Generate: fix assisted generation with `past_key_values` passed as kwargs by gante in 31644
* Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference by aliencaocao in 31589
* Skip tests properly by amyeroberts in 31308
* Generation: past kv can be None by zucchini-nlp in 31051
* Fix ONNX exports for Optimum compatible models by merveenoyan in 31311
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* josephenguehard
* Add TokenClassification for Mistral, Mixtral and Qwen2 (29878)
* vasqu
* Fix a shape annotation and typos in `mamba` slow forward (30691)
* [`GPT2`] Add SDPA support (31172)
* [`GPT-NeoX`] Add SDPA support (31031)
* ariG23498
* [Port] TensorFlow implementation of Mistral (29708)
* bhuvanmdev
* added interpolation for vitmae model in pytorch as well as tf. (30732)
* interpolation added for TVP. (30863)
* SangbumChoi
* fix the get_size_with_aspect_ratio in max_size situation (30902)
* New model support RTDETR (29077)
* Cyrilvallez
* Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (30536)
* Fix jetmoe model (31279)
* ravenouse
* Add implementation of `spectrogram_batch` (27159)
```
### 4.41.2
```
Mostly fixing some stuff related to `trust_remote_code=True` and `from_pretrained`
The `local_files_only` option was having a hard time when a `.safetensors` file did not exist. This is not expected, and instead of trying to convert, we should just fall back to loading the `.bin` files.
* Do not trigger autoconversion if local_files_only 31004 from Wauplin fixes this!
* Paligemma: Fix devices and dtype assignments (31008) by molbap
* Redirect transformers_agents doc to agents (31054) aymeric-roucher
* Fix from_pretrained in offline mode when model is preloaded in cache (31010) by oOraph
* Fix faulty rstrip in module loading (31108) Rocketknight1
```
### 4.41.1
```
Fix PaliGemma finetuning:
The causal mask and label creation were causing label leaks when training. Kudos to probicheaux for finding and reporting!
- https://github.com/huggingface/transformers/commit/a755745546779ae5c42510bc02a859bdac82b3b7 : PaliGemma - fix processor with no input text (https://github.com/huggingface/transformers/pull/30916) hiyouga
- https://github.com/huggingface/transformers/commit/a25f7d3c12975fe21eab437dda7363e9024de7c0 : Paligemma causal attention mask (https://github.com/huggingface/transformers/pull/30967) molbap and probicheaux
Other fixes:
- https://github.com/huggingface/transformers/commit/bb48e921868ac750417956de941606f7e2fa02ca: tokenizer_class = "AutoTokenizer" Llava Family (https://github.com/huggingface/transformers/pull/30912)
- https://github.com/huggingface/transformers/commit/1d568dfab262f76079eb4f3d05b606d51a0c9e4b : legacy to init the slow tokenizer when converting from slow was wrong (https://github.com/huggingface/transformers/pull/30972)
- https://github.com/huggingface/transformers/commit/b1065aa08ac0da11fcb9e3827cd7eafabe4beebd : Generation: get special tokens from model config (https://github.com/huggingface/transformers/pull/30899) zucchini-nlp
Reverted https://github.com/huggingface/transformers/commit/4ab7a28216211571fdddba414d4edd8426ab6489
```
### 4.41.0
```
New models
Phi3
The Phi-3 model was proposed in [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/abs/2404.14219) by Microsoft.
TLDR; Phi-3 introduces new RoPE scaling methods, which seem to scale fairly well!
Phi-3-mini is available in two context-length variants—4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.
<img width="1599" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/0f37c6b0-b118-453c-ac64-6e45aa291d0a">
* Phi-3 by gugarosa in https://github.com/huggingface/transformers/pull/30423
JetMoE
JetMoe-8B is an 8B Mixture-of-Experts (MoE) language model developed by [Yikang Shen](https://scholar.google.com.hk/citations?user=qff5rRYAAAAJ) and [MyShell](https://myshell.ai/). The JetMoe project aims to provide LLaMA2-level performance with an efficient language model trained on a limited budget. To achieve this goal, JetMoe uses a sparsely activated architecture inspired by [ModuleFormer](https://arxiv.org/abs/2306.04640). Each JetMoe block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. Given the input tokens, it activates a subset of its experts to process them. This sparse activation schema enables JetMoe to achieve much better training throughput than similarly sized dense models. The training throughput of JetMoe-8B is around 100B tokens per day on a cluster of 96 H100 GPUs with a straightforward 3-way pipeline parallelism strategy.
<img width="1559" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/cc83ce99-7a61-4d04-a234-3f68e6c0fafd">
* Add JetMoE model by yikangshen in https://github.com/huggingface/transformers/pull/30005
PaliGemma
PaliGemma is a lightweight open vision-language model (VLM) inspired by [PaLI-3](https://arxiv.org/abs/2310.09199), and based on open components like the [SigLIP vision model](https://arxiv.org/abs/2303.15343) and the [Gemma language model](https://arxiv.org/abs/2403.08295). PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.
More than 120 checkpoints are released; see the collection [here](https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda)!
<img width="1064" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/23584b9a-6c36-46f5-8700-32f402c0f674">
* Add PaliGemma by molbap in https://github.com/huggingface/transformers/pull/30814
VideoLlava
Video-LLaVA exhibits remarkable interactive capabilities between images and videos, despite the absence of image-video pairs in the dataset.
💡 Simple baseline, learning united visual representation by alignment before projection
With the binding of unified visual representations to the language feature space, we enable an LLM to perform visual reasoning capabilities on both images and videos simultaneously.
🔥 High performance, complementary learning with video and image
Extensive experiments demonstrate the complementarity of modalities, showcasing significant superiority when compared to models specifically designed for either images or videos.
<img width="532" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/cLniWc__KECBBesliHKhd.png">
* Add Video Llava by zucchini-nlp in https://github.com/huggingface/transformers/pull/29733
Falcon 2 and FalconVLM:
<img width="1024" alt="image" src="https://falconllm.tii.ae/assets/images/table-1___.jpeg">
Two new models from TII-UAE! They published a [blog post](https://falconllm.tii.ae/falcon-2.html) with more details! Falcon2 introduces parallel MLP, and Falcon VLM uses the `Llava` framework.
* Support for Falcon2-11B by Nilabhra in https://github.com/huggingface/transformers/pull/30771
* Support arbitrary processor by ArthurZucker in https://github.com/huggingface/transformers/pull/30875
GGUF `from_pretrained` support
<img width="1064" alt="image" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/gguf-spec.png">
You can now load most GGUF quants directly with transformers' `from_pretrained`, converting them to classic PyTorch models. The API is simple:
python
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"
# the GGUF weights are dequantized into a regular PyTorch model when loading
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
We plan closer integrations with the llama.cpp / GGML ecosystem in the future; see https://github.com/huggingface/transformers/issues/27712 for more details.
* Loading GGUF files support by LysandreJik in https://github.com/huggingface/transformers/pull/30391
```
### 4.40.2
```
Fix torch fx for LLama model
- Fix for Neuron (30259)
- Fix copies for DBRX - neuron fix (30610)
Thanks michaelbenayoun !
```
### 4.40.1
```
Kudos to pcuenca for the prompt fix in:
- Make EosTokenCriteria compatible with mps 30376
This supports `EosTokenCriteria` on MPS while `pytorch` adds this functionality.
```
### 4.40.0
```
New model additions
Llama 3
Llama 3 is supported in this release through the Llama 2 architecture and some fixes in the `tokenizers` library.
Idefics2
<img src="https://huggingface.co/HuggingFaceM4/idefics-80b/resolve/main/assets/IDEFICS.png"
alt="drawing" width="300"/>
The Idefics2 model was created by the Hugging Face M4 team and authored by Léo Tronchon, Hugo Laurencon, Victor Sanh. The accompanying blog post can be found here.
Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs. It improves upon IDEFICS-1, notably on document understanding, OCR, or visual reasoning. Idefics2 is lightweight (8 billion parameters) and treats images in their native aspect ratio and resolution, which allows for varying inference efficiency.
* Add Idefics2 by amyeroberts in 30253
Recurrent Gemma
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/recurrent-gemma.png"
alt="drawing" width="600"/>
<small> Recurrent Gemma architecture. Taken from the <a href="https://arxiv.org/pdf/2402.19427.pdf">original paper.</a> </small>
The Recurrent Gemma model was proposed in RecurrentGemma: Moving Past Transformers for Efficient Open Language Models by the Griffin, RLHF and Gemma Teams of Google.
The abstract from the paper is the following:
We introduce RecurrentGemma, an open language model which uses Google’s novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens.
* Add recurrent gemma by ArthurZucker in 30143
Jamba
Jamba is a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and an overall of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.
As depicted in the diagram below, Jamba’s architecture features a blocks-and-layers approach that allows Jamba to successfully integrate Transformer and Mamba architectures altogether. Each Jamba block contains either an attention or a Mamba layer, followed by a multi-layer perceptron (MLP), producing an overall ratio of one Transformer layer out of every eight total layers.
![image](https://github.com/huggingface/transformers/assets/48595927/d78bb917-7a8a-4959-8206-e493c6c75f3d)
Jamba introduces the first `HybridCache` object that allows it to natively support assisted generation, contrastive search, speculative decoding, beam search and all of the awesome features from the `generate` API!
* Add jamba by tomeras91 in 29943
DBRX
DBRX is a [transformer-based](https://www.isattentionallyouneed.com/) decoder-only large language model (LLM) that was trained using next-token prediction. It uses a *fine-grained* mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input.
It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2.
This provides 65x more possible combinations of experts and the authors found that this improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA).
* Add DBRX Model by abhi-mosaic in 29921
OLMo
The OLMo model was proposed in OLMo: Accelerating the Science of Language Models by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi.
OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. We release all code, checkpoints, logs (coming soon), and details involved in training these models.
* Add OLMo model family by 2015aroras in 29890
Qwen2MoE
Qwen2MoE is the new model series of large language models from the Qwen team. Previously, we released the Qwen series, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc.
Model Details
Qwen2MoE is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. Qwen2MoE has the following architectural choices:
Qwen2MoE is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.
Qwen2MoE employs Mixture of Experts (MoE) architecture, where the models are upcycled from dense language models. For instance, Qwen1.5-MoE-A2.7B is upcycled from Qwen-1.8B. It has 14.3B parameters in total and 2.7B activated parameters during runtime, while it achieves comparable performance with Qwen1.5-7B, with only 25% of the training resources.
* Add Qwen2MoE by bozheng-hit in 29377
Grounding Dino
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/grouding_dino_architecture.png"
alt="drawing" width="600"/>
<small> Taken from the <a href="https://arxiv.org/pdf/2303.05499.pdf">original paper.</a> </small>
The Grounding DINO model was proposed in Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang. Grounding DINO extends a closed-set object detection model with a text encoder, enabling open-set object detection. The model achieves remarkable results, such as 52.5 AP on COCO zero-shot.
* Adding grounding dino by EduardoPach in 26087
Static pretrained maps
Static pretrained maps have been removed from the library's internals and are currently deprecated. These used to reflect all the available checkpoints for a given architecture on the Hugging Face Hub, but their presence no longer makes sense in light of the huge growth of checkpoints shared by the community.
With the objective of lowering the bar for model contributions and reviewing, we first start by removing legacy objects such as this one which do not serve a purpose.
* Remove static pretrained maps from the library's internals by LysandreJik in 29112
Notable improvements
Processors improvements
Processors are undergoing changes in order to uniformize them and make them clearer to use.
* Separate out kwargs in processor by amyeroberts in 30193
* [Processor classes] Update docs by NielsRogge in 29698
SDPA
* re-introduced the fast path for sdpa by fxmarty in 30070
Push to Hub for pipelines
Pipelines can now be pushed to the Hub using a convenient `push_to_hub` method (a short sketch follows the PR link below).
* add `push_to_hub` to pipeline by not-lain in 29172
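A minimal sketch, assuming you are already logged in to the Hub (the repository name is a placeholder):
python
from transformers import pipeline

pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
pipe.push_to_hub("my-username/my-text-classification-pipeline")  # placeholder repo id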
Flash Attention 2 for more models (M2M100, NLLB, GPT2, MusicGen)!
Thanks to community contributions, Flash Attention 2 has been integrated into more architectures (see the sketch after the PR links below).
* Adding Flash Attention 2 Support for GPT2 by EduardoPach in 29226
* Add Flash Attention 2 support to Musicgen and Musicgen Melody by ylacombe in 29939
* Add Flash Attention 2 to M2M100 model by visheratin in 30256
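A minimal sketch of opting in on one of the newly supported architectures, assuming `flash-attn` is installed and a compatible GPU is available (GPT2 is used here as an example):
python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # opt in to the FA2 attention implementation
)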
Improvements and bugfixes
* [docs] Remove redundant `-` and `the` from custom_tools.md by windsonsea in 29767
* Fixed typo in quantization_config.py by kurokiasahi222 in 29766
* OWL-ViT box_predictor inefficiency issue by RVV-karma in 29712
* Allow `-OO` mode for `docstring_decorator` by matthid in 29689
* fix issue with logit processor during beam search in Flax by giganttheo in 29636
* Fix docker image build for `Latest PyTorch + TensorFlow [dev]` by ydshieh in 29764
* [`LlavaNext`] Fix llava next unsafe imports by ArthurZucker in 29773
* Cast bfloat16 to float32 for Numpy conversions by Rocketknight1 in 29755
* Silence deprecations and use the DataLoaderConfig by muellerzr in 29779
* Add deterministic config to `set_seed` by muellerzr in 29778
* Add support for `torch_dtype` in the run_mlm example by jla524 in 29776
* Generate: remove legacy generation mixin imports by gante in 29782
* Llama: always convert the causal mask in the SDPA code path by gante in 29663
* Prepend `bos token` to Blip generations by zucchini-nlp in 29642
* Change in-place operations to out-of-place in LogitsProcessors by zucchini-nlp in 29680
* [`quality`] update quality check to make sure we check imports 😈 by ArthurZucker in 29771
* Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset. Issue 29678 by stevemadere in 29738
* Enable AMD docker build CI by IlyasMoutawwakil in 29803
* Correct llava mask & fix missing setter for `vocab_size` by fxmarty in 29389
* rm input dtype change in CPU by jiqing-feng in 28631
* Generate: remove unused attributes in `AssistedCandidateGenerator` by gante in 29787
* replaced concatenation to f-strings to improve readability and unify … by igeni in 29785
* [`cleanup`] vestiges of causal mask by ArthurZucker in 29806
* Complete security policy with mentions of remote code by LysandreJik in 29707
* [`SuperPoint`] Fix doc example by amyeroberts in 29816
* [DOCS] Fix typo for llava next docs by aliencaocao in 29829
* model_summary.md - Restore link to Harvard's Annotated Transformer. by gamepad-coder in 29702
* Fix the behavior of collecting 'num_input_tokens_seen' by YouliangHUANG in 29099
* Populate torch_dtype from model to pipeline by B-Step62 in 28940
* remove quotes in code example by johko in 29812
* Add warnings if training args differ from checkpoint trainer state by jonflynng in 29255
* Replace 'decord' with 'av' in VideoClassificationPipeline by Tyx-main in 29747
* Fix header in IFE task guide by merveenoyan in 29859
* [docs] Indent ordered list in add_new_model.md by windsonsea in 29796
* Allow `bos_token_id is None` during the generation with `inputs_embeds` by LZHgrla in 29772
* Add `cosine_with_min_lr` scheduler in Trainer by liuyanyi in 29341
* Disable AMD memory benchmarks by IlyasMoutawwakil in 29871
* Set custom_container in build docs workflows by Wauplin in 29855
* Support `num_attention_heads` != `num_key_value_heads` in Flax Llama Implementation by bminixhofer in 29557
* Mamba `slow_forward` gradient fix by vasqu in 29563
* Fix 29807, sinusoidal positional encodings overwritten by post_init() by hovnatan in 29813
* Reimplement "Automatic safetensors conversion when lacking these files" by LysandreJik in 29846
* fix fuyu device_map compatibility by SunMarc in 29880
* Move `eos_token_id` to stopping criteria by zucchini-nlp in 29459
* add Cambricon MLUs support by huismiling in 29627
* MixtralSparseMoeBlock: add gate jitter by lorenzoverardo in 29865
* Fix typo in T5Block error message by Mingosnake in 29881
* [`make fix-copies`] update and help by ArthurZucker in 29924
* [`GptNeox`] don't gather on pkv when using the trainer by ArthurZucker in 29892
* [`pipeline`]. Zero shot add doc warning by ArthurZucker in 29845
* [doc] fix some typos and add `xpu` to the testing documentation by faaany in 29894
* Tests: replace `torch.testing.assert_allclose` by `torch.testing.assert_close` by gante in 29915
* Add beam search visualizer to the doc by aymeric-roucher in 29876
* Safe import of LRScheduler by amyeroberts in 29919
* add functions to inspect model and optimizer status to trainer.py by CKeibel in 29838
* RoPE models: add numerical sanity-check test for RoPE scaling by gante in 29808
* [`Mamba`] from pretrained issue with `self.embeddings` by ArthurZucker in 29851
* [ `TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix by ArthurZucker in 29453
* Allow GradientAccumulationPlugin to be configured from AcceleratorConfig by fabianlim in 29589
* [`BC`] Fix BC for other libraries by ArthurZucker in 29934
* Fix doc issue 29758 in DebertaV2Config class by vinayakkgarg in 29842
* [`LlamaSlowConverter`] Slow to Fast better support by ArthurZucker in 29797
* Update installs in image classification doc by MariaHei in 29947
* [`StableLm`] Add QK normalization and Parallel Residual Support by jon-tow in 29745
* Mark `test_eager_matches_sdpa_generate` flaky for some models by ydshieh in 29479
* Super tiny fix 12 typos about "with with" by fzyzcjy in 29926
* Fix rope theta for OpenLlama by jla524 in 29893
* Add warning message for `run_qa.py` by jla524 in 29867
* fix: get mlflow version from mlflow-skinny by clumsy in 29918
* Reset alarm signal when the function is ended by coldnight in 29706
* Update model card and link of blog post. by bozheng-hit in 29928
* [`BC`] Fix BC for AWQ quant by TechxGenus in 29965
* Rework tests to compare trainer checkpoint args by muellerzr in 29883
* Fix FA2 tests by ylacombe in 29909
* Fix copies main ci by ArthurZucker in 29979
* [tests] fix the wrong output in `ImageToTextPipelineTests.test_conditional_generation_llava` by faaany in 29975
* Generate: move misplaced test by gante in 29902
* [docs] Big model loading by stevhliu in 29920
* [`generate`] fix breaking change for patch by ArthurZucker in 29976
* Fix 29807 sinusoidal positional encodings in Flaubert, Informer and XLM by hovnatan in 29904
* [bnb] Fix bug in `_replace_with_bnb_linear` by SunMarc in 29958
* Adding FlaxNoRepeatNGramLogitsProcessor by giganttheo in 29677
* [Docs] Make an ordered list prettier in add_tensorflow_model.md by windsonsea in 29949
* Fix `skip_special_tokens` for `Wav2Vec2CTCTokenizer._decode` by msublee in 29311
* Hard error when ignoring tensors. by Narsil in 27484
* Generate: fix logits processors doctests by gante in 29718
* Fix `remove_columns` in `text-classification` example by mariosasko in 29351
* Update `tests/utils/tiny_model_summary.json` by ydshieh in 29941
* Make EncodecModel.decode ONNX exportable by fxmarty in 29913
* Fix Swinv2ForImageClassification NaN output by miguelm-almeida in 29981
* Fix Qwen2Tokenizer by jklj077 in 29929
* Fix `kwargs` handling in `generate_with_fallback` by cifkao in 29225
* Fix probability computation in `WhisperNoSpeechDetection` when recomputing scores by cifkao in 29248
* Fix vipllava for generation by zucchini-nlp in 29874
* [docs] Fix audio file by stevhliu in 30006
* Superpoint imports fix by zucchini-nlp in 29898
* [`Main CIs`] Fix the red cis by ArthurZucker in 30022
* Make clearer about zero_init requirements by muellerzr in 29879
* Enable multi-device for efficientnet by jla524 in 29989
* Add a converter from mamba_ssm -> huggingface mamba by byi8220 in 29705
* [`ProcessingIdefics`] Attention mask bug with padding by byi8220 in 29449
* Add `whisper` to `IMPORTANT_MODELS` by ydshieh in 30046
* skip `test_encode_decode_fast_slow_all_tokens` for now by ydshieh in 30044
* if output is tuple like facebook/hf-seamless-m4t-medium, waveform is … by sywangyi in 29722
* Fix mixtral ONNX Exporter Issue. by AdamLouly in 29858
* [Trainer] Allow passing image processor by NielsRogge in 29896
* [bnb] Fix offload test by SunMarc in 30039
* Update quantizer_bnb_4bit.py: In the ValueError string there should be "....you need to set `llm_int8_enable_fp32_cpu_offload=True`...." instead of "`load_in_8bit_fp32_cpu_offload=True`". by miRx923 in 30013
* [test fetcher] Always include the directly related test files by ydshieh in 30050
* Fix `torch.fx` symbolic tracing for LLama by michaelbenayoun in 30047
* Refactor daily CI workflow by ydshieh in 30012
* Add docstrings and types for MambaCache by koayon in 30023
* Fix auto tests by ydshieh in 30067
* Fix whisper kwargs and generation config by zucchini-nlp in 30018
* doc: Correct spelling mistake by caiyili in 30107
* [Whisper] Computing features on GPU in batch mode for whisper feature extractor. by vaibhavagg303 in 29900
* Change log level to warning for num_train_epochs override by xu-song in 30014
* Make MLFlow version detection more robust and handles mlflow-skinny by helloworld1 in 29957
* updated examples/pytorch/language-modeling scripts and requirements.txt to require datasets>=2.14.0 by Patchwork53 in 30120
* [tests] add `require_bitsandbytes` marker by faaany in 30116
* fixing issue 30034 - adding data format for run_ner.py by JINO-ROHIT in 30088
* Patch fix - don't use safetensors for TF models by amyeroberts in 30118
* [29174] ImportError Fix: Trainer with PyTorch requires accelerate>=0.20.1 Fix by UtkarshaGupte in 29888
* Accept token in trainer.push_to_hub() by mapmeld in 30093
* fix learning rate display in trainer when using galore optimizer by vasqu in 30085
* Fix falcon with SDPA, alibi but no passed mask by fxmarty in 30123
* Trainer / Core : Do not change init signature order by younesbelkada in 30126
* Make vitdet jit trace complient by fxmarty in 30065
* Fix typo at ImportError by DrAnaximandre in 30090
* Adding `mps` as device for `Pipeline` class by fnhirwa in 30080
* Fix failing DeepSpeed model zoo tests by pacman100 in 30112
* Add datasets.Dataset to Trainer's train_dataset and eval_dataset type hints by ringohoffman in 30077
* Fix docs Pop2Piano by zucchini-nlp in 30140
* Revert workaround for TF safetensors loading by Rocketknight1 in 30128
* [Trainer] Fix default data collator by NielsRogge in 30142
* [Trainer] Undo 29896 by NielsRogge in 30129
* Fix slow tests for important models to be compatible with A10 runners by ydshieh in 29905
* Send headers when converting safetensors by ydshieh in 30144
* Fix quantization tests by SunMarc in 29914
* [docs] Fix image segmentation guide by stevhliu in 30132
* [CI] Fix setup by SunMarc in 30147
* Fix length related warnings in speculative decoding by zucchini-nlp in 29585
* Fix and simplify semantic-segmentation example by qubvel in 30145
* [CI] Quantization workflow fix by SunMarc in 30158
* [tests] make 2 tests device-agnostic by faaany in 30008
* Add str to TrainingArguments report_to type hint by ringohoffman in 30078 (see the sketch after this changelog block)
* [UDOP] Fix tests by NielsRogge in 29573
* [UDOP] Improve docs, add resources by NielsRogge in 29571
* Fix accelerate kwargs for versions <0.28.0 by vasqu in 30086
* Fix typing annotation in hf_argparser by xu-song in 30156
* Fixing a bug when MlFlow try to log a torch.tensor by etiennebonnafoux in 29932
* Fix natten install in docker by ydshieh in 30161
* FIX / bnb: fix torch compatibility issue with `itemize` by younesbelkada in 30162
* Update config class check in auto factory by Rocketknight1 in 29854
* Fixed typo in comments/documentation for Pipelines documentation by DamonGuzman in 30170
* Fix Llava chat template examples by lewtun in 30130
* Guard XLA version imports by muellerzr in 30167
* chore: remove repetitive words by hugehope in 30174
* fix: Fixed `ruff` configuration to avoid deprecated configuration warning by Sai-Suraj-27 in 30179
* Refactor Cohere Model by saurabhdash2512 in 30027
* Update output of SuperPointForKeypointDetection by NielsRogge in 29809
* Falcon: make activation, ffn_hidden_size configurable by sshleifer in 30134
* Docs PR template by stevhliu in 30171
* ENH: [`CI`] Add new workflow to run slow tests of important models on push main if they are modified by younesbelkada in 29235
* Fix pipeline logger.warning_once bug by amyeroberts in 30195
* fix: Replaced deprecated `logger.warn` with `logger.warning` by Sai-Suraj-27 in 30197
* fix typo by mdeff in 30220
* fix fuyu doctest by molbap in 30215
* Fix `RecurrentGemmaIntegrationTest.test_2b_sample` by ydshieh in 30222
* Update modeling_bark.py by bes-dev in 30221
* Fix/Update for doctest by ydshieh in 30216
* Fixed config.json download to go to user-supplied cache directory by ulatekh in 30189
* Add test for parse_json_file and change typing to os.PathLike by xu-song in 30183
* fix: Replace deprecated `assertEquals` with `assertEqual` by Sai-Suraj-27 in 30241
* Set pad_token in run_glue_no_trainer.py 28534 by JINO-ROHIT in 30234
* fix: Replaced deprecated `typing.Text` with `str` by Sai-Suraj-27 in 30230
* Refactor doctest by ydshieh in 30210
* fix: Fixed `type annotation` for compatability with python 3.8 by Sai-Suraj-27 in 30243
* Fix doctest more (for `docs/source/en`) by ydshieh in 30247
* round epoch only in console by xdedss in 30237
* update github actions packages' version to suppress warnings by ydshieh in 30249
* [tests] add the missing `require_torch_multi_gpu` flag by faaany in 30250
* [Docs] Update recurrent_gemma.md for some minor nits by sayakpaul in 30238
* Remove incorrect arg in codellama doctest by Rocketknight1 in 30257
* Update `ko/_toctree.yml` by jungnerd in 30062
* More fixes for doctest by ydshieh in 30265
* FIX: Fix corner-case issue with the important models workflow by younesbelkada in 30212
* FIX: Fix 8-bit serialization tests by younesbelkada in 30051
* Allow for str versions of dicts based on typing by muellerzr in 30227
* Workflow: Update tailscale to release version by younesbelkada in 30268
* Raise relevant err when wrong type is passed in as the accelerator_config by muellerzr in 29997
* BLIP - fix pt-tf equivalence test by amyeroberts in 30258
* fix: Fixed a `raise` statement by Sai-Suraj-27 in 30275
* Fix test fetcher (doctest) + `Idefics2`'s doc example by ydshieh in 30274
* Fix SDPA sliding window compatibility by fxmarty in 30127
* Fix SpeechT5 forward docstrings by ylacombe in 30287
* FIX / AWQ: Fix failing exllama test by younesbelkada in 30288
* Configuring Translation Pipelines documents update 27753 by UtkarshaGupte in 29986
* Enable fx tracing for Mistral by zucchini-nlp in 30209
* Fix test `ExamplesTests::test_run_translation` by ydshieh in 30281
* Fix `Fatal Python error: Bus error` in `ZeroShotAudioClassificationPipelineTests` by ydshieh in 30283
* FIX: Fix push important models CI by younesbelkada in 30291
* Add token type ids to CodeGenTokenizer by st81 in 29265
* Add strategy to store results in evaluation loop by qubvel in 30267
* Upgrading to tokenizers 0.19.0 by Narsil in 30289
* Re-enable SDPA's FA2 path by fxmarty in 30070
* Fix quality Olmo + SDPA by fxmarty in 30302
* Fix donut token2json multiline by qubvel in 30300
* Fix all torch pipeline failures except one by ydshieh in 30290
* Add atol for sliding window test by fxmarty in 30303
* Fix RecurrentGemma device_map by SunMarc in 30273
* Revert "Re-enable SDPA's FA2 path by ArthurZucker in 30070)"
* Do not drop mask with SDPA for more cases by fxmarty in 30311
* FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + revert 30070 at the same time by younesbelkada in 30317
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* bozheng-hit
    * Add Qwen2MoE (29377)
    * Update model card and link of blog post. (29928)
* EduardoPach
    * Adding Flash Attention 2 Support for GPT2 (29226)
    * Adding grounding dino (26087)
* 2015aroras
    * Add OLMo model family (29890)
* tomeras91
    * Add jamba (29943)
* abhi-mosaic
    * Add DBRX Model (29921)
```
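For reviewers who want to spot-check one of the items above, here is a minimal sketch of the `mps` device support added to the `Pipeline` class (30080). It assumes an Apple Silicon machine with PyTorch's MPS backend available and falls back to CPU otherwise; the task and checkpoint are illustrative placeholders, not taken from the changelog.

```python
# Minimal sketch for the "mps" Pipeline device listed above (30080).
# Assumes torch with the MPS backend; the checkpoint below is only an example.
import torch
from transformers import pipeline

device = "mps" if torch.backends.mps.is_available() else "cpu"
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,  # string devices such as "mps" are accepted
)
print(classifier("Upgrading transformers went smoothly."))
```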
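Similarly, a small sketch for the `report_to` type-hint change in `TrainingArguments` (30078): a plain string is accepted and normalized to a list internally. It assumes `torch` and `accelerate` are installed, as this version range requires; `out` is a placeholder output directory.

```python
# Minimal sketch for the TrainingArguments.report_to change listed above (30078).
# "out" is a placeholder output directory; requires torch and accelerate.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", report_to="none")
print(args.report_to)  # "none" is normalized to an empty list of integrations
```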
Links
- PyPI: https://pypi.org/project/transformers
- Changelog: https://data.safetycli.com/changelogs/transformers/
- Repo: https://github.com/huggingface/transformers
This PR updates transformers from 4.39.3 to 4.42.3.
Changelog
### 4.42.3 ``` Make sure we have attention softcapping for "eager" GEMMA2 model After experimenting, we noticed that for the 27b model mostly, softcapping is a must. So adding it back (it should have been there, but an error on my side made it disappear) sorry all! 😭 - Gemma capping is a must for big models (31698) ``` ### 4.42.2 ``` Patch release Thanks to our 2 contributors for their prompt fixing mostly applies for training and FA2! - Fix Gemma2 4d attention mask (31674) by hiyouga - don't zero out the attention_mask when using sliding window with flash attention (31670) by winglian ``` ### 4.42.1 ``` Patch release for commit: - [HybridCache] Fix get_seq_length method (31661) ``` ### 4.42.0 ``` New model additions Gemma-2 The Gemma2 model was proposed in [Gemma2: Open Models Based on Gemini Technology and Research](https://blog.google/technology/developers/Gemma2-open-models/) by Gemma2 Team, Google. Gemma2 models are trained on 6T tokens, and released with 2 versions, 2b and 7b. The abstract from the paper is the following: *This work introduces Gemma2, a new family of open language models demonstrating strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma2 outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of our model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations* ![image](https://github.com/huggingface/transformers/assets/30755778/798b25f4-485a-4b60-abe5-af612def209b) * Add gemma 2 by ArthurZucker in 31659 RTDETR The RT-DETR model was proposed in [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/abs/2304.08069) by Wenyu Lv, Yian Zhao, Shangliang Xu, Jinman Wei, Guanzhong Wang, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu. RT-DETR is an object detection model that stands for “Real-Time DEtection Transformer.” This model is designed to perform object detection tasks with a focus on achieving real-time performance while maintaining high accuracy. Leveraging the transformer architecture, which has gained significant popularity in various fields of deep learning, RT-DETR processes images to identify and locate multiple objects within them. ![image](https://github.com/huggingface/transformers/assets/30755778/78b096d4-2686-41cb-9fdd-1cd517722fd3) * New model support RTDETR by SangbumChoi in 29077 InstructBlip The InstructBLIP model was proposed in [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. InstructBLIP leverages the [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip2) architecture for visual instruction tuning. InstructBLIP uses the same architecture as [BLIP-2](https://huggingface.co/docs/transformers/main/en/model_doc/blip2) with a tiny but important difference: it also feeds the text prompt (instruction) to the Q-Former. 
![image](https://github.com/huggingface/transformers/assets/30755778/fd6997aa-d299-4d14-9eab-c3f16309bae9) * Add video modality for InstrucBLIP by zucchini-nlp in 30182 LlaVa NeXT Video The LLaVa-NeXT-Video model was proposed in [LLaVA-NeXT: A Strong Zero-shot Video Understanding Model](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/) by Yuanhan Zhang, Bo Li, Haotian Liu, Yong Jae Lee, Liangke Gui, Di Fu, Jiashi Feng, Ziwei Liu, Chunyuan Li. LLaVa-NeXT-Video improves upon [LLaVa-NeXT](https://huggingface.co/docs/transformers/main/en/model_doc/llava_next) by fine-tuning on a mix if video and image dataset thus increasing the model’s performance on videos. [LLaVA-NeXT](https://huggingface.co/docs/transformers/main/en/model_doc/llava_next) surprisingly has strong performance in understanding video content in zero-shot fashion with the AnyRes technique that it uses. The AnyRes technique naturally represents a high-resolution image into multiple images. This technique is naturally generalizable to represent videos because videos can be considered as a set of frames (similar to a set of images in LLaVa-NeXT). The current version of LLaVA-NeXT makes use of AnyRes and trains with supervised fine-tuning (SFT) on top of LLaVA-Next on video data to achieves better video understanding capabilities.The model is a current SOTA among open-source models on [VideoMME bench](https://arxiv.org/abs/2405.21075). * Add LLaVa NeXT Video by zucchini-nlp in 31252 New model adder A very significant change makes its way within the `transformers` codebase, introducing a new way to add models to `transformers`. We recommend reading the description of the PR below, but here is the gist of it: > The diff_converter tool is here to replace our old Copied from statements, while keeping our core transformers philosophy: > > - single model single file > - explicit code > - standardization of modeling code > - readable and educative code > - simple code > - least amount of modularity > > This additionally unlocks the ability to very quickly see the differences between new architectures that get developed. While many architectures are similar, the "single model, single file" policy can obfuscate the changes. With this diff converter, we want to make the changes between architectures very explicit. * Diff converter v2 by ArthurZucker in 30868 Tool-use and RAG model support We've made major updates to our support for tool-use and RAG models. We can now automatically generate JSON schema descriptions for Python functions which are suitable for passing to tool models, and we've defined a standard API for tool models which should allow the same tool inputs to be used with many different models. Models will need updates to their chat templates to support the new API, and we're targeting the **Nous-Hermes**, **Command-R** and **Mistral/Mixtral** model families for support in the very near future. Please see the updated [chat template docs](https://huggingface.co/docs/transformers/main/chat_templating) for more information. If you are the owner of a model that supports tool use, but you're not sure how to update its chat template to support the new API, feel free to reach out to us for assistance with the update, for example on the [Hugging Face Discord server](https://hf.co/join/discord). Ping Matt and yell key phrases like "chat templates" and "Jinja" and your issue will probably get resolved. 
* Chat Template support for function calling and RAG by Rocketknight1 in 30621 GGUF support We further the support of GGUF files to offer fine-tuning within the python/HF ecosystem, before converting them back to the GGUF/GGML/llama.cpp libraries. * Add Qwen2 GGUF loading support by Isotr0py in 31175 * GGUF: Fix llama 3 GGUF by younesbelkada in 31358 * Fix llama gguf converter by SunMarc in 31575 Trainer improvements A new optimizer is added in the `Trainer`. * FEAT / Trainer: LOMO optimizer support by younesbelkada in 30178 Quantization improvements Several improvements are done related to quantization: a new cache (the quantized KV cache) is added, offering the ability to convert the cache of generative models, further reducing the memory requirements. Additionally, the documentation related to quantization is entirely redone with the aim of helping users choose which is the best quantization method. * Quantized KV Cache by zucchini-nlp in 30483 * Docs / Quantization: refactor quantization documentation by younesbelkada in 30942 Examples New instance segmentation examples are added by qubvel * Instance segmentation examples by qubvel in 31084 Notable improvements As a notable improvement to the HF vision models that leverage backbones, we enable leveraging HF pretrained model weights as backbones, with the following API: py from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True) model = MaskFormerForInstanceSegmentation(config) * Enable HF pretrained backbones by amyeroberts in 31145 Additionally, we thank Cyrilvallez for diving into our `generate` method and greatly reducing the memory requirements. * Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 by Cyrilvallez in 30536 Breaking changes Remove ConversationalPipeline and Conversation object Both the ConversationalPipeline and the Conversation object have been deprecated for a while, and are due for removal in 4.42, which is the upcoming version. The `TextGenerationPipeline` is recommended for this use-case, and now accepts inputs in the form of the OpenAI API. * 🚨 Remove ConversationalPipeline and Conversation object by Rocketknight1 in 31165 Remove an accidental duplicate softmax application in FLAVA's attention Removes duplicate softmax application in FLAVA attention. Likely to have a small change on the outputs but flagging with 🚨 as it will change a bit. * 🚨 FLAVA: Remove double softmax by amyeroberts in 31322 Idefics2's `ignore_index` attribute of the loss is updated to `-100` * 🚨 [Idefics2] Update ignore index by NielsRogge in 30898 out_indices from `timm` being updated Recent updates to timm changed the type of the attribute `model.feature_info.out_indices`. Previously, `out_indices` would reflect the input type of `out_indices` on the `create_model` call i.e. either `tuple` or `list`. Now, this value is always a tuple. As list are more useful and consistent for us -- we cannot save tuples in configs, they must be converted to lists first -- we instead choose to cast `out_indices` to always be a list. This has the possibility of being a slight breaking change if users are creating models and relying on `out_indices` on being a tuple. As this property only happens when a new model is created, and not if it's saved and reloaded (because of the config), then I think this has a low chance of having much of an impact. 
* 🚨 out_indices always a list by amyeroberts in 30941 datasets referenced in the quantization config get updated to remove references to datasets with restrictive licenses. * 🚨 Remove dataset with restrictive license by echarlaix in 31452 Bugfixes and improvements * Add fixed resize and pad strategy for object detection by qubvel in 30742 * Enable dynamic resolution input for Swin Transformer and variants by the-neural-networker in 30656 * Add TokenClassification for Mistral, Mixtral and Qwen2 by josephenguehard in 29878 * FIX / Quantization: Fix Dockerfile build by younesbelkada in 30890 * Add support for torch.compile dynamic shapes by warner-benjamin in 30560 * LLaVa-Next: Update docs with batched inference by zucchini-nlp in 30857 * DeformableDETR two stage support bfloat16 by DonggeunYu in 30907 * add return_token_timestamps to WhisperProcessor by kamilakesbi in 30812 * Fix num_hidden_layers in initialization of new model in Mamba by SrGonao in 30403 * separate kwargs in processor (similar to 30193) by Eric2i in 30905 * fix for custom pipeline configuration by not-lain in 29004 * Add AutoFeatureExtractor support to Wav2Vec2ProcessorWithLM by ylacombe in 28706 * Fix a shape annotation and typos in `mamba` slow forward by vasqu in 30691 * `tokenizer_class = "AutoTokenizer"` Llava Family by ArthurZucker in 30912 * Introduce configured_state arg for accelerator_config by muellerzr in 29781 * Add torch.compile for Mistral by zhenglongjiepheonix in 30642 * [docs] Spanish translation of model_memory_anatomy.md by aaronjimv in 30885 * FIX / TST: Fix expected results on Mistral slow test (A10) by younesbelkada in 30909 * PaliGemma - fix processor with no input text by hiyouga in 30916 * CI: AMD MI300 tests fix by mht-sharma in 30797 * Enforce saving at end of training if saving option chosen by muellerzr in 30160 * fix: center_crop occasionally outputs off-by-one dimension matrix by mattlbeck in 30934 * [Benchmark] Reuse `optimum-benchmark` by ydshieh in 30615 * TST / Workflows: Get slack notifications for docker image build by younesbelkada in 30891 * Fix swin embeddings interpolation by amyeroberts in 30936 * Fix inhomogeneous shape error in example by Zantares in 30434 * update ruff version by ArthurZucker in 30932 * Update build ci image [push-ci-image] by ArthurZucker in 30933) * Update video-llava docs by zucchini-nlp in 30935 * Fix low cpu mem usage tests by SunMarc in 30808 * [doc] Add references to the fine-tuning blog and distil-whisper to Whisper. 
by Vaibhavs10 in 30938 * Avoid extra chunk in speech recognition by jonatanklosko in 29539 * [whisper] only trigger forced ids warning once by sanchit-gandhi in 30966 * Paligemma - fix slow tests, add bf16 and f16 slow tests by molbap in 30851 * Finally fix the missing new model failure CI report by ydshieh in 30968 * legacy to init the slow tokenizer when converting from slow was wrong by ArthurZucker in 30972 * Generation: get special tokens from model config by zucchini-nlp in 30899 * [Whisper] Strip prompt before finding common subsequence by sanchit-gandhi in 27836 * Fix link in Pipeline documentation by junhl in 30948 * [Mistral and friends] Update MLP by NielsRogge in 31057 * Paligemma causal attention mask by molbap in 30967 * Update object detection with latest resize and pad strategies by qubvel in 30955 * Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size by kamilakesbi in 30637 * Push ci image by ArthurZucker in 30982 * test_custom_4d_attention_mask skip with sliding window attn by poedator in 30833 * Finish adding support for torch.compile dynamic shapes by warner-benjamin in 30919 * FIX / Docs: Minor changes in quantization docs by younesbelkada in 30985 * Fix accelerate failing tests by SunMarc in 30836 * [tests] add `torch.use_deterministic_algorithms` for XPU by faaany in 30774 * Add a check that warmup_setps is either 0 or >= 1 by ymoslem in 30764 * Update 4 `MptIntegrationTests` expected outputs by ydshieh in 30989 * [Port] TensorFlow implementation of Mistral by ariG23498 in 29708 * Remove deprecated properties in tokenization_nllb.py and tokenization_nllb_fast.py by ymoslem in 29834 * Bugfix: WandbCallback uploads initial model checkpoint by mgerstgrasser in 30897 * add prefix space ignored in llama 29625 by itazap in 30964 * Fix training speed regression introduced by "optimize VRAM for calculating pos_bias in LayoutLM v2, v3 by kkoehncke in 26139)" * Do not trigger autoconversion if local_files_only by Wauplin in 31004 * pin `uv==0.1.45` by ydshieh in 31006 * Perceiver interpolate position embedding by g1y5x3 in 30979 * [tests] make `test_model_parallelism` device-agnostic by faaany in 30844 * FIX / TST: Fix expected results on Mistral AWQ test by SunMarc in 30971 * allow multi-gpu by ydshieh in 31011 * Fix resume_download future warning by Wauplin in 31007 * Quantization / TST: Fix remaining quantization tests by younesbelkada in 31000 * save the list of new model failures by ydshieh in 31013 * added interpolation for vitmae model in pytorch as well as tf. 
by bhuvanmdev in 30732 * Add split special tokens by itazap in 30772 * Paligemma- fix devices and dtype assignments by molbap in 31008 * Redirect transformers_agents doc to agents by aymeric-roucher in 31054 * unpin uv by ydshieh in 31055 * Follow up: Fix link in dbrx.md by eitanturok in 30514 * Update feature request label in template by amyeroberts in 30940 * Fix quanto tests by SunMarc in 31062 * Fix pad_to_max_length Whisper by ylacombe in 30787 * skip `test_model_parallelism` for 2 model test classes by ydshieh in 31067 * use `main` by ydshieh in 31065 * Remove `ninja` from docker image build by ydshieh in 31080 * fix "piano" typo by clinty in 31027 * Update quicktour.md to fix broken link to Glossary by apalkk in 31072 * Remove redundant backend checks in training_args.py by kevint324 in 30999 * fix from_pretrained in offline mode when model is preloaded in cache by oOraph in 31010 * Remove float64 cast for OwlVit and OwlV2 to support MPS device by qubvel in 31071 * Fix OWLv2 post_process_object_detection for multiple images by qubvel in 31082 * Fix typo in trainer.py by taslimisina in 31048 * [SuperPoint, PaliGemma] Update docs by NielsRogge in 31025 * Fix failing tokenizer tests by LysandreJik in 31083 * Watermark: fix tests by zucchini-nlp in 30961 * Docs / PEFT: Add PEFT API documentation by younesbelkada in 31078 * Render chat template tojson filter as unicode by CISC in 31041 * FIX: Add `accelerate` as a hard requirement by younesbelkada in 31090 * FIX / OPT: Fix OPT multi-GPU training for `OPTForQuestionAnswering` by younesbelkada in 31092 * skip `test_multi_gpu_data_parallel_forward` for `vit` and `deit` by ydshieh in 31086 * Fix PretrainedConfig docstring with deprecated resume_download by albertvillanova in 31014 * Fix DeepSpeed compatibility with weight_norm by jonnyli1125 in 30881) * TST: Fix instruct-blip tests by younesbelkada in 31088 * Docs / Quantization: Redirect deleted page by younesbelkada in 31063 * Deprecate low use models by amyeroberts in 30781 * Quantized KV cache: update quanto by zucchini-nlp in 31052 * FEAT: Add mistral v3 conversion script by younesbelkada in 30981 * Use `HF_HUB_OFFLINE` + fix has_file in offline mode by Wauplin in 31016 * Improve `transformers-cli env` reporting by statelesshz in 31003 * Fix env.py in cases where torch is not present by Rocketknight1 in 31113 * Fix faulty rstrip in module loading by Rocketknight1 in 31108 * Rm maintainer + migrate by muellerzr in 31089 * Fix nightly circleci by ydshieh in 31114 * FIX / Docs: Fix GPTQ expected number of bits by younesbelkada in 31111 * Add VLM generation default contributor by gante in 31115 * Add on_optimizer_step to callback options by dhruvbpai in 31095 * Cleanup docker build by ydshieh in 31119 * FIX / Quantization: Add extra validation for bnb config by younesbelkada in 31135 * fix get_scheduler when name is warmup_stable_decay by zspo in 31128 * Docs / Quantization: Replace all occurences of `load_in_8bit` with bnb config by younesbelkada in 31136 * Workflow: Remove `IS_GITHUB_CI` by younesbelkada in 31147 * helper by ArthurZucker in 31152 * pytest -rsfE by ydshieh in 31140 * Fix quantized cache output by SunMarc in 31143 * Update sam.md by asifajrof in 31130 * Quantization: Enhance bnb error message by younesbelkada in 31160 * [trainer] add sanity evaluation option by SunMarc in 31146 * Add streaming, various fixes by aymeric-roucher in 30838 * Added description of quantization_config by vamsivallepu in 31133 * Fix typo: use_safetenstors to use_safetensors by CharlesCNorton in 
31184 * Remove copied froms for deprecated models by amyeroberts in 31153 * Token healing by ahmed-moubtahij in 30081 * [`GemmaModel`] fix small typo by ArthurZucker in 31202 * Fix Cannot convert [array()] to EagerTensor of dtype int64 by pavi-ninjaac in 31109 * Ignore non-causal mask in more cases with SDPA by fxmarty in 30138 * SlidingWindowCache: reduce differences to other Cache classes by gante in 30970 * Fix `test_compile_static_cache` by ydshieh in 30991 * fix the get_size_with_aspect_ratio in max_size situation by SangbumChoi in 30902 * Fix typo in utils by Bojun-Feng in 31169 * Rename sanity_evaluation to eval_on_start by Qubitium in 31192 * Wrong translation FR : Contents = Contenu by jadechoghari in 31186 * Cohere: Fix copied from by younesbelkada in 31213 * Set greater_is_better to False if metric_for_best_model ends with "loss" by miivanov90 in 31142 * Fix GPU OOM for `mistral.py::Mask4DTestHard` by ydshieh in 31212 * [docs] Spanish translation of tokenizer_summary.md by aaronjimv in 31154 * Pass device in Logits Processor's init by zucchini-nlp in 29804 * Fix sentence fragment within test comments by DomHudson in 31218 * fix(PatchTST): Wrong dropout used for PretainHead by maxstrobel in 31117 * Video-LLaVa: handle any number of frames by zucchini-nlp in 31221 * Add dynamic resolution input/interpolate position embedding to deit by p-kris10 in 31131 * fix bf16 issue in text classification pipeline by chujiezheng in 30996 * Fix pipeline tests - torch imports by amyeroberts in 31227 * Add new line switch before logging ***** Running {description} ***** by jacklanda in 31225 * add no split modules for xlmrobertaxl by ManuelFay in 31223 * Fix `MistralIntegrationTest` by ydshieh in 31231 * Blip: Deprecate `BlipModel` by younesbelkada in 31235 * Move out common backbone config param validation by amyeroberts in 31144 * Upload (daily) CI results to Hub by ydshieh in 31168 * Specify dtype=torch.bool to avoid xla error by ysulsky in 31191 * Fixing `name 'torch' is not defined` in `bitsandbytes` integration by jamesbraza in 31243 * Benchmark GitHub Actions workflow by ydshieh in 31163 * Early labels validation by amyeroberts in 31240 * doc: add info about wav2vec2 bert in older wav2vec2 models. by Vaibhavs10 in 31120 * enable deterministic mode for npu by statelesshz in 31253 * Add missing Flaubert tokenizer tests by bastrob in 30492 * Fix circular reference issue in CLIPTokenizerFast by dhaivat1729 in 31075 * Add condition to `benchmark` job in `push-important-models.yml` by ydshieh in 31259 * Skip failing JetMOE generation tests by amyeroberts in 31266 * no need for explicit EXTRA_TOKENS in processing_paligemma.py by grahamannett in 31022 * [`SwitchTransformer`] Significant performance improvement on MoE blocks by ranggihwang in 31173 * fix loading special_tokens_map_file by ZhiyuanChen in 31012 * Make mamba use cache by zucchini-nlp in 31116 * Generation: fix handling of special tokens by zucchini-nlp in 31254 * Switch from `cached_download` to `hf_hub_download` in remaining occurrences by Wauplin in 31284 * fix: `str` should be used not `int` when setting env variables by statelesshz in 31272 * Fix _save_tpu: use _maybe_convert_to_cpu instead of to cpu. 
by baoleai in 31264 * fix accelerate tests for roberta xl by SunMarc in 31288 * Enable dynamic resolution input for Beit by OmarManzoor in 31053 * Mark MobileNetV1ModelTest::test_batching_equivalence as flaky by amyeroberts in 31258 * Pipeline VQA: Add support for list of images and questions as pipeline input by BlacCod in 31217 * Fix SwinLayer / DonutSwinLayer / ClapAudioLayer attention mask device by gorodnitskiy in 31295 * Update text-to-speech.md by jaguaryang in 31269 * Fixed Wav2Vec2ProcessorWithLM decoding error by karicotiza in 31188 * Fix jetmoe model by Cyrilvallez in 31279 * Extend save_pretrained to offloaded models by blbadger in 27412 * Implement JSON dump conversion for torch_dtype in TrainingArguments by junrae6454 in 31224 * interpolation added for TVP. by bhuvanmdev in 30863 * Rename test_model_common_attributes -> test_model_get_set_embeddings by amyeroberts in 31321 * Use unused prepare_img() function in dinov2 conversion script by IbrahimAmin1 in 31335 * docs: fix style by imba-tjd in 31340 * Fix paligemma inverted mask by molbap in 31207 * docs/zh: fix style by imba-tjd in 31334 * Decorators for deprecation and named arguments validation by qubvel in 30799 * Improve error msg when using bitsandbytes by SunMarc in 31350 * Fix Cohere CI by ydshieh in 31263 * Fix gradio tool demos by aymeric-roucher in 31230 * Fast image processor by amyeroberts in 28847 * Add french translation of AutoBackbone by jadechoghari in 31300 * Add support to declare imports for code agent by JasonZhu1313 in 31355 * Fix idefics cache by zucchini-nlp in 31377 * [Bug Fix] Renamed loss to losses to suppress UnboundLocalError by her0e1c1 in 31365 * docs: fix broken link by imba-tjd in 31370 * backbone_utils - fix relative import by amyeroberts in 31382 * README underline between badges fix by novialriptide in 31376 * Update comment in modeling_utils.py by inf3rnus in 31299 * Use huggingface_hub helper function to split state dict by SunMarc in 31091 * Change JSON serialization to custom json.dumps by junrae6454 in 31100 * feat(ci): add trufflehog secrets detection by McPatate in 31344 * [QoL fix] [Image processing] Add warning on assumption of channel dim and avoid infering when inputs are PIL.Image by aliencaocao in 31364 * Make chat templates part of ProcessorMixin by Rocketknight1 in 30744 * add initial design for uniform processors + align model by molbap in 31197 * Add missing French translation of tutoriel_pipeline.md by jadechoghari in 31396 * Temporarily pin datasets upper version to fix CI by albertvillanova in 31407 * Support Clip QKV for MPT by akakakakakaa in 31307 * Pin datasets<2.20.0 for examples by amyeroberts in 31417 * Fix MusicGen SDPA by ylacombe in 31208 * Set seed for M4T retain grad test by ylacombe in 31419 * Fix SpeechT5 `decoder_attention_mask` shape by ylacombe in 28071 * Change potential `inputs_embeds` padding `logger.warning` to `logger.warning_once` by naimenz in 31411 * Remove duplicate image processor in auto map by amyeroberts in 31383 * Install the tensorflow example requirements in docker by amyeroberts in 31428 * Remove empty create_and_test_config_common_properties tests by amyeroberts in 31359 * xpu: support xpu backend from stock pytorch (>=2.4) by dvrogozh in 31238 * Musicgen special tokens in tensors by zucchini-nlp in 31420 * Fix Bark logits processors device misplacement by ylacombe in 31416 * Rename misnamed image processor test files by amyeroberts in 31430 * Generate: fix `tokenizer` being popped twice by gante in 31427 * [tests] make 
`TestDeepSpeedModelZoo` device-agnostic by faaany in 31402 * Support multiple validation datasets when `dataloader_persistent_workers=True` by bastienlc in 30627 * Pass datasets trust_remote_code by albertvillanova in 31406 * simple fix by tokenizer-decode in 31456 * Fix typing errors in `Qwen2ForTokenClassification` by kevinhu in 31440 * Agents: Improve python interpreter by aymeric-roucher in 31409 * Donut: fix `generate` call from local path by gante in 31470 * Make "tool_use" the default chat template key when tools are passed by Rocketknight1 in 31429 * Fix single letter stop strings by Rocketknight1 in 31448 * Update chat template docs and bump Jinja version by Rocketknight1 in 31455 * Improve `PreTrainedTokenizerFast` loading time when there are many added tokens by ydshieh in 31404 * Fix documentation typos by qgallouedec in 31476 * Give more useful `metric_for_best_model` errors by tomaarsen in 31450 * Update perf_train_gpu_many.md by remyleone in 31451 * [`GPT2`] Add SDPA support by vasqu in 31172 * Fix autocast incompatibility in RecurrentGemma by xplip in 30832 * Use self.config_tester.run_common_tests() by amyeroberts in 31431 * [tests] rename `test_config_object` to `test_ds_config_object` by faaany in 31403 * Docs / AQLM: Clarify `torch.compile` support for AQLM by younesbelkada in 31473 * Mamba: add generative tests by gante in 31478 * Update object_detection.md by jajupmochi in 31488 * Add docs on zeroshot image classification prompt templates by aliencaocao in 31343 * auto-detect device when no device is passed to pipeline by faaany in 31398 * Fix typo: pas_token_id by ftnext in 30894 * Fix `wandb` integration with `SetFit` model by timothepearce in 30021 * Consider inheritance in type checking for tensors by daemyung in 31378 * Add valid columns check in _remove_unused_columns method by arthasking123 in 31466 * Fix a teeny-tiny typo in `tokenization_utils_base.py`'s docstring by sadra-barikbin in 31510 * Fix mismatched ` in doc & other common typos by jhwei in 31516 * RWKV: enable generation tests by gante in 31490 * unskip 2 tests in cohere by ydshieh in 31517 * Revive Nightly/Past CI by ydshieh in 31159 * Deprecate legacy cache + use cache position by zucchini-nlp in 31491 * SPLIT PR: add user defined symbols and control symbols by itazap in 31305 * Removed torch.cuda.empty_cache from train loop. 
by FoamoftheSea in 31530 * Update mask_generation.md by nicholicaron in 31543 * Correct is_flaky test decoration by qubvel in 31480 * Add implementation of `spectrogram_batch` by ravenouse in 27159 * chore: fix typos by xiaoxianBoy in 31559 * Update git templates by ArthurZucker in 31539 * Fix the error caused by incorrect use of logger in pipeline by lanyun1103 in 31565 * Fix bug about add_special_tokens and so on by hiroshi-matsuda-rit in 31496 * Add Jinja as a requirement with the right version cutoff by Rocketknight1 in 31536 * Fix doc typo in `TrainingArguments` by qgallouedec in 31503 * Fix is_torch_xpu_available for torch < 2.3 by amyeroberts in 31573 * Added version constraint on numpy for version <2.0 by Resteklicken in 31569 * Siglip: add `_no_split_module` by zucchini-nlp in 31566 * fix output data type of image classification by jiqing-feng in 31444 * add preprocessing_num_workers to run_classification.py by jiahuanluo in 31586 * Improve error message for mismatched copies in code blocks by molbap in 31535 * Add ViTImageProcessorFast to tests by amyeroberts in 31424 * docs: move translations to `i18n` by SauravMaheshkar in 31584 * Removed unnecessary `self.projection` call in `VivitTubeletEmbeddings` by v-iashin in 31632 * [`GPT-NeoX`] Add SDPA support by vasqu in 31031 * Update RT-DETR code snippet by qubvel in 31631 * Llama et al. / FSDP : Fix breaking change in 4.40 for FSDP by younesbelkada in 31161 * Fix RT-DETR inference with float16 and bfloat16 by qubvel in 31639 * Fix paligemma detection inference by molbap in 31587 * Generate: fix assisted generation with `past_key_values` passed as kwargs by gante in 31644 * Fix dtype casting in swinv2 and swinv2sr to allow non-FP32 inference by aliencaocao in 31589 * Skip tests properly by amyeroberts in 31308 * Generation: past kv can be None by zucchini-nlp in 31051 * Fix ONNX exports for Optimum compatible models by merveenoyan in 31311 Significant community contributions The following contributors have made significant changes to the library over the last release: * josephenguehard * Add TokenClassification for Mistral, Mixtral and Qwen2 (29878) * vasqu * Fix a shape annotation and typos in `mamba` slow forward (30691) * [`GPT2`] Add SDPA support (31172) * [`GPT-NeoX`] Add SDPA support (31031) * ariG23498 * [Port] TensorFlow implementation of Mistral (29708) * bhuvanmdev * added interpolation for vitmae model in pytorch as well as tf. (30732) * interpolation added for TVP. (30863) * SangbumChoi * fix the get_size_with_aspect_ratio in max_size situation (30902) * New model support RTDETR (29077) * Cyrilvallez * Reduce by 2 the memory requirement in `generate()` 🔥🔥🔥 (30536) * Fix jetmoe model (31279) * ravenouse * Add implementation of `spectrogram_batch` (27159) ``` ### 4.41.2 ``` Mostly fixing some stuff related to `trust_remote_code=True` and `from_pretrained` The `local_file_only` was having a hard time when a `.safetensors` file did not exist. This is not expected and instead of trying to convert, we should just fallback to loading the `.bin` files. * Do not trigger autoconversion if local_files_only 31004 from Wauplin fixes this! * Paligemma: Fix devices and dtype assignments (31008) by molbap * Redirect transformers_agents doc to agents (31054) aymeric-roucher * Fix from_pretrained in offline mode when model is preloaded in cache (31010) by oOraph * Fix faulty rstrip in module loading (31108) Rocketknight1 ``` ### 4.41.1 ``` Fix PaliGemma finetuning: The causal mask and label creation was causing label leaks when training. 
Kudos to probicheaux for finding and reporting! - https://github.com/huggingface/transformers/commit/a755745546779ae5c42510bc02a859bdac82b3b7 : PaliGemma - fix processor with no input text (https://github.com/huggingface/transformers/pull/30916) hiyouga - https://github.com/huggingface/transformers/commit/a25f7d3c12975fe21eab437dda7363e9024de7c0 : Paligemma causal attention mask (https://github.com/huggingface/transformers/pull/30967) molbap and probicheaux Other fixes: - https://github.com/huggingface/transformers/commit/bb48e921868ac750417956de941606f7e2fa02ca: tokenizer_class = "AutoTokenizer" Llava Family (https://github.com/huggingface/transformers/pull/30912) - https://github.com/huggingface/transformers/commit/1d568dfab262f76079eb4f3d05b606d51a0c9e4b : legacy to init the slow tokenizer when converting from slow was wrong (https://github.com/huggingface/transformers/pull/30972) - https://github.com/huggingface/transformers/commit/b1065aa08ac0da11fcb9e3827cd7eafabe4beebd : Generation: get special tokens from model config (https://github.com/huggingface/transformers/pull/30899) zucchini-nlp Reverted https://github.com/huggingface/transformers/commit/4ab7a28216211571fdddba414d4edd8426ab6489 ``` ### 4.41.0 ``` New models Phi3 The Phi-3 model was proposed in [Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone](https://arxiv.org/abs/2404.14219) by Microsoft. TLDR; Phi-3 introduces new ROPE scaling methods, which seems to scale fairly well! A 3b and a Phi-3-mini is available in two context-length variants—4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality. <img width="1599" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/0f37c6b0-b118-453c-ac64-6e45aa291d0a"> * Phi-3 by gugarosa in https://github.com/huggingface/transformers/pull/30423 JetMoE JetMoe-8B is an 8B Mixture-of-Experts (MoE) language model developed by [Yikang Shen](https://scholar.google.com.hk/citations?user=qff5rRYAAAAJ) and [MyShell](https://myshell.ai/). JetMoe project aims to provide a LLaMA2-level performance and efficient language model with a limited budget. To achieve this goal, JetMoe uses a sparsely activated architecture inspired by the [ModuleFormer](https://arxiv.org/abs/2306.04640). Each JetMoe block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. Given the input tokens, it activates a subset of its experts to process them. This sparse activation schema enables JetMoe to achieve much better training throughput than similar size dense models. The training throughput of JetMoe-8B is around 100B tokens per day on a cluster of 96 H100 GPUs with a straightforward 3-way pipeline parallelism strategy. <img width="1559" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/cc83ce99-7a61-4d04-a234-3f68e6c0fafd"> * Add JetMoE model by yikangshen in https://github.com/huggingface/transformers/pull/30005 PaliGemma PaliGemma is a lightweight open vision-language model (VLM) inspired by [PaLI-3](https://arxiv.org/abs/2310.09199), and based on open components like the [SigLIP vision model](https://arxiv.org/abs/2303.15343) and the [Gemma language model](https://arxiv.org/abs/2403.08295). 
PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images. More than 120 checkpoints are released see the collection [here](https://huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda) ! <img width="1064" alt="image" src="https://github.com/huggingface/transformers/assets/48595927/23584b9a-6c36-46f5-8700-32f402c0f674"> * Add PaliGemma by molbap in https://github.com/huggingface/transformers/pull/30814 VideoLlava Video-LLaVA exhibits remarkable interactive capabilities between images and videos, despite the absence of image-video pairs in the dataset. 💡 Simple baseline, learning united visual representation by alignment before projection With the binding of unified visual representations to the language feature space, we enable an LLM to perform visual reasoning capabilities on both images and videos simultaneously. 🔥 High performance, complementary learning with video and image Extensive experiments demonstrate the complementarity of modalities, showcasing significant superiority when compared to models specifically designed for either images or videos. <img width="532" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/cLniWc__KECBBesliHKhd.png"> * Add Video Llava by zucchini-nlp in https://github.com/huggingface/transformers/pull/29733 Falcon 2 and FalconVLM: <img width="1024" alt="image" src="https://falconllm.tii.ae/assets/images/table-1___.jpeg"> Two new models from TII-UAE! They published a [blog-post](https://falconllm.tii.ae/falcon-2.html) with more details! Falcon2 introduces parallel mlp, and falcon VLM uses the `Llava` framework * Support for Falcon2-11B by Nilabhra in https://github.com/huggingface/transformers/pull/30771 * Support arbitrary processor by ArthurZucker in https://github.com/huggingface/transformers/pull/30875 GGUF `from_pretrained` support <img width="1064" alt="image" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/gguf-spec.png"> You can now load most of the GGUF quants directly with transformers' `from_pretrained` to convert it to a classic pytorch model. The API is simple: python from transformers import AutoTokenizer, AutoModelForCausalLM model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF" filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf" tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename) model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename) We plan more closer integrations with llama.cpp / GGML ecosystem in the future, see: https://github.com/huggingface/transformers/issues/27712 for more details * Loading GGUF files support by LysandreJik in https://github.com/huggingface/transformers/pull/30391 ``` ### 4.40.2 ``` Fix torch fx for LLama model - Fix for Neuron (30259) - Fix copies for DBRX - neuron fix (30610) Thanks michaelbenayoun ! ``` ### 4.40.1 ``` Kudos to pcuenca for the prompt fix in: - Make EosTokenCriteria compatible with mps 30376 To support `EosTokenCriteria` on MPS while `pytorch` adds this functionality. ``` ### 4.40.0 ``` New model additions Llama 3 Llama 3 is supported in this release through the Llama 2 architecture and some fixes in the `tokenizers` library. 
Idefics2 <img src="https://huggingface.co/HuggingFaceM4/idefics-80b/resolve/main/assets/IDEFICS.png" alt="drawing" width="300"/> The Idefics2 model was created by the Hugging Face M4 team and authored by Léo Tronchon, Hugo Laurencon, Victor Sanh. The accompanying blog post can be found here. Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs. It improves upon IDEFICS-1, notably on document understanding, OCR, or visual reasoning. Idefics2 is lightweight (8 billion parameters) and treats images in their native aspect ratio and resolution, which allows for varying inference efficiency. * Add Idefics2 by amyeroberts in 30253 Recurrent Gemma <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/recurrent-gemma.png" alt="drawing" width="600"/> <small> Recurrent Gemma architecture. Taken from the <a href="https://arxiv.org/pdf/2402.19427.pdf">original paper.</a> </small> The Recurrent Gemma model was proposed in RecurrentGemma: Moving Past Transformers for Efficient Open Language Models by the Griffin, RLHF and Gemma Teams of Google. The abstract from the paper is the following: We introduce RecurrentGemma, an open language model which uses Google’s novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens. * Add recurrent gemma by ArthurZucker in 30143 Jamba Jamba is a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and an overall of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU. As depicted in the diagram below, Jamba’s architecture features a blocks-and-layers approach that allows Jamba to successfully integrate Transformer and Mamba architectures altogether. Each Jamba block contains either an attention or a Mamba layer, followed by a multi-layer perceptron (MLP), producing an overall ratio of one Transformer layer out of every eight total layers. ![image](https://github.com/huggingface/transformers/assets/48595927/d78bb917-7a8a-4959-8206-e493c6c75f3d) Jamba introduces the first `HybridCache` object that allows it to natively support assisted generation, contrastive search, speculative decoding, beam search and all of the awesome features from the `generate` API! * Add jamba by tomeras91 in 29943 DBRX DBRX is a [transformer-based](https://www.isattentionallyouneed.com/) decoder-only large language model (LLM) that was trained using next-token prediction. It uses a *fine-grained* mixture-of-experts (MoE) architecture with 132B total parameters of which 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. DBRX has 16 experts and chooses 4, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2. 
This provides 65x more possible combinations of experts and the authors found that this improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA). * Add DBRX Model by abhi-mosaic in 29921 OLMo The OLMo model was proposed in OLMo: Accelerating the Science of Language Models by Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi. OLMo is a series of Open Language Models designed to enable the science of language models. The OLMo models are trained on the Dolma dataset. We release all code, checkpoints, logs (coming soon), and details involved in training these models. * Add OLMo model family by 2015aroras in 29890 Qwen2MoE Qwen2MoE is the new model series of large language models from the Qwen team. Previously, we released the Qwen series, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc. Model Details Qwen2MoE is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. Qwen2MoE has the following architectural choices: Qwen2MoE is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes. Qwen2MoE employs Mixture of Experts (MoE) architecture, where the models are upcycled from dense language models. For instance, Qwen1.5-MoE-A2.7B is upcycled from Qwen-1.8B. It has 14.3B parameters in total and 2.7B activated parameters during runtime, while it achieves comparable performance with Qwen1.5-7B, with only 25% of the training resources. * Add Qwen2MoE by bozheng-hit in 29377 Grounding Dino <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/grouding_dino_architecture.png" alt="drawing" width="600"/> <small> Taken from the <a href="https://arxiv.org/pdf/2303.05499.pdf">original paper.</a> </small> The Grounding DINO model was proposed in Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang. Grounding DINO extends a closed-set object detection model with a text encoder, enabling open-set object detection. The model achieves remarkable results, such as 52.5 AP on COCO zero-shot. * Adding grounding dino by EduardoPach in 26087 Static pretrained maps Static pretrained maps have been removed from the library's internals and are currently deprecated. These used to reflect all the available checkpoints for a given architecture on the Hugging Face Hub, but their presence does not make sense in light of the huge growth of checkpoint shared by the community. 
With the objective of lowering the bar of model contributions and reviewing, we first start by removing legacy objects such as this one which do not serve a purpose. * Remove static pretrained maps from the library's internals by LysandreJik in 29112 Notable improvements Processors improvements Processors are ungoing changes in order to uniformize them and make them clearer to use. * Separate out kwargs in processor by amyeroberts in 30193 * [Processor classes] Update docs by NielsRogge in 29698 SDPA * re-introduced the fast path for sdpa by fxmarty in 30070 Push to Hub for pipelines Pipelines can now be pushed to Hub using a convenient `push_to_hub` method. * add `push_to_hub` to pipeline by not-lain in 29172 Flash Attention 2 for more models (M2M100, NLLB, GPT2, MusicGen) ! Thanks to the community contribution, Flash Attention 2 has been integrated for more architectures * Adding Flash Attention 2 Support for GPT2 by EduardoPach in 29226 * Add Flash Attention 2 support to Musicgen and Musicgen Melody by ylacombe in 29939 * Add Flash Attention 2 to M2M100 model by visheratin in 30256 Improvements and bugfixes * [docs] Remove redundant `-` and `the` from custom_tools.md by windsonsea in 29767 * Fixed typo in quantization_config.py by kurokiasahi222 in 29766 * OWL-ViT box_predictor inefficiency issue by RVV-karma in 29712 * Allow `-OO` mode for `docstring_decorator` by matthid in 29689 * fix issue with logit processor during beam search in Flax by giganttheo in 29636 * Fix docker image build for `Latest PyTorch + TensorFlow [dev]` by ydshieh in 29764 * [`LlavaNext`] Fix llava next unsafe imports by ArthurZucker in 29773 * Cast bfloat16 to float32 for Numpy conversions by Rocketknight1 in 29755 * Silence deprecations and use the DataLoaderConfig by muellerzr in 29779 * Add deterministic config to `set_seed` by muellerzr in 29778 * Add support for `torch_dtype` in the run_mlm example by jla524 in 29776 * Generate: remove legacy generation mixin imports by gante in 29782 * Llama: always convert the causal mask in the SDPA code path by gante in 29663 * Prepend `bos token` to Blip generations by zucchini-nlp in 29642 * Change in-place operations to out-of-place in LogitsProcessors by zucchini-nlp in 29680 * [`quality`] update quality check to make sure we check imports 😈 by ArthurZucker in 29771 * Fix type hint for train_dataset param of Trainer.__init__() to allow IterableDataset. Issue 29678 by stevemadere in 29738 * Enable AMD docker build CI by IlyasMoutawwakil in 29803 * Correct llava mask & fix missing setter for `vocab_size` by fxmarty in 29389 * rm input dtype change in CPU by jiqing-feng in 28631 * Generate: remove unused attributes in `AssistedCandidateGenerator` by gante in 29787 * replaced concatenation to f-strings to improve readability and unify … by igeni in 29785 * [`cleanup`] vestiges of causal mask by ArthurZucker in 29806 * Complete security policy with mentions of remote code by LysandreJik in 29707 * [`SuperPoint`] Fix doc example by amyeroberts in 29816 * [DOCS] Fix typo for llava next docs by aliencaocao in 29829 * model_summary.md - Restore link to Harvard's Annotated Transformer. 
by gamepad-coder in 29702 * Fix the behavior of collecting 'num_input_tokens_seen' by YouliangHUANG in 29099 * Populate torch_dtype from model to pipeline by B-Step62 in 28940 * remove quotes in code example by johko in 29812 * Add warnings if training args differ from checkpoint trainer state by jonflynng in 29255 * Replace 'decord' with 'av' in VideoClassificationPipeline by Tyx-main in 29747 * Fix header in IFE task guide by merveenoyan in 29859 * [docs] Indent ordered list in add_new_model.md by windsonsea in 29796 * Allow `bos_token_id is None` during the generation with `inputs_embeds` by LZHgrla in 29772 * Add `cosine_with_min_lr` scheduler in Trainer by liuyanyi in 29341 * Disable AMD memory benchmarks by IlyasMoutawwakil in 29871 * Set custom_container in build docs workflows by Wauplin in 29855 * Support `num_attention_heads` != `num_key_value_heads` in Flax Llama Implementation by bminixhofer in 29557 * Mamba `slow_forward` gradient fix by vasqu in 29563 * Fix 29807, sinusoidal positional encodings overwritten by post_init() by hovnatan in 29813 * Reimplement "Automatic safetensors conversion when lacking these files" by LysandreJik in 29846 * fix fuyu device_map compatibility by SunMarc in 29880 * Move `eos_token_id` to stopping criteria by zucchini-nlp in 29459 * add Cambricon MLUs support by huismiling in 29627 * MixtralSparseMoeBlock: add gate jitter by lorenzoverardo in 29865 * Fix typo in T5Block error message by Mingosnake in 29881 * [`make fix-copies`] update and help by ArthurZucker in 29924 * [`GptNeox`] don't gather on pkv when using the trainer by ArthurZucker in 29892 * [`pipeline`]. Zero shot add doc warning by ArthurZucker in 29845 * [doc] fix some typos and add `xpu` to the testing documentation by faaany in 29894 * Tests: replace `torch.testing.assert_allclose` by `torch.testing.assert_close` by gante in 29915 * Add beam search visualizer to the doc by aymeric-roucher in 29876 * Safe import of LRScheduler by amyeroberts in 29919 * add functions to inspect model and optimizer status to trainer.py by CKeibel in 29838 * RoPE models: add numerical sanity-check test for RoPE scaling by gante in 29808 * [`Mamba`] from pretrained issue with `self.embeddings` by ArthurZucker in 29851 * [ `TokenizationLlama`] fix the way we convert tokens to strings to keep leading spaces 🚨 breaking fix by ArthurZucker in 29453 * Allow GradientAccumulationPlugin to be configured from AcceleratorConfig by fabianlim in 29589 * [`BC`] Fix BC for other libraries by ArthurZucker in 29934 * Fix doc issue 29758 in DebertaV2Config class by vinayakkgarg in 29842 * [`LlamaSlowConverter`] Slow to Fast better support by ArthurZucker in 29797 * Update installs in image classification doc by MariaHei in 29947 * [`StableLm`] Add QK normalization and Parallel Residual Support by jon-tow in 29745 * Mark `test_eager_matches_sdpa_generate` flaky for some models by ydshieh in 29479 * Super tiny fix 12 typos about "with with" by fzyzcjy in 29926 * Fix rope theta for OpenLlama by jla524 in 29893 * Add warning message for `run_qa.py` by jla524 in 29867 * fix: get mlflow version from mlflow-skinny by clumsy in 29918 * Reset alarm signal when the function is ended by coldnight in 29706 * Update model card and link of blog post. 
* [`BC`] Fix BC for AWQ quant by TechxGenus in 29965
* Rework tests to compare trainer checkpoint args by muellerzr in 29883
* Fix FA2 tests by ylacombe in 29909
* Fix copies main ci by ArthurZucker in 29979
* [tests] fix the wrong output in `ImageToTextPipelineTests.test_conditional_generation_llava` by faaany in 29975
* Generate: move misplaced test by gante in 29902
* [docs] Big model loading by stevhliu in 29920
* [`generate`] fix breaking change for patch by ArthurZucker in 29976
* Fix 29807 sinusoidal positional encodings in Flaubert, Informer and XLM by hovnatan in 29904
* [bnb] Fix bug in `_replace_with_bnb_linear` by SunMarc in 29958
* Adding FlaxNoRepeatNGramLogitsProcessor by giganttheo in 29677
* [Docs] Make an ordered list prettier in add_tensorflow_model.md by windsonsea in 29949
* Fix `skip_special_tokens` for `Wav2Vec2CTCTokenizer._decode` by msublee in 29311
* Hard error when ignoring tensors. by Narsil in 27484
* Generate: fix logits processors doctests by gante in 29718
* Fix `remove_columns` in `text-classification` example by mariosasko in 29351
* Update `tests/utils/tiny_model_summary.json` by ydshieh in 29941
* Make EncodecModel.decode ONNX exportable by fxmarty in 29913
* Fix Swinv2ForImageClassification NaN output by miguelm-almeida in 29981
* Fix Qwen2Tokenizer by jklj077 in 29929
* Fix `kwargs` handling in `generate_with_fallback` by cifkao in 29225
* Fix probability computation in `WhisperNoSpeechDetection` when recomputing scores by cifkao in 29248
* Fix vipllava for generation by zucchini-nlp in 29874
* [docs] Fix audio file by stevhliu in 30006
* Superpoint imports fix by zucchini-nlp in 29898
* [`Main CIs`] Fix the red cis by ArthurZucker in 30022
* Make clearer about zero_init requirements by muellerzr in 29879
* Enable multi-device for efficientnet by jla524 in 29989
* Add a converter from mamba_ssm -> huggingface mamba by byi8220 in 29705
* [`ProcessingIdefics`] Attention mask bug with padding by byi8220 in 29449
* Add `whisper` to `IMPORTANT_MODELS` by ydshieh in 30046
* skip `test_encode_decode_fast_slow_all_tokens` for now by ydshieh in 30044
* if output is tuple like facebook/hf-seamless-m4t-medium, waveform is … by sywangyi in 29722
* Fix mixtral ONNX Exporter Issue. by AdamLouly in 29858
* [Trainer] Allow passing image processor by NielsRogge in 29896
* [bnb] Fix offload test by SunMarc in 30039
* Update quantizer_bnb_4bit.py: In the ValueError string there should be "....you need to set `llm_int8_enable_fp32_cpu_offload=True`...." instead of "`load_in_8bit_fp32_cpu_offload=True`". by miRx923 in 30013
* [test fetcher] Always include the directly related test files by ydshieh in 30050
* Fix `torch.fx` symbolic tracing for LLama by michaelbenayoun in 30047
* Refactor daily CI workflow by ydshieh in 30012
* Add docstrings and types for MambaCache by koayon in 30023
* Fix auto tests by ydshieh in 30067
* Fix whisper kwargs and generation config by zucchini-nlp in 30018
* doc: Correct spelling mistake by caiyili in 30107
* [Whisper] Computing features on GPU in batch mode for whisper feature extractor. by vaibhavagg303 in 29900
* Change log level to warning for num_train_epochs override by xu-song in 30014
* Make MLFlow version detection more robust and handles mlflow-skinny by helloworld1 in 29957
* updated examples/pytorch/language-modeling scripts and requirements.txt to require datasets>=2.14.0 by Patchwork53 in 30120
* [tests] add `require_bitsandbytes` marker by faaany in 30116
* fixing issue 30034 - adding data format for run_ner.py by JINO-ROHIT in 30088
* Patch fix - don't use safetensors for TF models by amyeroberts in 30118
* [29174] ImportError Fix: Trainer with PyTorch requires accelerate>=0.20.1 Fix by UtkarshaGupte in 29888
* Accept token in trainer.push_to_hub() by mapmeld in 30093
* fix learning rate display in trainer when using galore optimizer by vasqu in 30085
* Fix falcon with SDPA, alibi but no passed mask by fxmarty in 30123
* Trainer / Core : Do not change init signature order by younesbelkada in 30126
* Make vitdet jit trace complient by fxmarty in 30065
* Fix typo at ImportError by DrAnaximandre in 30090
* Adding `mps` as device for `Pipeline` class by fnhirwa in 30080
* Fix failing DeepSpeed model zoo tests by pacman100 in 30112
* Add datasets.Dataset to Trainer's train_dataset and eval_dataset type hints by ringohoffman in 30077
* Fix docs Pop2Piano by zucchini-nlp in 30140
* Revert workaround for TF safetensors loading by Rocketknight1 in 30128
* [Trainer] Fix default data collator by NielsRogge in 30142
* [Trainer] Undo 29896 by NielsRogge in 30129
* Fix slow tests for important models to be compatible with A10 runners by ydshieh in 29905
* Send headers when converting safetensors by ydshieh in 30144
* Fix quantization tests by SunMarc in 29914
* [docs] Fix image segmentation guide by stevhliu in 30132
* [CI] Fix setup by SunMarc in 30147
* Fix length related warnings in speculative decoding by zucchini-nlp in 29585
* Fix and simplify semantic-segmentation example by qubvel in 30145
* [CI] Quantization workflow fix by SunMarc in 30158
* [tests] make 2 tests device-agnostic by faaany in 30008
* Add str to TrainingArguments report_to type hint by ringohoffman in 30078
* [UDOP] Fix tests by NielsRogge in 29573
* [UDOP] Improve docs, add resources by NielsRogge in 29571
* Fix accelerate kwargs for versions <0.28.0 by vasqu in 30086
* Fix typing annotation in hf_argparser by xu-song in 30156
* Fixing a bug when MlFlow try to log a torch.tensor by etiennebonnafoux in 29932
* Fix natten install in docker by ydshieh in 30161
* FIX / bnb: fix torch compatiblity issue with `itemize` by younesbelkada in 30162
* Update config class check in auto factory by Rocketknight1 in 29854
* Fixed typo in comments/documentation for Pipelines documentation by DamonGuzman in 30170
* Fix Llava chat template examples by lewtun in 30130
* Guard XLA version imports by muellerzr in 30167
* chore: remove repetitive words by hugehope in 30174
* fix: Fixed `ruff` configuration to avoid deprecated configuration warning by Sai-Suraj-27 in 30179
* Refactor Cohere Model by saurabhdash2512 in 30027
* Update output of SuperPointForKeypointDetection by NielsRogge in 29809
* Falcon: make activation, ffn_hidden_size configurable by sshleifer in 30134
* Docs PR template by stevhliu in 30171
* ENH: [`CI`] Add new workflow to run slow tests of important models on push main if they are modified by younesbelkada in 29235
* Fix pipeline logger.warning_once bug by amyeroberts in 30195
* fix: Replaced deprecated `logger.warn` with `logger.warning` by Sai-Suraj-27 in 30197
* fix typo by mdeff in 30220
* fix fuyu doctest by molbap in 30215
* Fix `RecurrentGemmaIntegrationTest.test_2b_sample` by ydshieh in 30222
* Update modeling_bark.py by bes-dev in 30221
* Fix/Update for doctest by ydshieh in 30216
* Fixed config.json download to go to user-supplied cache directory by ulatekh in 30189
* Add test for parse_json_file and change typing to os.PathLike by xu-song in 30183
* fix: Replace deprecated `assertEquals` with `assertEqual` by Sai-Suraj-27 in 30241
* Set pad_token in run_glue_no_trainer.py 28534 by JINO-ROHIT in 30234
* fix: Replaced deprecated `typing.Text` with `str` by Sai-Suraj-27 in 30230
* Refactor doctest by ydshieh in 30210
* fix: Fixed `type annotation` for compatability with python 3.8 by Sai-Suraj-27 in 30243
* Fix doctest more (for `docs/source/en`) by ydshieh in 30247
* round epoch only in console by xdedss in 30237
* update github actions packages' version to suppress warnings by ydshieh in 30249
* [tests] add the missing `require_torch_multi_gpu` flag by faaany in 30250
* [Docs] Update recurrent_gemma.md for some minor nits by sayakpaul in 30238
* Remove incorrect arg in codellama doctest by Rocketknight1 in 30257
* Update `ko/_toctree.yml` by jungnerd in 30062
* More fixes for doctest by ydshieh in 30265
* FIX: Fix corner-case issue with the important models workflow by younesbelkada in 30212
* FIX: Fix 8-bit serialization tests by younesbelkada in 30051
* Allow for str versions of dicts based on typing by muellerzr in 30227
* Workflow: Update tailscale to release version by younesbelkada in 30268
* Raise relevent err when wrong type is passed in as the accelerator_config by muellerzr in 29997
* BLIP - fix pt-tf equivalence test by amyeroberts in 30258
* fix: Fixed a `raise` statement by Sai-Suraj-27 in 30275
* Fix test fetcher (doctest) + `Idefics2`'s doc example by ydshieh in 30274
* Fix SDPA sliding window compatibility by fxmarty in 30127
* Fix SpeechT5 forward docstrings by ylacombe in 30287
* FIX / AWQ: Fix failing exllama test by younesbelkada in 30288
* Configuring Translation Pipelines documents update 27753 by UtkarshaGupte in 29986
* Enable fx tracing for Mistral by zucchini-nlp in 30209
* Fix test `ExamplesTests::test_run_translation` by ydshieh in 30281
* Fix `Fatal Python error: Bus error` in `ZeroShotAudioClassificationPipelineTests` by ydshieh in 30283
* FIX: Fix push important models CI by younesbelkada in 30291
* Add token type ids to CodeGenTokenizer by st81 in 29265
* Add strategy to store results in evaluation loop by qubvel in 30267
* Upgrading to tokenizers 0.19.0 by Narsil in 30289
* Re-enable SDPA's FA2 path by fxmarty in 30070
* Fix quality Olmo + SDPA by fxmarty in 30302
* Fix donut token2json multiline by qubvel in 30300
* Fix all torch pipeline failures except one by ydshieh in 30290
* Add atol for sliding window test by fxmarty in 30303
* Fix RecurrentGemma device_map by SunMarc in 30273
* Revert "Re-enable SDPA's FA2 path (30070)" by ArthurZucker
* Do not drop mask with SDPA for more cases by fxmarty in 30311
* FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + revert 30070 at the same time by younesbelkada in 30317
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* bozheng-hit
    * Add Qwen2MoE (29377)
    * Update model card and link of blog post. (29928)
* EduardoPach
    * Adding Flash Attention 2 Support for GPT2 (29226)
    * Adding grounding dino (26087)
* 2015aroras
    * Add OLMo model family (29890)
* tomeras91
    * Add jamba (29943)
* abhi-mosaic
    * Add DBRX Model (29921)
```
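As referenced in the "Push to Hub for pipelines" note above, here is a minimal usage sketch of the new pipeline `push_to_hub` method. It assumes you are logged in to the Hub (e.g. via `huggingface-cli login`); the repository id below is a placeholder.

```python
from transformers import pipeline

# Build any pipeline as usual; the sentiment checkpoint here is just an example.
pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# New in this release range: pipelines expose `push_to_hub`, which uploads the
# underlying model, tokenizer and configuration to the given Hub repository.
pipe.push_to_hub("your-username/my-text-classification-pipeline")
```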
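Likewise, a minimal sketch of opting into one of the newly supported Flash Attention 2 paths (GPT2 here), assuming a CUDA GPU with the `flash-attn` package installed; otherwise it falls back to the SDPA path whose fast route was re-introduced in this release range.

```python
import torch
from transformers import AutoModelForCausalLM

# Prefer Flash Attention 2 when a GPU is available, otherwise use PyTorch SDPA.
attn_impl = "flash_attention_2" if torch.cuda.is_available() else "sdpa"

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.float16,        # Flash Attention 2 requires fp16/bf16
    attn_implementation=attn_impl,    # "flash_attention_2", "sdpa" or "eager"
)
```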
Links
- PyPI: https://pypi.org/project/transformers
- Changelog: https://data.safetycli.com/changelogs/transformers/
- Repo: https://github.com/huggingface/transformers