PyTorch 2.3: User-Defined Triton Kernels in torch.compile, Tensor Parallelism in Distributed
PyTorch 2.3 Release notes
Highlights
Backwards Incompatible Changes
Deprecations
New Features
Improvements
Bug fixes
Performance
Documentation
Highlights
We are excited to announce the release of PyTorch® 2.3! PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing users to migrate their own Triton kernels from eager mode without performance regressions or graph breaks. In addition, Tensor Parallelism improves the experience of training Large Language Models with native PyTorch functions; it has been validated on training runs for models with 100B parameters.
This release is composed of 3393 commits and 426 contributors since PyTorch 2.2. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.3. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.
... (truncated)
Commits
97ff6cf [Release only] Release 2.3 start using triton package from pypi (#123580)
Llama 3 is supported in this release through the Llama 2 architecture and some fixes in the tokenizers library.
Idefics2
The Idefics2 model was created by the Hugging Face M4 team and authored by Léo Tronchon, Hugo Laurencon, Victor Sanh. The accompanying blog post can be found here.
Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. The model can answer questions about images, describe visual content, create stories grounded on multiple images, or simply behave as a pure language model without visual inputs. It improves upon IDEFICS-1, notably in document understanding, OCR, and visual reasoning. Idefics2 is lightweight (8 billion parameters) and treats images in their native aspect ratio and resolution, which allows for varying inference efficiency.
Recurrent Gemma architecture. Taken from the original paper.
The Recurrent Gemma model was proposed in RecurrentGemma: Moving Past Transformers for Efficient Open Language Models by the Griffin, RLHF and Gemma Teams of Google.
The abstract from the paper is the following:
We introduce RecurrentGemma, an open language model which uses Google’s novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens.
Jamba is a pretrained, mixture-of-experts (MoE) generative text model, with 12B active parameters and a total of 52B parameters across all experts. It supports a 256K context length, and can fit up to 140K tokens on a single 80GB GPU.
As depicted in the diagram below, Jamba’s architecture features a blocks-and-layers approach that allows Jamba to integrate the Transformer and Mamba architectures together. Each Jamba block contains either an attention or a Mamba layer, followed by a multi-layer perceptron (MLP), producing an overall ratio of one Transformer layer out of every eight total layers.
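As a rough illustration of that ratio, the layer layout can be sketched in plain Python (a hypothetical sketch, not Jamba's actual modeling code; the build_layer_types helper is invented for this example):

```python
def build_layer_types(num_layers: int, attn_every: int = 8) -> list:
    """Sketch of the block layout described above: one attention
    (Transformer) layer out of every eight total layers, the rest Mamba.
    In the real architecture each layer is also followed by an MLP."""
    layers = []
    for i in range(num_layers):
        # One attention layer per group of eight; the rest are Mamba layers.
        kind = "attention" if i % attn_every == attn_every - 1 else "mamba"
        layers.append(kind)
    return layers

# 32 layers -> 4 attention layers and 28 Mamba layers (the 1-in-8 ratio).
print(build_layer_types(32).count("attention"))  # 4
```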
Jamba introduces the first HybridCache object, which allows it to natively support assisted generation, contrastive search, speculative decoding, beam search, and all of the other awesome features from the generate API!
Introduced improvements to the CI process for better performance and efficiency during builds, in particular enabling more effective cross-compilation on Linux platforms. This was accomplished by deprecating Make, migrating to CMake, and implementing new corresponding workflows. Huge thanks go to @wkpark, @rickardp, @matthewdouglas and @younesbelkada; #1055, #1050, #1111.
Windows should now be officially supported in bitsandbytes via pip install bitsandbytes.
Updated installation instructions to provide more comprehensive guidance for users. This includes clearer explanations and additional tips for various setup scenarios, making the library more accessible to a broader audience (@rickardp, #1047).
Enhanced the library's compatibility and setup process, including fixes for CPU-only installations and improvements in CUDA setup error messaging. This effort aims to streamline the installation process and improve user experience across different platforms and setups (@wkpark, @akx, #1038, #996, #1012).
Bug Fixes:
Addressed a race condition in kEstimateQuantiles, enhancing the reliability of quantile estimation in concurrent environments (@pnunna93, #1061).
Fixed various minor issues, including typos in code comments and documentation, to improve code clarity and prevent potential confusion (@nairbv, #1063).
Backwards Compatibility
After upgrading from v0.42 to v0.43, when using 4bit quantization, models may generate slightly different outputs (approximately up to the 2nd decimal place) due to a fix in the code. For anyone interested in the details, see this comment.
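For anyone asserting on generated scores or logits across this upgrade, an exact-equality check may now fail; a tolerance-based comparison is more appropriate. A minimal stdlib sketch (the 1e-2 tolerance is an assumption derived from the note above, and outputs_close is a hypothetical helper):

```python
import math

def outputs_close(a, b, atol=1e-2):
    """Element-wise comparison of two float sequences, tolerating
    differences up to roughly the 2nd decimal place."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a, b)
    )

print(outputs_close([0.123, 0.456], [0.125, 0.451]))  # True: within 1e-2
print(outputs_close([0.123, 0.456], [0.163, 0.456]))  # False: 0.04 apart
```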
Internal and Build System Enhancements:
Implemented several enhancements to the internal and build systems, including adjustments to the CI workflows, portability improvements, and build artifact management. These changes contribute to a more robust and flexible development process, ensuring the library's ongoing quality and maintainability (@rickardp, @akx, @wkpark, @matthewdouglas; #949, #1053, #1045, #1037).
Contributors:
This release is made possible thanks to the many active contributors that submitted PRs and many others who contributed to discussions, reviews, and testing. Your efforts greatly enhance the library's quality and user experience. It's truly inspiring to work with such a dedicated and competent group of volunteers and professionals!
We give a special thanks to @TimDettmers for managing to find a little bit of time for valuable consultations on critical topics, despite preparing for and touring the states applying for professor positions. We wish him the utmost success!
We also extend our gratitude to the broader community for your continued support, feedback, and engagement, which play a crucial role in driving the library's development forward.
Improved the serialization format for 8-bit weights; this change is fully backwards compatible. (#1164, thanks to @younesbelkada for the contributions and @akx for the review).
Added CUDA 12.4 support to the Linux x86-64 build workflow, expanding the library's compatibility with the latest CUDA versions. (#1171, kudos to @matthewdouglas for this addition).
Docs enhancement: Improved the instructions for installing the library from source. (#1149, special thanks to @stevhliu for the enhancements).
Bug Fixes
Fix 4bit quantization with blocksize = 4096, where an illegal memory access was encountered. (#1160, thanks @matthewdouglas for fixing and @YLGH for reporting)
Set up new documentation at https://huggingface.co/docs/bitsandbytes/main with extensive new sections and content to help users better understand and utilize the library. Especially notable are the new API docs (big thanks to @stevhliu and @mishig25 from Hugging Face, #1012). The API docs have also been addressed in #1075.
#12069: A deprecation warning is now raised when implementations of one of the following hooks request a deprecated py.path.local parameter instead of the pathlib.Path parameter which replaced it:
- pytest_ignore_collect - the path parameter - use collection_path instead.
- pytest_collect_file - the path parameter - use file_path instead.
- pytest_pycollect_makemodule - the path parameter - use module_path instead.
- pytest_report_header - the startdir parameter - use start_path instead.
- pytest_report_collectionfinish - the startdir parameter - use start_path instead.
The replacement parameters are available since pytest 7.0.0.
The old parameters will be removed in pytest 9.0.0.
See the legacy path hooks deprecation documentation for more details.
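For plugin and conftest.py authors, migrating is usually just a signature change; a hypothetical sketch of an updated hook (the directory-filtering logic is invented for this example):

```python
from pathlib import Path

def pytest_ignore_collect(collection_path: Path, config):
    """Hook using the pathlib.Path-based collection_path parameter
    (available since pytest 7.0) instead of the deprecated
    py.path.local path parameter."""
    # Example logic: ignore anything under a "fixtures" directory.
    return "fixtures" in collection_path.parts
```

The same pattern applies to the other hooks listed above: rename path to file_path or module_path, and startdir to start_path.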
Features
#11871: Added support for reading command-line arguments from a file using the prefix character @, e.g. pytest @tests.txt. The file must have one argument per line.
See "Read arguments from file" in the documentation for details.
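To see what this means in practice, the expansion can be mimicked with plain Python (expand_argfile is a hypothetical helper for illustration; pytest itself performs this expansion for @-prefixed arguments):

```python
import os
import tempfile

def expand_argfile(args):
    """Replace each @file argument with the file's lines,
    one argument per line, as described above."""
    out = []
    for arg in args:
        if arg.startswith("@"):
            with open(arg[1:]) as fh:
                out.extend(line.strip() for line in fh if line.strip())
        else:
            out.append(arg)
    return out

# Demo: an argument file with one argument per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("-v\ntests/test_example.py\n")
argv = expand_argfile(["@" + fh.name, "-x"])
os.unlink(fh.name)
print(argv)  # ['-v', 'tests/test_example.py', '-x']
```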
Improvements
#11523: pytest.importorskip will now issue a warning if the module could be found, but raised ImportError instead of ModuleNotFoundError.
The warning can be suppressed by passing exc_type=ImportError to pytest.importorskip.
See the importorskip ImportError documentation for details.
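The distinction the warning draws is between a module that is absent (ModuleNotFoundError) and one that exists but fails while importing (plain ImportError); a stdlib sketch of that difference (classify_import is a hypothetical helper):

```python
import importlib

def classify_import(name):
    """'ok' if the module imports, 'missing' if it does not exist
    (ModuleNotFoundError), 'broken' for any other ImportError."""
    try:
        importlib.import_module(name)
        return "ok"
    except ModuleNotFoundError:
        return "missing"   # importorskip skips silently for this case
    except ImportError:
        return "broken"    # pytest now warns before skipping this case

print(classify_import("json"))                 # ok
print(classify_import("no_such_module_xyz"))   # missing
```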
#11728: For unittest-based tests, exceptions during class cleanup (as raised by functions registered with TestCase.addClassCleanup) are now reported instead of silently failing.
#11777: Text is no longer truncated in the short test summary info section when -vv is given.
#12112: Improved namespace package detection when consider_namespace_packages is enabled, covering more situations (like editable installs).
#9502: Added the PYTEST_VERSION environment variable, which is defined at the start of the pytest session and undefined afterwards. It contains the value of pytest.__version__ and, among other things, can be used to easily check whether code is running from within a pytest run.
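For example, library code could use the variable as a lightweight guard (running_under_pytest is a hypothetical helper; the environment variable itself is the documented mechanism):

```python
import os

def running_under_pytest() -> bool:
    """True only while a pytest session is active, based on the
    PYTEST_VERSION environment variable pytest defines for the session."""
    return os.environ.get("PYTEST_VERSION") is not None
```

Because the variable is undefined after the session ends, the check is safe to call from code that also runs outside of tests.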
Bug Fixes
#12065: Fixed a regression in pytest 8.0.0 where test classes containing setup_method and tests using @staticmethod or @classmethod would crash with AttributeError: 'NoneType' object has no attribute 'setup_method'.
Now the request.instance attribute of tests using @staticmethod and @classmethod is no longer None, but a fresh instance of the class, as in non-static methods.
Bumps the python-packages group with 10 updates:
- pygls: 1.3.0 to 1.3.1
- language-tool-python: 2.7.1 to 2.8
- tree-sitter: 0.20.4 to 0.21.3
- gitpython: 3.1.42 to 3.1.43
- torch: 2.2.1 to 2.3.0
- openai: 1.13.3 to 1.25.0
- transformers: 4.38.2 to 4.40.1
- bitsandbytes: 0.42.0 to 0.43.1
- pytest: 8.0.2 to 8.2.0
- (package not shown): 4.1.0 to 5.0.0
Updates
pygls from 1.3.0 to 1.3.1

Release notes
Sourced from pygls's releases.
Changelog
Sourced from pygls's changelog.
Commits
- 9e27a5e build: v1.3.1
- 79c0bcc docs: update implementations.md with Chapel's language server
- f5de769 docs: add systemd-language-server to implementations
- 959241e chore: apache license missing dash
- 323dfa8 chore: update CONTRIBUTORS.md
- db2233f chore: update CHANGELOG.md

Updates
language-tool-python from 2.7.1 to 2.8

Commits
Updates
tree-sitter from 0.20.4 to 0.21.3

Release notes
Sourced from tree-sitter's releases.
Commits
- 5d52ace fix: accept PathLike in Language()
- 30d3660 build: enable aarch64 wheels
- ce1af66 feat: support Python 3.12 again
- 52f29fa docs: add pypi badge
- b33091c ci(pypi): fix GitHub release step
- f48b92f ci(pypi): explicitly set up python
- 55fde88 build: update submodules
- f1d4b86 build: add keywords
- 59e54ff ci: drop test releases
- e9c956c docs: improve examples and add usage file

Updates
gitpython from 3.1.42 to 3.1.43

Release notes
Sourced from gitpython's releases.
Commits
- 5364053 bump version to 3.1.43
- 4e626bd Merge pull request #1886 from EliahKagan/deprecation-warnings
- f6060df Add GitMeta alias
- 8327b45 Test GitMeta alias
- f92f4c3 Clarify security risk in USE_SHELL doc and warnings
- c7675d2 update security policy, to use GitHub instead of email
- cf2576e Make/use test.deprecation.lib; abandon idea to filter by module
- 7cd3aa9 Make test.performance.lib docstring more specific
- b51b080 Explain the approach in test.deprecation to static checking
- bdabb21 Expand USE_SHELL docstring; clarify a test usage

Updates
torch from 2.2.1 to 2.3.0

Release notes
Sourced from torch's releases.
... (truncated)
Commits
- 97ff6cf [Release only] Release 2.3 start using triton package from pypi (#123580)
- fb38ab7 Fix for MPS regression in #122016 and #123178 (#123385)
- 23961ce [Release/2.3] Set py3.x build-environment name consistently (#123446)
- 634cf50 [Wheel] Change libtorch_cpu OpenMP search path (#123417) (#123442)
- 12d0e69 update submodule onnx==1.16.0 (#123387)
- 38acd81 [MPS] Fwd-fix for clamp regression (#122148) (#123383)
- b197f54 Use numpy 2.0.0rc1 in CI (#123356)
- dc81d19 [CI] Test that NumPy-2.X builds are backward compatible with 1.X (#123354)
- 108305e Upgrade submodule pybind to 2.12.0 (#123355)
- a8b0091 Make PyTorch compilable against upcoming Numpy-2.0 (#121880) (#123380)

Updates
openai from 1.13.3 to 1.25.0

Release notes
Sourced from openai's releases.
... (truncated)
Changelog
Sourced from openai's changelog.
... (truncated)
Commits
- ec52e89 release: 1.25.0
- d2738d4 feat(api): delete messages (#1388)
- 11460b5 release: 1.24.1 (#1386)
- 39845c7 release: 1.24.0
- 155d0de chore(client): log response headers in debug mode (#1383)
- a669541 feat(api): add required tool_choice (#1382)
- ffa8483 chore(internal): minor reformatting (#1377)
- 4a0f0fa chore(internal): reformat imports (#1375)
- e972439 release: 1.23.6 (#1372)
- 290e7ad release: 1.23.5 (#1369)

Updates
transformers from 4.38.2 to 4.40.1

Release notes
Sourced from transformers's releases.
... (truncated)
Commits
- 9fe3f58 v4.40.1
- f8fec6b Make EosTokenCriteria compatible with mps (#30376)
- 745bbfe Release: v4.40.0
- 5728b5a FIX: Fixes unexpected behaviour for Llava / LLama & AWQ Fused modules + rever...
- 005b957 Add DBRX Model (#29921)
- 63c5e27 Do not drop mask with SDPA for more cases (#30311)
- acab997 Revert "Re-enable SDPA's FA2 path (#30070)" (#30314)
- 7509a0a Fix RecurrentGemma device_map (#30273)
- 9459efb Add atol for sliding window test (#30303)
- 3f20877 Add jamba (#29943)

Updates
bitsandbytes from 0.42.0 to 0.43.1

Release notes
Sourced from bitsandbytes's releases.
Changelog
Sourced from bitsandbytes's changelog.
... (truncated)
Commits
- 4a6fb35 bump version to 0.43.1
- f92c536 CHANGELOG: add v0.43.1
- 0c33c0d ignore CHANGELOG reordering + formatting commit
- 4743ff0 CHANGELOG: to reverse chron order + mdformat
- 7449d71 [Core] Change 8-bit serialization weight format format (#1164)
- c54053d Bump scipy from 1.12.0 to 1.13.0 in the minor-patch group (#1170)
- 6be3d0f [docs] Install from source (#1149)
- 0c887b7 Merge pull request #1169 from TimDettmers/dependabot/pip/major-45b123642d
- af9a073 Merge pull request #1171 from matthewdouglas/build-cu124
- ebac862 Exclude Windows from CUDA 12.4.0 build for now

Updates
pytest from 8.0.2 to 8.2.0

Release notes
Sourced from pytest's releases.
... (truncated)
Commits
- 6bd3f31 Tweak changelog for 8.2.0
- 9b6219b Prepare release version 8.2.0