We are excited to announce the release of PyTorch® 2.5! This release features a new CuDNN backend for SDPA, enabling speedups by default for users of SDPA on H100s or newer GPUs. In addition, regional compilation with torch.compile offers a way to reduce the cold start-up time of torch.compile by allowing users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Finally, the TorchInductor CPP backend offers solid performance speedups with numerous enhancements such as FP16 support, CPP wrapper, AOT-Inductor mode, and max-autotune mode.
This release is composed of 4095 commits from 504 contributors since PyTorch 2.4. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.5. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.
Also, please check out the new releases of our ecosystem projects, TorchRec and TorchFix.
Beta
- CuDNN backend for SDPA
- torch.compile regional compilation without recompilations
- TorchDynamo added support for exception handling & MutableMapping types
- TorchInductor CPU backend optimization
Prototype
- FlexAttention
- Compiled Autograd
- Flight Recorder
- Max-autotune Support on CPU with GEMM Template
- TorchInductor on Windows
- FP16 support on CPU path for both eager mode and TorchInductor CPP backend
- Autoload Device Extension
- Enhanced Intel GPU support
*To see a full list of public feature submissions click here.
BETA FEATURES
[Beta] CuDNN backend for SDPA
The cuDNN "Fused Flash Attention" backend has landed for torch.nn.functional.scaled_dot_product_attention. On NVIDIA H100 GPUs it can provide up to a 75% speedup over FlashAttentionV2, and it is enabled by default for all users of SDPA on H100 or newer GPUs.
[Beta] torch.compile regional compilation without recompilations
Regional compilation without recompilations is enabled via torch._dynamo.config.inline_inbuilt_nn_modules, which defaults to True in 2.5+. This option allows users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Compared to compiling the full model, it can result in smaller compilation latencies, at the cost of a 1%-5% performance degradation.
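As a rough sketch of what regional compilation can look like in user code: instead of wrapping the whole model in torch.compile, each repeated block is wrapped individually. The `Block` and `TinyModel` classes and their sizes are hypothetical and exist only for illustration; the example assumes a PyTorch build where torch.compile is available.

```python
import torch
from torch import nn


class Block(nn.Module):
    """Hypothetical repeated layer (stands in for a transformer layer)."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.linear(x))


class TinyModel(nn.Module):
    """Hypothetical model that stacks the same block several times."""

    def __init__(self, dim: int = 16, depth: int = 4) -> None:
        super().__init__()
        # Regional compilation: wrap each repeated block, not the full model.
        self.layers = nn.ModuleList(
            [torch.compile(Block(dim)) for _ in range(depth)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


model = TinyModel()
# torch.compile is lazy: nothing is compiled until the first forward call,
# e.g. `out = model(torch.randn(2, 16))`.
```

Because every wrapped block has the same structure, the expectation described above is that the work done for the first block is reused for the others rather than triggering a recompilation per layer.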
[Beta] TorchInductor CPU backend optimization
This feature advances Inductor's CPU backend optimization, including CPP backend code generation and FX fusions with customized CPU kernels. The Inductor CPU backend supports vectorization of common data types and all Inductor IR operations, along with static and symbolic shapes. It is compatible with both Linux and Windows and supports the default Python wrapper, the CPP wrapper, and AOT-Inductor mode.
Additionally, it extends the max-autotune mode of the GEMM template (prototyped in 2.5), offering further performance gains. The backend supports various FX fusions, lowering to customized kernels such as oneDNN for Linear/Conv operations and SDPA. The Inductor CPU backend consistently achieves performance speedups across three benchmark suites (TorchBench, Hugging Face, and TIMM), outperforming eager mode in 97.5% of the 193 models tested.
PROTOTYPE FEATURES
[Prototype] FlexAttention
We've introduced a flexible API that enables implementing various attention mechanisms such as Sliding Window, Causal Mask, and PrefixLM with just a few lines of idiomatic PyTorch code. This API leverages torch.compile to generate a fused FlashAttention kernel, which eliminates extra memory allocation and achieves performance comparable to handwritten implementations. Additionally, we automatically generate the backwards pass using PyTorch's autograd machinery. Furthermore, our API can take advantage of sparsity in the attention mask, resulting in significant improvements over standard attention implementations.
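To make the masking idea concrete, here is a minimal, torch-free sketch of the kind of predicate such an API works with: a function of batch, head, query, and key/value indices that says whether attention is allowed. The function names, the `WINDOW` constant, and the `dense_mask` helper are illustrative assumptions and not part of the release; in a PyTorch 2.5 environment, predicates of this shape are what `create_block_mask` in `torch.nn.attention.flex_attention` takes as its `mask_mod` argument.

```python
WINDOW = 2  # hypothetical sliding-window size, for illustration only


def causal(b, h, q_idx, kv_idx):
    """Allow a query to attend only to itself and earlier positions."""
    return q_idx >= kv_idx


def sliding_window_causal(b, h, q_idx, kv_idx):
    """Causal attention restricted to the most recent WINDOW positions."""
    return causal(b, h, q_idx, kv_idx) & (q_idx - kv_idx <= WINDOW)


def dense_mask(mask_mod, q_len, kv_len):
    """Materialize a predicate as a dense 0/1 matrix for inspection."""
    return [
        [int(mask_mod(0, 0, q, kv)) for kv in range(kv_len)]
        for q in range(q_len)
    ]


for row in dense_mask(sliding_window_causal, 5, 5):
    print(row)
# [1, 0, 0, 0, 0]
# [1, 1, 0, 0, 0]
# [1, 1, 1, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 1, 1, 1]
```

The predicates combine conditions with `&` rather than `and` so that the same functions also work elementwise when the indices are tensors instead of plain integers. The banded pattern in the output is the sparsity that a block-sparse kernel can exploit.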
Torchvision is further extending its encoding/decoding capabilities. For this version, we added a WEBP decoder, and a batch JPEG decoder on CUDA GPUs, which can lead to 10X speed-ups over CPU decoding.
We have also improved the UX of our decoding APIs to be more user-friendly. The main entry point is now torchvision.io.decode_image(), and it can take as input either a path (as str or pathlib.Path), or a tensor containing the raw encoded data.
We also added support for HEIC and AVIF decoding, but these are currently only available when building from source. We are working on making those available directly in the upcoming releases. Stay tuned!
Detailed changes
Bug Fixes
[datasets] Update URL of SBDataset train_noval (#8551)
[datasets] EuroSAT: fix SSL certificate issues (#8563)
[io] Check average_rate availability in video reader (#8548)
New Features
[io] Add batch JPEG GPU decoding (decode_jpeg()) (#8496)
[io] Add WEBP image decoder: decode_image(), decode_webp() (#8527, #8612, #8610)
[io] Add HEIC and AVIF decoders, only available when building from source (#8597, #8596, #8647, #8613, #8621)
Improvements
[io] Add support for decoding 16-bit PNG (#8524)
[io] Allow decoding functions to accept the mode parameter as a string (#8627)
[io] Allow decode_image() to support paths (#8624)
[io] Automatically send video to CPU in io.write_video (#8537)
[datasets] Better progress bar for file downloading (#8556)
[datasets] Add Path type annotation for ImageFolder (#8526)
[ops] Register nms and roi_align Autocast policy for PyTorch Intel GPU backend (#8541)
[transforms] Use Sequence for parameters type checking in transforms.RandomErase (#8615)
[transforms] Support v2.functional.gaussian_blur backprop (#8486)
[transforms] Expose transforms.v2 utils for writing custom transforms. (#8670)
[utils] Fix f-string in color error message (#8639)
[packaging] Revamped and improved debuggability of setup.py build (#8535, #8581, #8581, #8582, #8590, #8533, #8528, #8659)
[Documentation] Various documentation improvements (#8605, #8611, #8506, #8507, #8539, #8512, #8513, #8583, #8633)
[tests] Various tests improvements (#8580, #8553, #8523, #8617, #8518, #8579, #8558, #8617, #8641)
[code quality] Various code quality improvements (#8552, #8555, #8516, #8526, #8602, #8615, #8639, #8532)
[ci] #8562, #8644, #8592, #8542, #8594, #8530, #8656
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions
Bumps the torch group in /requirements with 2 updates: torch and torchvision.
Updates torch from 2.4.1 to 2.5.0
Release notes
Sourced from torch's releases.
... (truncated)
Commits
- `32f585d` [Release only] use triton 3.1.x from pypi (#137895)
- `417a076` [split build] move periodic split builds into own concurrency group (#135510)...
- `119e734` [RELEASE-ONLY CHANGES] Fix dependency on filesystem on Linux (#137242)
- `783a6a4` [MPS] Add regression test for `fft.fftfreq` (#137215)
- `5375201` [MPS] Add missing dispatch to rshift.Tensor (#137212)
- `1de132e` [MPS] Fix 5D+ reductions over negative dimentions (#137211)
- `0b1b609` [NCCL] Don't override `waitUntilInitialized`'s setting of `comm->initialized_...
- `0b45af9` Fix addmm silent correctness on aarch64 (#137208)
- `1a0b166` [ONNX] Add assertion nodes to ignoring list (#137214)
- `3a541ef` Clarify that `libtorch` API is C++17 compatible (#137206)
Updates torchvision from 0.19.1 to 0.20.0
Release notes
Sourced from torchvision's releases.
... (truncated)
Commits
- `afc54f7` Remove lint job for release 0.20 (#8674)
- `8e8a208` [Cherry-pick for 0.20] Expose transforms.v2 utils for writing custom transfor...
- `2d8a288` [Cherry-Pick for 0.20] Revamp decoding docs (#8633) (#8666)
- `7f4d561` Remove prototype from release/0.20 branch (#8657)
- `4a94962` Use `@release/2.5` instead of `@main` for CI jobs (#8646)
- `db5f8a0` Fix compile with nvjpeg on Windows CUDA 12 (#8641)
- `00e7fa1` Fix f-string in color error message (#8639)
- `838ad6c` Allow decoding functions to accept the mode parameter as a string (#8627)
- `d0ebeb5` Allow decode_image to support paths (#8624)
- `c36025a` Remove unwanted printf in avif decoder (#8621)