Note that the Java (Maven) and training (PyPI) packages are delayed from package manager release due to publishing errors. Feel free to contact @maanavd if you need release candidates for some workflows ASAP. In the meantime, binaries are attached to this post. This message will be deleted once this is no longer the case. Thanks for your understanding :)
Build System & Packages
Added support for NumPy 2.x
Qualcomm SDK has been upgraded to 2.25
ONNX has been upgraded from 1.16 → 1.16.1
Default GPU packages use CUDA 12.x and cuDNN 9.x (previously CUDA 11.x/cuDNN 8.x). The CUDA 11.x/cuDNN 8.x packages have been moved to the aiinfra VS feed.
TensorRT 10.2 support added
Introduced Java CUDA 12 packages on Maven.
Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024)
Discontinued support for macOS 11 and increased the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
Discontinued support for iOS 12 and increased the minimum supported iOS version to 13.
Core
Implemented DeformConv
Performance
Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers
Implemented FlashAttention on CPU to improve performance for GenAI prompt cases
Improved INT4 performance on CPU (X64, ARM64) and NVIDIA GPUs
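To make the INT4 work above concrete, here is a rough sketch of block-wise symmetric 4-bit quantize-dequantize arithmetic. This is an illustration of the QDQ math only, not ONNX Runtime's actual kernels; the function name and default block size are made up for the example.

```python
import numpy as np

def qdq_int4(weights, block_size=32):
    """Quantize-dequantize float weights to symmetric INT4, per block.

    Illustrative only: each block of `block_size` values shares one fp32
    scale, and quantized values are clipped to the signed 4-bit range [-8, 7].
    """
    flat = weights.reshape(-1, block_size)
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)      # avoid divide-by-zero
    q = np.clip(np.round(flat / scales), -8, 7)      # signed 4-bit range
    return (q * scales).reshape(weights.shape)

w = np.random.randn(4, 64).astype(np.float32)
w_qdq = qdq_int4(w)
max_err = np.abs(w - w_qdq).max()  # bounded by half a quantization step
```

Because the scale is the per-block absolute maximum divided by 7, no value is clipped, so the round-trip error per element is at most half of that block's scale.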
Execution Providers
TensorRT
Updated to support TensorRT 10.2
Removed calls to deprecated APIs
Enabled refittable embedded engines when the ONNX model is provided as a byte stream
CUDA
Added support for building with CUDA 12.5.
Upgraded cutlass to 3.5.0 for performance improvement of memory efficient attention.
Updated MultiHeadAttention and Attention operators to be thread-safe.
Added sdpa_kernel provider option to choose kernel for Scaled Dot-Product Attention.
Expanded op support - Tile (bf16)
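The new sdpa_kernel option is supplied as a CUDA execution provider option when creating a session. A minimal sketch, with the model path as a placeholder; accepted values are version-dependent (the assumption here is that 0 lets ONNX Runtime pick a default), so check the CUDA EP documentation for your build:

```python
# CUDA EP provider options; "sdpa_kernel" selects the kernel used for
# Scaled Dot-Product Attention (0 assumed to mean "let ORT choose").
cuda_provider_options = {
    "device_id": 0,
    "sdpa_kernel": 0,
}
providers = [
    ("CUDAExecutionProvider", cuda_provider_options),
    "CPUExecutionProvider",  # fallback if CUDA is unavailable
]
# Creating the session requires onnxruntime-gpu and a real model file:
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```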
CPU
Expanded op support - GroupQueryAttention, SparseAttention (for Phi-3 small)
QNN
Updated to support QNN SDK 2.25
Expanded op support - HardSigmoid, ConvTranspose (3D), Clip (int32 data), MatMul (int4 weights), Conv (int4 weights), PRelu (fp16)
Expanded fusion support - Conv + Clip/Relu fusion
OpenVINO
Added support for OpenVINO 2024.3
Support for enabling EpContext using session options
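EPContext is enabled through session configuration entries. A sketch of the relevant entries (key names assumed from ONNX Runtime's session-option documentation; the file path is a placeholder, so verify the keys against your installed version):

```python
# Session-config entries to enable EPContext: caching the compiled EP
# context so later sessions can skip recompilation.
epcontext_config = {
    "ep.context_enable": "1",                   # dump/load a precompiled EP context
    "ep.context_file_path": "model_ctx.onnx",   # hypothetical output path
    "ep.context_embed_mode": "1",               # embed the blob in the ONNX file
}
# Applying them requires onnxruntime and a real model file:
# import onnxruntime as ort
# so = ort.SessionOptions()
# for key, value in epcontext_config.items():
#     so.add_session_config_entry(key, value)
# sess = ort.InferenceSession("model.onnx", so,
#                             providers=["OpenVINOExecutionProvider"])
```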
DirectML
Updated DirectML from 1.14.1 → 1.15
... (truncated)
Commits
530a2d7 Enable FP16 Clip and Handle Bias in FP16 Depthwise Conv (#21493)
82036b0 Remove references to the outdated CUDA EP factory method (#21549)
07d3be5 CoreML: Add ML Program Split Op (#21456)
5d78b9a [TensorRT EP] Update TRT OSS Parser to 10.2 (#21552)
8417c32 Keep QDQ nodes w/ nonpositive scale around MaxPool (#21182)
d985814 Update labeling bot (#21548)
7543dd0 Propagate NaNs in the CPU min and max operators (#21492)
c39f1c4 ORT- OVEP 1.19 PR-follow up (#21546)
b03c949 [js/web] allow load WebAssembly binary from buffer (#21534)
0d7cf30 [js/webgpu] Add activation Tanh (#21540)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)