We are happy to present the new 2.50.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
Hugging Face Model Handler for RunInference added to Python SDK. (#26632)
Hugging Face Pipelines support for RunInference added to Python SDK. (#27399)
Vertex AI Model Handler for RunInference now supports private endpoints (#27696)
MLTransform transform added with support for common ML pre/postprocessing operations (#26795)
Upgraded the Kryo extension for the Java SDK to Kryo 5.5.0. This brings in bug fixes, performance improvements, and serialization of Java 14 records. (#27635)
All Beam released container images are now multi-arch images that support both x86 and ARM CPU architectures (#27674). The multi-arch container images include:
- All versions of Go, Python, Java and TypeScript SDK containers.
- All versions of Flink job server containers.
- Java and Python expansion service containers.
- Transform service controller container.
- Spark3 job server container.
Added support for batched writes to AWS SQS for improved throughput (Java, AWS 2) (#21429).
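To illustrate why batching helps throughput: the SQS `SendMessageBatch` API accepts at most 10 entries per request, so grouping messages can cut the number of round trips by up to a factor of ten. The following is a minimal Python sketch of the grouping step only. It is not the Java connector's implementation, and `chunk_messages` and `MAX_BATCH_ENTRIES` are hypothetical names chosen for this example.

```python
from typing import Iterable, Iterator, List

# SQS SendMessageBatch accepts at most 10 entries per request.
MAX_BATCH_ENTRIES = 10


def chunk_messages(
    messages: Iterable[str], size: int = MAX_BATCH_ENTRIES
) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` messages (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for message in messages:
        batch.append(message)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


batches = list(chunk_messages(f"msg-{i}" for i in range(23)))
# 23 messages are grouped into 3 batches instead of 23 single sends.
print([len(b) for b in batches])
```

A real sink would also need to respect the per-request payload size limit and handle partial batch failures, which this sketch leaves out.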
Breaking Changes
Python SDK: Legacy runner support has been removed from Dataflow; all pipelines must use Runner v2.
Python SDK: The Dataflow Runner no longer stages the Beam SDK from PyPI in the --staging_location at pipeline submission. Custom container images that are not based on Beam's default image must include an Apache Beam installation (#26996).
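Because the SDK is no longer staged for you, it can be useful to confirm that a custom image actually contains Beam before submitting a pipeline. The snippet below is one possible sanity check to run with the image's Python interpreter. It uses only the standard library; `installed_version` is a hypothetical helper name, and the check is a sketch rather than an official Beam tool.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def installed_version(package: str, module: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing.

    `package` is the distribution name (as on PyPI) and `module` is the
    top-level import name. Hypothetical helper for illustration.
    """
    if importlib.util.find_spec(module) is None:
        return None
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


version = installed_version("apache-beam", "apache_beam")
if version is None:
    print("apache-beam is not installed in this image")
else:
    print(f"apache-beam {version} is installed")
```

The reported version should match the SDK version used to submit the pipeline.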
Deprecations
The Go Direct Runner is now deprecated. It remains available to reduce migration churn.
In Python, the VertexAIModelHandlerJSON now supports passing in inference_args. These will be passed through to the Vertex endpoint as parameters.
Added support to run mypy on user pipelines (#27906)
Breaking Changes
Removed the fastjson library dependency for Beam SQL. Table properties are now based on Jackson ObjectNode (Java) (#24154).
Removed TensorFlow from Beam Python container images. If you have been negatively affected by this change, please comment on #20605.
Removed the parameter t reflect.Type from parquetio.Write. The element type is derived from the input PCollection (Go) (#28490)
Refactored BeamSqlSeekableTable.setUp to add a joinSubsetType parameter (#28283).
Bugfixes
Fixed exception chaining issue in GCS connector (Python) (#26769).
Fixed exception handling for streaming inserts: GoogleAPICallErrors are now retried according to the retry strategy and routed to failed rows where appropriate, rather than causing a pipeline error (Python) (#21080).
Fixed a bug in the Python SDK's cross-language Bigtable sink that mishandled records that don't have an explicit timestamp set (#28632).
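The streaming-insert fix above follows a common pattern: retry an error a bounded number of times, then route the row to a failed-rows output instead of failing the whole pipeline. The following standalone Python sketch shows that pattern under simplifying assumptions. It does not use Beam or the BigQuery client, and `TransientInsertError`, `insert_with_retries`, and `flaky_insert` are hypothetical names, not part of any Beam API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


class TransientInsertError(Exception):
    """Hypothetical stand-in for a retryable API call error."""


def insert_with_retries(
    rows: List[Row],
    insert: Callable[[Row], None],
    max_attempts: int = 3,
) -> Tuple[List[Row], List[Tuple[Row, str]]]:
    """Try each row up to `max_attempts` times; collect rows that never succeed."""
    inserted: List[Row] = []
    failed: List[Tuple[Row, str]] = []
    for row in rows:
        for attempt in range(1, max_attempts + 1):
            try:
                insert(row)
                inserted.append(row)
                break
            except TransientInsertError as err:
                if attempt == max_attempts:
                    # Out of retries: route to failed rows, do not raise.
                    failed.append((row, str(err)))
    return inserted, failed


attempts: Dict[object, int] = {}


def flaky_insert(row: Row) -> None:
    """Demo sink: row 2 fails once, row 3 always fails."""
    attempts[row["id"]] = attempts.get(row["id"], 0) + 1
    if row["id"] == 2 and attempts[row["id"]] < 2:
        raise TransientInsertError("temporary failure")
    if row["id"] == 3:
        raise TransientInsertError("always fails")


inserted, failed = insert_with_retries([{"id": 1}, {"id": 2}, {"id": 3}], flaky_insert)
print(inserted)
print(failed)
```

A production retry strategy would typically also decide which errors are retryable and wait between attempts; both are omitted here to keep the sketch short.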
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Bumps org.apache.beam:beam-sdks-java-google-cloud-platform-bom from 2.42.0 to 2.51.0.
Release notes
Sourced from org.apache.beam:beam-sdks-java-google-cloud-platform-bom's releases.
... (truncated)
Changelog
Sourced from org.apache.beam:beam-sdks-java-google-cloud-platform-bom's changelog.
... (truncated)
Commits
- `cd653e3` Set version for 2.51.0 RC1
- `2420c90` Cherry picking PR #28618 into 2.51.0 (setting numShards for Python BigQuery x...
- `70f4a1a` CP for #28624 into release 2.51.0 (Bigtable Python timestamp bug fix) (#28634)
- `34ff286` Merge pull request #28658: [release-2.51.0] Cherrypick #28571 to release branch.
- `94adacd` Use a single marker for Vertex AI tests to not run them twice.
- `22abcde` Merge pull request #28628: [Cherry-pick #28625 for 2.51.0] Update Python base...
- `042ac52` Update Python container image deps in preparation for 2.51.0
- `6306b21` Merge pull request #28594: [Release-2.51.0] Cherry pick #28588: Fix sync it f...
- `693380f` Fix sync it (#28588)
- `20e11a6` Set Dataflow container to release version.