mozilla / gcp-ingestion

Documentation and implementation of telemetry ingestion on Google Cloud Platform
https://mozilla.github.io/gcp-ingestion/
Mozilla Public License 2.0

Bump org.apache.beam:beam-sdks-java-google-cloud-platform-bom from 2.42.0 to 2.52.0 #2512

Closed. dependabot[bot] closed this 8 months ago

dependabot[bot] commented 10 months ago

Bumps org.apache.beam:beam-sdks-java-google-cloud-platform-bom from 2.42.0 to 2.52.0.

Release notes

Sourced from org.apache.beam:beam-sdks-java-google-cloud-platform-bom's releases.

Beam 2.52.0 release

We are happy to present the new 2.52.0 release of Beam. This release includes both improvements and new functionality. See the download page for this release.

For more information on changes in 2.52.0, check out the detailed release notes.

Highlights

  • Previously deprecated Avro-dependent code (Beam Release 2.46.0) has finally been removed from the Java SDK "core" package. Please use beam-sdks-java-extensions-avro instead. This makes it easier to update the Avro version in user code without potential breaking changes in Beam "core", since the Beam Avro extension already supports the latest Avro versions and should handle this. (#25252).
  • Publishing Java 21 SDK container images is now supported as part of the Apache Beam release process. (#28120)
    • Direct Runner and Dataflow Runner support running pipelines on Java 21 (experimental until tests are fully set up). For other runners (Flink, Spark, Samza, etc.), support status depends on the runner projects.

New Features / Improvements

  • Added the UseDataStreamForBatch pipeline option to the Flink runner. When it is set to true, the Flink runner runs batch jobs using the DataStream API. By default the option is set to false, so batch jobs are still executed using the DataSet API.
  • upload_graph as one of the Experiments options for DataflowRunner is no longer required when the graph is larger than 10MB for the Java SDK (PR#28621).
  • The state and side input cache is now enabled with a default size of 100 MB. Use --max_cache_memory_usage_mb=X to set the cache size for the user state API and side inputs. (Python) (#28770).
  • Beam YAML stable release. Beam pipelines can now be written using YAML and leverage the Beam YAML framework, which includes a preliminary set of IOs and turnkey transforms. More information can be found in the YAML root folder and in the README.

Breaking Changes

  • org.apache.beam.sdk.io.CountingSource.CounterMark now uses a custom CounterMarkCoder as its default coder, since all Avro-dependent classes have finally moved to extensions/avro. If AvroCoder is still required for CounterMark, the workaround is to place a copy of the "old" CountingSource class in the project code and use it directly (#25252).
  • Renamed host to firestoreHost in FirestoreOptions to avoid potential conflict of command line arguments (Java) (#29201).
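
For launch scripts that still pass the old Firestore flag, the rename above can be absorbed with a small argument rewrite before the arguments reach the pipeline options parser. The following is a minimal sketch using only the Java standard library. The class FirestoreHostArgMigration and its migrate method are hypothetical names, not part of Beam, and the sketch assumes every `--host=` argument in the script was meant for Firestore.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Beam: rewrites a legacy --host=... argument
// to --firestoreHost=... and leaves every other argument untouched.
final class FirestoreHostArgMigration {
    private static final String OLD_PREFIX = "--host=";
    private static final String NEW_PREFIX = "--firestoreHost=";

    static String[] migrate(String[] args) {
        return Arrays.stream(args)
            // Only exact "--host=" flags match; "--hostname=..." does not.
            .map(a -> a.startsWith(OLD_PREFIX)
                ? NEW_PREFIX + a.substring(OLD_PREFIX.length())
                : a)
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Assumed example arguments, for illustration only.
        String[] legacy = {"--project=my-project", "--host=localhost:8080"};
        // prints: --project=my-project --firestoreHost=localhost:8080
        System.out.println(String.join(" ", migrate(legacy)));
    }
}
```

Updating the scripts themselves is usually the cleaner fix; a shim like this is only worth considering while old and new Beam versions are deployed side by side.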

Bugfixes

  • Fixed "Desired bundle size 0 bytes must be greater than 0" in Java SDK's BigtableIO.BigtableSource when you have more cores than bytes to read (Java) #28793.
  • The watch_file_pattern argument of RunInference had no effect prior to 2.52.0. To use the behavior of watch_file_pattern from before 2.52.0, follow the documentation at https://beam.apache.org/documentation/ml/side-input-updates/ and use the WatchFilePattern PTransform as a SideInput. (#28948)
  • MLTransform doesn't output artifacts such as min, max, and quantiles. Instead, MLTransform will add a feature to output these artifacts in a human-readable format (#29017). For now, to use artifacts such as min and max that were produced by an earlier MLTransform, use read_artifact_location of MLTransform, which reads artifacts that were produced earlier by a different MLTransform (#29016)
  • Fixed a memory leak, which affected some long-running Python pipelines: #28246.
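
The "Desired bundle size 0 bytes" error in the first bugfix is the kind of failure that integer division can produce when a small input is split across many workers. The sketch below illustrates that arithmetic only. It is not Beam's BigtableIO code, and the release note does not say how the size is computed or how the fix works; the class BundleSizeSketch and both methods are hypothetical.

```java
// Illustrative only: this is not Beam's BigtableIO implementation.
final class BundleSizeSketch {
    // Naive split: integer division truncates, so fewer bytes than
    // workers yields a bundle size of 0.
    static long naiveBundleSize(long totalBytes, int workers) {
        return totalBytes / workers;
    }

    // Guarded split: never returns less than one byte per bundle.
    static long guardedBundleSize(long totalBytes, int workers) {
        return Math.max(1L, totalBytes / workers);
    }

    public static void main(String[] args) {
        // Assumed numbers: 4 bytes to read, 8 cores available.
        System.out.println(naiveBundleSize(4, 8));   // prints 0
        System.out.println(guardedBundleSize(4, 8)); // prints 1
    }
}
```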

Security Fixes

List of Contributors

According to git shortlog, the following people contributed to the 2.52.0 release. Thank you to all contributors!

... (truncated)

Changelog

Sourced from org.apache.beam:beam-sdks-java-google-cloud-platform-bom's changelog.

[2.52.0] - 2023-11-17

[2.51.0] - 2023-10-03

New Features / Improvements

  • In Python, RunInference now supports loading many models in the same transform using a KeyedModelHandler (#27628).
  • In Python, the VertexAIModelHandlerJSON now supports passing in inference_args. These will be passed through to the Vertex endpoint as parameters.
  • Added support to run mypy on user pipelines (#27906)
  • Python SDK worker start-up logs and crash logs are now captured by a buffer and logged at appropriate levels via the Beam logging API. Dataflow Runner users might observe that most worker-startup log content is now captured by the worker logger. Users who relied on print() statements for logging might notice that some logs don't flush before the pipeline succeeds; we strongly advise using the logging package instead of print() statements for logging. (#28317)

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] commented 8 months ago

Superseded by #2534.