spotify / scio

A Scala API for Apache Beam and Google Cloud Dataflow.
https://spotify.github.io/scio
Apache License 2.0
2.56k stars, 513 forks

Wrap coders in Zstd coders #4882

Open kellen opened 1 year ago

kellen commented 1 year ago

In theory, this will reduce the cost of shuffle/streaming data processed.

It needs a dictionary of common symbols to be provided.
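
To illustrate why a shared dictionary matters for small records, here is a minimal sketch. It is not Scio or Beam code: it uses the `zlib` preset-dictionary feature from the Python standard library as a stand-in for a Zstd dictionary, and the `DICTIONARY` contents, the sample record, and the helper names are all hypothetical.

```python
import zlib
from typing import Optional

# Hypothetical dictionary: byte sequences expected to recur in every record.
# A real Zstd dictionary would be trained on sample data; this one is hand-written.
DICTIONARY = b'{"user_id": "", "country": "", "event": "play", "track_id": ""}'


def compress(payload: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress one record, optionally priming the compressor with a dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=6)
    else:
        comp = zlib.compressobj(level=6, zdict=zdict)
    return comp.compress(payload) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress one record; the same dictionary must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# A single small record, similar in shape to what a shuffle would serialize.
record = b'{"user_id": "u123", "country": "SE", "event": "play", "track_id": "t456"}'

plain = compress(record)
with_dict = compress(record, DICTIONARY)

print(len(record), len(plain), len(with_dict))
```

Each record is compressed on its own, so without a dictionary the compressor has almost no history to find repeats in. With the dictionary, the recurring field names become back-references and the output shrinks. The catch is that the reader must have exactly the same dictionary, which is why it has to be provided up front.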

RustedBones commented 1 year ago

We should make sure the CPU/speed cost of that remains low.

RustedBones commented 3 months ago

Fixed in https://github.com/spotify/scio/pull/5321