mozilla-services / foxsec-pipeline

Log analysis pipeline utilizing Apache Beam
Mozilla Public License 2.0

add gbk+ungroup to avoid fusion in hard limit analysis #477

Closed kkleemola closed 4 years ago

kkleemola commented 4 years ago

Follow-up to #476.

Dataflow fuses the steps together, so adding an extra step doesn't help unless we also add something to prevent fusion. There are a few options for preventing it: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#fusion-optimization

Adding a Reshuffle step would be a slightly nicer-looking alternative, but it is marked as deprecated by Beam.
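
For readers unfamiliar with the gbk+ungroup pattern named in the title, here is a rough plain-Python model of what it does to the data. It is not Beam code and not the code in this PR: `with_keys`, `group_by_key`, `ungroup`, and `break_fusion` are hypothetical names chosen for illustration. The point it tries to show is that keying, grouping, and then flattening the groups hands back the same elements (possibly reordered), so the pair is a no-op for the analysis logic. On Dataflow, the grouping step forces a shuffle, and that boundary is what stops the surrounding steps from being fused.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def with_keys(elements: Iterable[T], key_fn: Callable[[T], Hashable]) -> List[Tuple[Hashable, T]]:
    """Pair every element with a key so it can be grouped."""
    return [(key_fn(e), e) for e in elements]


def group_by_key(pairs: Iterable[Tuple[Hashable, T]]) -> Dict[Hashable, List[T]]:
    """Collect all values that share a key (the step that implies a shuffle on a real runner)."""
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def ungroup(groups: Dict[Hashable, List[T]]) -> List[T]:
    """Flatten the grouped values back into individual elements, dropping the keys."""
    return [value for values in groups.values() for value in values]


def break_fusion(elements: Iterable[T], key_fn: Callable[[T], Hashable]) -> List[T]:
    """Hypothetical helper: key, group, then ungroup. Same elements out as in."""
    return ungroup(group_by_key(with_keys(elements, key_fn)))


if __name__ == "__main__":
    events = ["login", "logout", "login", "error"]
    result = break_fusion(events, key_fn=len)
    # Order may differ because elements come back group by group,
    # but the multiset of elements is unchanged.
    assert Counter(result) == Counter(events)
```

Since the round trip only guarantees the same multiset of elements, any step placed after it should not rely on input order.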