mozilla-services / foxsec-pipeline

Log analysis pipeline utilizing Apache Beam
Mozilla Public License 2.0

refactor deduplication for aws cross account correlator #499

Closed kkleemola closed 3 years ago

kkleemola commented 3 years ago

Follow-up to #496.

#496 resulted in a non-updateable graph because of the deduplication transform. I also tried switching to Streaming Engine, as recommended by the error message, but that also failed with no further details. So this simply moves the deduplication to after we've windowed and applied GroupByKey (GBK).

Because we now deduplicate after windowing and GBK, if the first event we receive is duplicated, it can take a long time to generate the alert: it won't happen on the early firing and instead occurs when the window closes, and in practice we need to set the session gap to ~8 minutes. But since the duplicate event issue has been fixed and this is an edge case, I don't think the delay is concerning.
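
For illustration only, here is a rough plain-Python sketch of the ordering described above: group events into per-key sessions first, then deduplicate inside each group. It is not the pipeline's code and does not use the Beam API. The `Event` fields, the function names, and the rule that a silence of at least `gap` seconds starts a new session are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    event_id: str     # hypothetical unique ID used to spot duplicates
    key: str          # hypothetical grouping key, e.g. an account ID
    timestamp: float  # event time in seconds


def sessionize(events: List[Event], gap: float) -> List[Tuple[str, List[Event]]]:
    """Group events per key into sessions; a silence of at least `gap`
    seconds between consecutive events starts a new session."""
    by_key: Dict[str, List[Event]] = defaultdict(list)
    for ev in events:
        by_key[ev.key].append(ev)

    sessions: List[Tuple[str, List[Event]]] = []
    for key, evs in by_key.items():
        evs.sort(key=lambda e: e.timestamp)
        current = [evs[0]]
        for ev in evs[1:]:
            if ev.timestamp - current[-1].timestamp >= gap:
                sessions.append((key, current))
                current = [ev]
            else:
                current.append(ev)
        sessions.append((key, current))
    return sessions


def dedupe(group: List[Event]) -> List[Event]:
    """Keep the first occurrence of each event_id, preserving order."""
    seen = set()
    unique = []
    for ev in group:
        if ev.event_id not in seen:
            seen.add(ev.event_id)
            unique.append(ev)
    return unique


def grouped_then_deduped(events: List[Event], gap: float) -> List[Tuple[str, List[Event]]]:
    """Deduplicate only after the events have been windowed and grouped."""
    return [(key, dedupe(group)) for key, group in sessionize(events, gap)]
```

With a gap of 480 seconds (roughly the 8 minutes mentioned above), a duplicate that arrives inside the same session is dropped from its group, while an event arriving after a longer silence lands in a separate session.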