matanolabs / matano

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
https://matano.dev
Apache License 2.0

feat: transformer: sideline partially erroring lines #121

Closed: Samrose-Ahmed closed this 1 year ago

Samrose-Ahmed commented 1 year ago

This adds a sidelining feature to the transformer: if the transformer script fails on some lines of an incoming file (e.g. one line out of 1,000), the error is no longer swallowed. Instead, the failing lines are sidelined to a dedicated 'sideline bucket' (similar to Firehose error output).
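A minimal Rust sketch of the per-line behavior described above, assuming the transform step returns a `Result` for each line. `LineError`, `Partitioned`, and `partition_lines` are hypothetical names for illustration, not the transformer's actual API; only the error kind `SchemaMismatchError` comes from this PR.

```rust
use std::collections::BTreeMap;

// Hypothetical per-line error type; `kind` plays the role of names such as
// "SchemaMismatchError" seen in the transformer logs.
#[derive(Debug, Clone, PartialEq)]
pub struct LineError {
    pub kind: String,
    pub msg: String,
}

// Output for one incoming file: successfully transformed lines, plus the raw
// failed lines grouped by error kind so they can be sidelined, not dropped.
#[derive(Debug, Default)]
pub struct Partitioned {
    pub ok: Vec<String>,
    pub sidelined: BTreeMap<String, Vec<String>>,
}

impl Partitioned {
    // Total number of lines that would be written to the sideline bucket.
    pub fn failed_count(&self) -> usize {
        self.sidelined.values().map(Vec::len).sum()
    }
}

// Applies `transform` to each line. A failing line neither aborts the whole
// file nor disappears: its raw text is kept under its error kind.
pub fn partition_lines<F>(lines: &[&str], mut transform: F) -> Partitioned
where
    F: FnMut(&str) -> Result<String, LineError>,
{
    let mut out = Partitioned::default();
    for &line in lines {
        match transform(line) {
            Ok(transformed) => out.ok.push(transformed),
            Err(err) => out
                .sidelined
                .entry(err.kind)
                .or_default()
                .push(line.to_string()),
        }
    }
    out
}
```

Grouping by error kind mirrors the per-error-kind prefix visible in the sideline path in the test log below; a caller would then upload each group instead of logging and discarding it.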

Notes

Testing

Example log output:

ERROR transformer: Line error: Line err: SchemaMismatchError, msg: Failed to resolve schema due to schema mismatch.
INFO transformer: Sidelining 1 failed lines
INFO transformer: Sidelining 1 lines to s3://matanodpmainstack-transformersidelinebucketba3289-158l9z4o32vax/google_workspace/SchemaMismatchError/2023-03-24-23/8fee65a4-3da4-4e01-8aaf-dc247c10c131.json.zst

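Judging only from the single example path above, sideline objects appear to be keyed as `<log_source>/<error_kind>/<YYYY-MM-DD-HH>/<object_id>.json.zst`, with `.zst` suggesting zstd-compressed JSON. The Rust sketch below rebuilds that key; `HourPartition`, `sideline_key`, and `sideline_uri` are hypothetical helpers, and the layout is an inference from one log line, not a confirmed spec. Compression and upload are out of scope.

```rust
// Hypothetical hour partition matching the `YYYY-MM-DD-HH` path segment.
#[derive(Debug, Clone, Copy)]
pub struct HourPartition {
    pub year: u16,
    pub month: u8,
    pub day: u8,
    pub hour: u8,
}

// Builds `<log_source>/<error_kind>/<YYYY-MM-DD-HH>/<object_id>.json.zst`,
// zero-padding month, day, and hour as in the logged key.
pub fn sideline_key(
    log_source: &str,
    error_kind: &str,
    at: HourPartition,
    object_id: &str,
) -> String {
    format!(
        "{}/{}/{:04}-{:02}-{:02}-{:02}/{}.json.zst",
        log_source, error_kind, at.year, at.month, at.day, at.hour, object_id
    )
}

// Prefixes the key with the bucket name to form an `s3://` URI.
pub fn sideline_uri(bucket: &str, key: &str) -> String {
    format!("s3://{}/{}", bucket, key)
}
```

Putting the error kind before the hour partition means all failures of one kind for a log source can be listed under a single prefix.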
Samrose-Ahmed commented 1 year ago

draftish, @shaeqahmed review this one

timoguin commented 1 year ago

Hey, @shaeqahmed. He said review this one.