matanolabs / matano

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
https://matano.dev
Apache License 2.0

Transformer function(s) for Kinesis Firehose #54

Closed timoguin closed 1 year ago

timoguin commented 1 year ago

Allow running transformations as part of Kinesis Firehose delivery streams.

Considerations

Many AWS services can still deliver logs only to CloudWatch Logs, while some can send directly to Kinesis Firehose.

Anything sent to CloudWatch Logs can be streamed from there to Kinesis Firehose.

The transformation features of Firehose are quite nice to work with, and they would let us skip the data batcher logic required for processing files from S3. Firehose also handles retries and similar delivery logic, and it adds several points of observability.

This will require planning and design work. I'm not sure if we'd want to make the current transformer Lambda more generic so it can process more triggers, or if we'd want to implement something slightly different for the Firehose use case. My guess is it'd be better to support more triggers / sources for the current transformer.
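For reference, here is a minimal sketch of what a Firehose transformation Lambda looks like, in case it helps with the design discussion. The `recordId` / `result` / `data` shape is the contract Firehose uses for data transformation. The `transform` function and the `ingested_via` field are hypothetical placeholders, not anything from the current Matano transformer.

```python
import base64
import json


def transform(payload: bytes) -> bytes:
    """Hypothetical per-record transformation: parse JSON, tag it, re-serialize as one line."""
    event = json.loads(payload)
    event["ingested_via"] = "firehose"  # illustrative field only, not a Matano convention
    return (json.dumps(event) + "\n").encode("utf-8")


def handler(event, context):
    """Firehose data-transformation handler: returns one output record per input record."""
    output = []
    for record in event["records"]:
        try:
            raw = base64.b64decode(record["data"])  # Firehose passes record data base64-encoded
            data = base64.b64encode(transform(raw)).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, TypeError):
            # Hand the untouched data back marked as failed so Firehose can route it to its error output
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
            )
    return {"records": output}
```

Every input `recordId` has to appear in the response, which is why failures are returned with `ProcessingFailed` rather than skipped.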


timoguin commented 1 year ago

After discussion, the consensus is that we don't want to add this complexity to the current transformer logic, especially for a use case that is specific to AWS.

Instead, we can use Firehose to deliver directly to S3 without transformation. That will keep things simple and still allow us to achieve the goal of ingesting data from Firehose.

Any source fed by Firehose will need to decode its data from base64, which is straightforward to do with VRL.
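To illustrate the decoding step outside of VRL, here is a rough Python sketch. It assumes the payload is base64-encoded newline-delimited JSON, as described above. The gzip check is an extra assumption that applies when the data originated from a CloudWatch Logs subscription, which compresses its payloads. The function name `decode_firehose_payload` is made up for this example.

```python
import base64
import gzip
import json


def decode_firehose_payload(data: str) -> list:
    """Base64-decode a payload, gunzip it if needed, and parse newline-delimited JSON."""
    raw = base64.b64decode(data)
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes; assumed case: CloudWatch Logs subscription data
        raw = gzip.decompress(raw)
    # One JSON document per non-empty line
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]
```

A log source's VRL transform would do the equivalent steps before parsing the events.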