matanolabs / matano

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
https://matano.dev
Apache License 2.0

Managed log source for AWS VPC Flow logs #49

Closed: timoguin closed 1 year ago

timoguin commented 1 year ago

Add support for managing VPC Flow logs.

Considerations

Flow logs can now be published directly to Kinesis Data Firehose. We should look at implementing the current transformer Lambda so that a Firehose delivery stream can use it for transformation. This is probably as close to real time as we can get. It would let us handle normalization and delivery to S3 (in Parquet) in one step, bypassing the data batcher Lambda.
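To make the Firehose idea concrete, here is a rough sketch of what a transformation Lambda could look like. It is not the existing Matano transformer. It assumes each Firehose record carries one flow log line in the default (version 2) text format, and the ECS field mapping in `normalize` is illustrative only. The `recordId` / `result` / `data` response shape is the standard Firehose data transformation contract.

```python
import base64
import json

# Field order of the default (version 2) VPC Flow Logs text format.
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def _to_int(value):
    """Flow logs write '-' for fields with no data; map those to None."""
    return None if value == "-" else int(value)


def _to_str(value):
    return None if value == "-" else value


def normalize(line):
    """Map one text flow log line to an ECS-style dict.

    Assumption: this mapping is a hypothetical example, not Matano's schema.
    """
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError("unexpected field count: %d" % len(parts))
    raw = dict(zip(DEFAULT_FIELDS, parts))
    return {
        "cloud": {"account": {"id": raw["account_id"]}},
        "source": {"ip": _to_str(raw["srcaddr"]), "port": _to_int(raw["srcport"])},
        "destination": {"ip": _to_str(raw["dstaddr"]), "port": _to_int(raw["dstport"])},
        "network": {
            "iana_number": _to_str(raw["protocol"]),
            "packets": _to_int(raw["packets"]),
            "bytes": _to_int(raw["bytes"]),
        },
        "event": {"action": raw["action"].lower()},
    }


def handler(event, context):
    """Firehose transformation entry point: one output record per input record."""
    out = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8").strip()
            doc = json.dumps(normalize(line)) + "\n"
            data = base64.b64encode(doc.encode("utf-8")).decode("ascii")
            out.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, UnicodeDecodeError):
            # Return the original payload so Firehose can route it to its error output.
            out.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": out}
```

The handler emits newline-delimited JSON. Getting Parquet onto S3 in the same step would then rely on Firehose record format conversion, which needs a table schema in AWS Glue, so the normalized shape would have to match that schema.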

Flow logs can also be published directly to S3 in Parquet format. Since we already need to do record-based ECS normalization, these may not be as useful as the text-based delivery. On second thought, would it still be worth processing the raw Parquet files simply for the smaller file sizes? That would also save on up-front storage costs for the ingestion buckets. Flow logs get big fast.

Flow logs can also be delivered to CloudWatch Logs, although anyone with real volume is probably not doing this because CloudWatch Logs gets expensive. However, streaming CloudWatch Logs to Firehose is a rather nice experience when such high volumes are not a concern.
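For the CloudWatch Logs route, the payload reaching a Firehose transformation Lambda looks different from the direct delivery: each record is a gzip-compressed JSON envelope containing a batch of `logEvents`. A minimal sketch of unpacking it, where `extract_messages` is a hypothetical helper name:

```python
import base64
import gzip
import json


def extract_messages(record_data):
    """Return the raw log lines from one CloudWatch Logs subscription record.

    `record_data` is the base64 string from a Firehose record's `data` field.
    """
    payload = json.loads(gzip.decompress(base64.b64decode(record_data)))
    if payload.get("messageType") != "DATA_MESSAGE":
        # CONTROL_MESSAGE records are health checks and carry no log events.
        return []
    return [log_event["message"] for log_event in payload["logEvents"]]
```

Each returned message would be one flow log line, so a single incoming record can fan out into many normalized events.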

Tasks

References

shaeqahmed commented 1 year ago

Added in #70, with support for ingesting AWS VPC Flow logs in the text-based format.