matanolabs / matano

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
https://matano.dev
Apache License 2.0
Managed log source for AWS S3 access logs #50

Closed timoguin closed 1 year ago

timoguin commented 1 year ago

Add support for managing AWS S3 access logs.

Considerations

S3 access logs are an odd man out for a few reasons:

Tasks

References

Samrose-Ahmed commented 1 year ago

Just shipped the parser.🚀

Can you elaborate on the cross account/role note?

From what I can tell, S3 access logs don't support cross-account access, in that the target and source bucket have to be in the same account. They seem to recommend CloudTrail data logs for that case.

timoguin commented 1 year ago

> Just shipped the parser.🚀

Nice!

> Can you elaborate on the cross account/role note?

Yes, S3 access logs can only be delivered when the source and destination buckets exist within the same account and the same region.

I'm talking about what you have to do if you want to process logs in a bucket that exists in another account.

S3 access logs are unique in that you need to assume a cross-account role and then copy the objects while modifying the object ownership.

I'd need to look back into the specifics to elaborate further. It's been several years since I've done it. This is all from memory.
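To make the assume-role-then-copy idea concrete, here is a minimal sketch in Python. It is not Matano's implementation and has not been checked against a real cross-account setup. It assumes the role lives in the account that owns the log bucket and is allowed to write to a destination bucket in the processing account. The function names, the session name, and the bucket and role names are hypothetical. The `sts` and `s3` arguments are expected to behave like boto3 clients (`assume_role` and `copy_object` are real boto3 calls), and they are passed in so the logic can be exercised without AWS access.

```python
from typing import Any, Dict


def assumed_role_client_kwargs(
    sts: Any, role_arn: str, session_name: str = "s3-access-log-copy"
) -> Dict[str, str]:
    """Assume a role and return keyword arguments for building a client.

    `sts` is assumed to be a boto3 STS client (or anything exposing the
    same `assume_role` method). The session name is a made-up default.
    """
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def copy_log_object(s3: Any, source_bucket: str, key: str, dest_bucket: str) -> None:
    """Copy one log object, granting the destination bucket owner full control.

    `s3` is assumed to be a boto3 S3 client created with the assumed-role
    credentials. The canned ACL is one way to hand control of the copy to
    the destination bucket owner; whether it is needed depends on the
    destination bucket's Object Ownership setting.
    """
    s3.copy_object(
        Bucket=dest_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
        ACL="bucket-owner-full-control",
    )


# Hypothetical wiring with boto3 (not executed here):
#   import boto3
#   kwargs = assumed_role_client_kwargs(boto3.client("sts"), ROLE_ARN)
#   s3 = boto3.client("s3", **kwargs)
#   copy_log_object(s3, "source-log-bucket", "logs/2023-01-01-00-00-00-ABC", "matano-ingest-bucket")
```

Whatever the final design looks like, the extra role and the copy step are the parts a plain cross-account bucket policy does not cover, which is why some UX planning would be needed.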

The main point is that if we want to allow Matano to process these logs from a bucket in another account, a cross-account S3 bucket policy isn't enough. We would at least need to do some UX planning to make it easy.

For now I would say ship with initial support for buckets in the same account Matano is running in. Then we can follow up to add mechanisms for processing buckets in another account.

> From what I can tell, S3 access logs don't support cross-account access, in that the target and source bucket have to be in the same account. They seem to recommend CloudTrail data logs for that case.

Logging S3 Data Events with CloudTrail gets really expensive quickly at scale. You will have pain if you enable them on high-traffic buckets. They'll blow the CloudTrail budget out of the water fast.

At this point I would only ever turn them on for small, high risk buckets that don't see any serious production traffic.
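A rough back-of-the-envelope calculation shows how quickly this adds up. The sketch below assumes a data event price of 0.10 USD per 100,000 events, which is an assumption to check against the current CloudTrail pricing page. The function name and the request volume are made up for illustration.

```python
def cloudtrail_data_event_cost(
    events_per_month: int, usd_per_100k_events: float = 0.10
) -> float:
    """Estimate the monthly CloudTrail data event charge in USD.

    The default price is an assumption, not a quoted figure; pass the
    current rate for your region and trail configuration.
    """
    return events_per_month / 100_000 * usd_per_100k_events


# A hypothetical bucket serving two billion object-level requests a month.
monthly_cost = cloudtrail_data_event_cost(2_000_000_000)
print(f"Estimated CloudTrail data event cost: ${monthly_cost:,.2f} per month")
```

Under that assumed rate, two billion requests a month comes to roughly 2,000 USD for a single bucket, before any storage or query costs.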

Normal S3 access logs are free to enable though (you only pay to store them), and they still contain a ton of valuable data.
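To give a feel for what is in those logs, here is a small Python sketch that splits one access log line into named fields. It is not the Matano parser. The field names follow the order of the documented server access log format as best I recall it, the sample line is made up, and edge cases such as escaped quotes inside a user agent are not handled.

```python
import re
from datetime import datetime
from typing import Any, Dict

# Field order assumed from the documented S3 server access log format.
# Newer fields that AWS appends after these are ignored by this sketch.
FIELDS = [
    "bucket_owner", "bucket", "time", "remote_ip", "requester", "request_id",
    "operation", "key", "request_uri", "http_status", "error_code",
    "bytes_sent", "object_size", "total_time", "turn_around_time", "referer",
    "user_agent", "version_id", "host_id", "signature_version",
    "cipher_suite", "authentication_type", "host_header", "tls_version",
]

# A token is a [bracketed] timestamp, a "quoted" string, or a bare word.
TOKEN_RE = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_access_log_line(line: str) -> Dict[str, Any]:
    """Parse one access log line into a dict; '-' values become None."""
    record: Dict[str, Any] = {}
    for name, token in zip(FIELDS, TOKEN_RE.findall(line)):
        if len(token) >= 2 and token[0] in '["':
            token = token[1:-1]  # drop the surrounding brackets or quotes
        record[name] = None if token == "-" else token
    if record.get("time"):
        record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    if record.get("http_status"):
        record["http_status"] = int(record["http_status"])
    return record


# Made-up sample line shaped like the documented format.
SAMPLE = (
    'ownerid123 example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'ownerid123 3E57427F3EXAMPLE REST.GET.VERSIONING - '
    '"GET /example-bucket?versioning HTTP/1.1" 200 - 113 - 7 - "-" '
    '"S3Console/0.4" - hostidEXAMPLE= SigV4 ECDHE-RSA-AES128-GCM-SHA256 '
    'AuthHeader example-bucket.s3.us-west-1.amazonaws.com TLSV1.2'
)

record = parse_access_log_line(SAMPLE)
print(record["operation"], record["http_status"], record["remote_ip"])
```

Even this partial view exposes the requester, source IP, operation, status code, and TLS details for every request.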

Samrose-Ahmed commented 1 year ago