obsrvbl-oss / flowlogs-reader

Command line tool and Python library for working with AWS VPC Flow Logs
Apache License 2.0

allow filtering logs by interface name ("eni-..") to improve performance #16

Closed hjacobs closed 8 years ago

hjacobs commented 8 years ago

In my use case I just want to read flow logs of specific network interfaces (e.g. only for public instances and/or ELBs).

This PR adds an "interfaces" keyword argument (a list/set of EC2 network interface IDs). When it is given, the log reader only reads the log streams that belong to those interfaces.
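A rough sketch of the idea, not the actual code from the PR: it assumes flow log stream names look like "eni-1a2b3c4d-all" (interface ID, then a traffic-type suffix), and the function name filter_stream_names is made up for illustration.

```python
def filter_stream_names(stream_names, interfaces=None):
    """Keep only the log streams that belong to the given interface IDs.

    Assumption: each stream name is "<interface-id>-<traffic-type>",
    e.g. "eni-1a2b3c4d-all", so the interface ID is everything before
    the last hyphen.
    """
    if interfaces is None:
        # No filter requested: keep every stream.
        return list(stream_names)

    wanted = set(interfaces)
    return [
        name for name in stream_names
        if name.rsplit('-', 1)[0] in wanted
    ]
```

The stream names would still have to be listed first (for example via "describe_log_streams"), so this only cuts down on the events that are read afterwards.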

bbayles commented 8 years ago

@hjacobs, thanks for the PR. I think we can use the logStreamNamePrefix to do this more elegantly. I have a branch that I'm working on; I will incorporate your desired functionality into that.

hjacobs commented 8 years ago

Actually, I also see problems with my PR: the "describe_log_streams" API call it relies on has a very low rate limit (max 5 req/sec).
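One way to stay under a limit like this is to pace the calls on the client side. The following is only a minimal sketch; the Throttle class is hypothetical and not part of flowlogs-reader or boto, and the 5 req/sec figure is simply the one quoted above.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock  # injectable so the pacing can be tested
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling throttle.wait() right before each "describe_log_streams" request would then keep the request rate at or below the configured value, at the cost of slower startup when there are many pages of streams.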

bbayles commented 8 years ago

@hjacobs, could you try the version in this branch to see if it's close to what you had in mind?

The LogStreamNamePrefix filter only allows for one item rather than multiple, but it should be a bit more efficient than filtering in Python.
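Since the prefix filter takes a single value, several interfaces could be covered by issuing one query per interface. This is a sketch under assumptions, not what the branch does: the helper name iter_events_for_interfaces is invented, and client is expected to behave like a boto3 CloudWatch Logs client with its filter_log_events call.

```python
def iter_events_for_interfaces(client, log_group_name, interfaces, **kwargs):
    """Yield log events for several interfaces, one prefix query at a time.

    Assumption: stream names start with the interface ID followed by a
    hyphen, so "<interface-id>-" is used as the prefix to avoid matching
    a longer ID that merely starts with the same characters.
    """
    for interface_id in sorted(interfaces):
        params = dict(
            kwargs,
            logGroupName=log_group_name,
            logStreamNamePrefix=interface_id + '-',
        )
        while True:
            response = client.filter_log_events(**params)
            for event in response.get('events', []):
                yield event
            token = response.get('nextToken')
            if not token:
                break
            # Follow pagination for the current interface only.
            params['nextToken'] = token
```

With a real client this might be called as iter_events_for_interfaces(boto3.client('logs'), 'my-flow-logs', {'eni-1a2b3c4d'}), where the log group name and interface ID are placeholders. Extra keyword arguments such as startTime and endTime are passed straight through to filter_log_events.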

hjacobs commented 8 years ago

My use case needs filtering by multiple interfaces, see my hack here: https://github.com/zalando-stups/connection-tracker/blob/master/scan.py#L132 (disclaimer: crappy hack started just yesterday ;-) )

Note that I also monkey patch the boto client as I'm using STS assume_role to get a new session.

bbayles commented 8 years ago

@hjacobs, I'll probably close this given that there are merge conflicts and coverage issues. I think filtering by LogStreamNamePrefix fits nicely with this library, but filtering by multiple interfaces seems better suited to subclassing or patching.

hjacobs commented 8 years ago

OK, no problem, I'm not filtering by network interface anymore :smile: