kolide / fleet

A flexible control server for osquery fleets
https://kolide.com/fleet
MIT License

Feature Request: Add logging from Fleet into S3 Bucket #2206

Open alejandroortuno opened 4 years ago

alejandroortuno commented 4 years ago

This is a feature request to add result logging from Kolide directly into an S3 bucket, using a folder structure based on pack / query_name. This cannot be achieved with Kinesis Firehose, because a delivery stream forwards all logs, regardless of which query produced them, into the same folder of the S3 bucket configured as its destination.

Organizing the bucket by query name puts logs that share a schema in the same folder, so data processing tools can consume them directly from S3. It also moves the complexity of segregating the logs to their source, in this case the Fleet server.

As it stands, Kinesis forwards result logs with different schemas into the same folder, which requires additional processing to segregate them into folders by schema (that is, by osquery query).
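A minimal sketch of how a logger could derive the requested pack / query_name folder from each result log, assuming osqueryd runs with `--pack_delimiter=/` so that pack query names look like `pack/<pack_name>/<query_name>`. The `results/<pack>/<query>/dt=<YYYY-MM-DD>/` layout, the `_no_pack` folder, and the `s3_key_for` helper are all hypothetical illustrations, not part of Fleet.

```python
import json
from datetime import datetime, timezone

# Assumption: osqueryd runs with --pack_delimiter=/ so pack query names
# look like "pack/<pack_name>/<query_name>".
PACK_DELIMITER = "/"


def s3_key_for(log_line: str, object_name: str, prefix: str = "results") -> str:
    """Hypothetical helper: build a per-query S3 key from one result log line.

    Layout (illustrative only): <prefix>/<pack>/<query>/dt=<YYYY-MM-DD>/<object_name>
    """
    record = json.loads(log_line)
    name = record["name"]
    parts = name.split(PACK_DELIMITER)
    if len(parts) >= 3 and parts[0] == "pack":
        # Pack query: pack and query become separate folder levels.
        pack, query = parts[1], PACK_DELIMITER.join(parts[2:])
    else:
        # Not a pack query: group it under a placeholder folder.
        pack, query = "_no_pack", name
    # Normalize unixTime in case it arrives as a string, then partition by UTC day.
    day = datetime.fromtimestamp(int(record["unixTime"]), tz=timezone.utc)
    return f"{prefix}/{pack}/{query}/dt={day:%Y-%m-%d}/{object_name}"
```

With this layout every object under `results/<pack>/<query>/` comes from a single query, so a tool reading that prefix sees a single schema; the date level is an optional extra that query engines commonly use as a partition.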

zwass commented 4 years ago

My recommendation for this would be to use something like Logstash to pull the logs from S3 and split them by query.
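The "split them by query" step is essentially a group-by on each record's `name` field. A minimal sketch of that post-processing step, assuming the delivered batch holds one JSON result record per line (the `split_by_query` function is hypothetical and is not Logstash configuration):

```python
import json
from collections import defaultdict


def split_by_query(batch: str) -> dict[str, list[str]]:
    """Group newline-delimited osquery result logs by their query name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in batch.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        groups[json.loads(line)["name"]].append(line)
    return dict(groups)
```

Each resulting group could then be re-uploaded under its own per-query prefix, which is the extra round trip through S3 that the reply below objects to.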

We could look into supporting Kinesis Streams in addition to Firehose, which would help get the logs directly into Logstash rather than having to read them back out of S3.

alejandroortuno commented 4 years ago

@zwass Tools like AWS Athena work directly with the logs on S3 (they read them in place). That is why a Kolide Fleet logger that writes directly to S3 is beneficial: it avoids further processing that pulls the logs out of S3 only to re-upload them to S3, split into folders by query.

The idea is to have a folder structure on S3 so that tools which parse logs directly from S3 can apply one schema per query. You can see comments on the Kolide Slack channel from people who have tried this without success, purely because of the lack of folder structure on S3: the result logs of different queries have different schemas under the "columns" key:

https://osquery.slack.com/archives/C08V7KTJB/p1550659411094500
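The schema conflict described above can be made concrete with a small check: collect the set of keys under "columns" for each query name and see whether a folder's records share one set. This is a sketch assuming event-format result logs (one row per record under "columns"); records without "columns", such as snapshot results, are counted as an empty column set. Both function names are hypothetical.

```python
import json
from collections import defaultdict


def schemas_by_query(lines: list[str]) -> dict[str, set[frozenset[str]]]:
    """Collect the distinct sets of "columns" keys seen for each query name."""
    schemas: dict[str, set[frozenset[str]]] = defaultdict(set)
    for line in lines:
        record = json.loads(line)
        # Missing "columns" (e.g. snapshot records) yields an empty key set.
        schemas[record["name"]].add(frozenset(record.get("columns", {})))
    return dict(schemas)


def is_single_schema(lines: list[str]) -> bool:
    """True if every record in `lines` shares one set of column names."""
    all_schemas: set[frozenset[str]] = set()
    for column_sets in schemas_by_query(lines).values():
        all_schemas |= column_sets
    return len(all_schemas) <= 1
```

Run over a Firehose folder that mixes queries, `is_single_schema` would return False, while a per-query folder of the kind requested in this issue would return True.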