Closed: jrossi closed this issue 5 years ago
I think adding pubsub somewhere would be a good idea for us to support a more pluggable pipeline. It might require some difficult refactoring before we can integrate it, but it's likely worth doing.
I'd love to see an abstraction like this somewhere https://github.com/NYTimes/gizmo/tree/master/pubsub
@marpaia, @zwass and I have all discussed various strategies for how to integrate something like that into Fleet. Idk if we have anything written down at the moment, but I would love to discuss an approach.
(any POC would be great too).
I have a much better understanding of how Fleet works, and how the code is put together, after doing my POC in #1899. With this I would like to propose the following as a method for integrating external data sinks for osquery results/status.
Rather than adding more server configuration that needs to be completed on hosts running Fleet, it would be simpler to route data to sinks using an apiVersioned YAML specification.
This would keep the server configuration very simple while also allowing a simple upgrade path for users. Here is an example of what the config would look like:
```yaml
apiVersion: v1
kind: output
spec:
  name: windows_kafka_queue
  descriptions: All windows hosts request pro
  type: kafka
  kafka:
    topic: fleet.windows.results
    brokers: 127.0.0.1:9018
  match:
    - type: results
    - platform: windows
---
apiVersion: v1
kind: output
spec:
  name: unix_nats_queue
  descriptions: All
  type: kafka
  kafka:
    topic: fleet.unix.{.PlatForm}
    brokers: 127.0.0.1:9018
  match:
    - type: results
    - platform: "!windows"
---
apiVersion: v1
kind: output
spec:
  name: local_results
  descriptions: All
  type: file
  file:
    path: /tmp/fleet_results
    log_rotate: false
  match:
    - type: results
---
apiVersion: v1
kind: output
spec:
  name: local_status
  descriptions: All
  type: file
  file:
    path: /tmp/fleet_status
    log_rotate: false
  match:
    - type: status
```
I think we should go back and forth on the file format here, but I love the idea of defining logging pipeline details with the file format! Great idea, @jrossi.
It's not clear to me that Fleet should be in the business of sophisticated routing to different output streams. @jrossi can you give some examples of other projects that do sophisticated routing like this (vs. simple output and handling complex routing in the logging pipeline)?
@zwass sure. The one that comes to mind is different parts of the same org consuming things in different ways. Security teams want the results and status messages to know the state of the world. Desktop support needs the data in Splunk, but desktop engineering wants to push the data into SQL Server to build a global list of all listening ports. I know of 5-6 different engineering teams in my company that would consume just a subset of the data.
While I understand the question, it seems best to do the routing where all the state information is stored. Otherwise we have to enrich the results with the host metadata from Fleet's database and send them on, so that effective filtering can happen outside of Fleet. While doable, that does not seem like the best choice, but I could be missing something or a better way.
Here is my company's use case: I am currently working on building a pull-request model for our internal teams to define scheduled queries and roll them out to groups of hosts using fleetctl and labels. With routing defined as I proposed, I would also be able to let them consume the data they want in the manner they want.
In the future, if all this makes sense, the question is how to effectively route results by the name of the scheduled query, so results can be filtered even further. Think something like the following, but I have not gotten that far into this yet.
```yaml
match:
  query-name-startwith: desktop-support
```
Or
```yaml
match:
  query-name:
    - secops.*
    - ir.*
```
Here is what I would think would be a sane way forward:
Given that this is a rather large number of changes, each dependent on the others, how would you like me to move forward?
Hi, is any work ongoing with this?
If not, I'd like to implement a very simple Google PubSub publisher (no routing/filtering logic).
Hi @therealmik. I just pushed support for logging to AWS Firehose in #2022. There is no work ongoing to enable routing/matching for output plugins. You could totally follow the patterns introduced in that PR to add support for GCP pubsub.
That looks great - it should be pretty easy to write with all the hard work done already :)
FYI, the code is written, I'm just doing some testing before creating the PR
Hi @zwass - the filesystem module has changed from previous releases - it no longer writes newlines between messages in the results log.
On this line: https://github.com/kolide/fleet/pull/2022/files#diff-b016735b4714c1a8b7942556a87e4b68R205
The `append` must be allocating a new buffer almost always, so the appended newline is lost. I'm not really sure if the newline is needed for Firehose (it isn't for pubsub) - if not, maybe move it into the filesystem logger anyway?
@therealmik Great find there. Another user reported the symptoms of that problem as well. That was a mistake on my part. I just put up https://github.com/kolide/fleet/pull/2029 to fix this. The newline is needed in both filesystem and Firehose. These new changes ought to work appropriately with pubsub.
Awesome, that patch looks good to me.
I am going to close this as we now have both AWS Firehose and GCP Pubsub as logging output plugins, and the pattern is established for adding new ones (see https://github.com/kolide/fleet/pull/2022 and https://github.com/kolide/fleet/pull/2049).
I am about to spend too much time on planes, and as a PoC I am planning on adding osquery results logging directly into Fleet. This would allow the results to be processed downstream and outside of Fleet.
Normally I would just reach for Kafka and move on, but after looking at the code base I think NATS https://github.com/nats-io/gnatsd might be a slightly better option. Here is why:
With that being said, ignoring Kafka would be crazy, but for a PoC I think simple is better.