mozilla-services / hindsight

Hindsight - lightweight data processing skeleton
Mozilla Public License 2.0

Relationship and compatibility with mozdef. #198

Open rverma-jm opened 4 years ago

rverma-jm commented 4 years ago

I'm wondering how this project can be leveraged in MozDef (https://github.com/mozilla/MozDef). Can we use Hindsight to replace MozDef's internal RabbitMQ requirement and deliver the logs directly to S3?

trink commented 4 years ago

It would not be a drop-in replacement at this point.

If the goal is to do real-time analysis, monitoring, or alerting on the data in addition to the ETL, then Hindsight would be beneficial. If the goal is just to remove RabbitMQ from the pipeline, it can be done, but using Hindsight for that alone would be overkill.

rverma-jm commented 4 years ago

The idea is that Hindsight seems to be quite a performant stream processing layer, as you mentioned. The MozDef docs even say:

MozDef aims to provide traditional SIEM functionality including:

Accepting events/logs from a variety of systems. (1)
Storing events/logs. 
Facilitating searches.
Facilitating alerting. (2)
Facilitating log management (archiving, restoration). (3.. partially)
Accepts only JSON input.
Integrates with a variety of log shippers including logstash, beaver, nxlog, syslog-ng and any shipper that can send JSON to either rabbit-mq or an HTTP(s) endpoint. (4)
Provides easy integration to Cloud-based data sources such as CloudTrail or GuardDuty.
Provides easy python plugins to manipulate your data in transit. (5)
Provides extensive plug-in opportunities to customize your event enrichment stream, your alert workflow, etc. (6)
Provides realtime access to teams of incident responders to allow each other to see their work simultaneously

This suggests that the six data management features I numbered above overlap heavily with Hindsight.

I don't actually see Hindsight as a drop-in replacement for MozDef, but while reading about both projects I wondered whether we can harness the power of one in the other. Hindsight is one of the few log processing engines that support cuckoo filters and Parquet output.
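For anyone unfamiliar with cuckoo filters, here is a minimal sketch of the idea in Python. It is a toy for illustration only and not Hindsight's implementation; the class name, parameters, and hashing scheme are all my own assumptions.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter (hypothetical, not Hindsight's code).

    Stores short fingerprints instead of full items, so membership tests
    can return false positives but never false negatives for stored items.
    Unlike a Bloom filter, it also supports deletion.
    """

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # A power-of-two bucket count keeps the XOR-based alternate index
        # in range and makes it its own inverse.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        # 16-bit fingerprint in the range 1..65535.
        return self._hash(b"fp:" + item.encode()) % 0xFFFF + 1

    def _index(self, item: str) -> int:
        return self._hash(item.encode()) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: the other candidate bucket depends
        # only on the current bucket and the fingerprint.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def add(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and move it
        # to its alternate bucket, repeating up to max_kicks times.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter is effectively full; in this toy the last evicted
        # fingerprint is dropped.
        return False

    def __contains__(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def remove(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

In a log pipeline this kind of structure is handy for things like "have I seen this identifier before?" without keeping every identifier in memory, at the cost of a small false-positive rate.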

I was even wondering if we can combine this with Suricata too.

By the way, as I understand it, Hindsight is comparable to fluent-bit, but with more data manipulation options.

Also, does Mozilla use Kubernetes? I found very few Kubernetes resources.

ameihm0912 commented 4 years ago

@rverma-jm this may not answer your original question, but I have experience with both, and I think Hindsight would actually integrate fairly well with MozDef.

Mozdef is great for collecting, storing, and viewing log data and doing basic alerting, but where you start to run into trouble with it is streaming analysis.

I could see some sort of architecture with hindsight acting as the logging ingestion layer, forwarding processed log data off to Mozdef's workers for indexing in ES while at the same time doing more advanced streaming data analysis of the input. Likewise, Mozdef's alerting output stream could be connected to hindsight to take advantage of hindsight's various plugins.
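To make that shape a bit more concrete, here is a rough sketch of the fan-out in Python. It is purely illustrative: real Hindsight plugins are written in Lua and run in a sandbox, and every name here (`ingest`, `FailedLoginMonitor`, the `source_ip` and `outcome` fields, the two sinks) is a hypothetical stand-in rather than anything from Hindsight or MozDef.

```python
import json
from collections import Counter, deque


class FailedLoginMonitor:
    """Hypothetical streaming analysis: flag a source once it reaches
    `threshold` failed logins within the last `window` failures."""

    def __init__(self, window=100, threshold=3):
        self.recent = deque(maxlen=window)
        self.counts = Counter()
        self.threshold = threshold

    def process(self, event):
        if event.get("outcome") != "failure":
            return None
        src = event.get("source_ip", "unknown")
        if len(self.recent) == self.recent.maxlen:
            # The oldest entry is about to fall out of the window.
            self.counts[self.recent[0]] -= 1
        self.recent.append(src)
        self.counts[src] += 1
        if self.counts[src] == self.threshold:
            return {
                "alert": "repeated_failed_logins",
                "source_ip": src,
                "count": self.threshold,
            }
        return None


def ingest(lines, index_sink, alert_sink, monitor):
    """Ingestion layer: parse each line once, then fan it out.

    index_sink stands in for forwarding to MozDef's workers (and on to ES);
    alert_sink stands in for wherever streaming alerts should go.
    """
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # MozDef accepts only JSON input, so skip the rest
        if not isinstance(event, dict):
            continue
        index_sink(event)              # path 1: storage and search
        alert = monitor.process(event)  # path 2: streaming analysis
        if alert is not None:
            alert_sink(alert)


if __name__ == "__main__":
    indexed, alerts = [], []
    sample = [
        '{"source_ip": "10.0.0.5", "outcome": "failure"}',
        '{"source_ip": "10.0.0.5", "outcome": "failure"}',
        '{"source_ip": "10.0.0.9", "outcome": "success"}',
        '{"source_ip": "10.0.0.5", "outcome": "failure"}',
    ]
    ingest(sample, indexed.append, alerts.append, FailedLoginMonitor())
    print(len(indexed), alerts)
```

The point of the design is that each event is parsed once and then takes both paths, so the indexing side does not need to know the analysis side exists.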

Of course there are more details to work out here; these are just some thoughts.