rocknsm / rock

Automated deployment scripts for the RockNSM network hunting distribution.
http://rocknsm.io
Apache License 2.0

Data Management with RockNSM #326

Open webhead404 opened 6 years ago

webhead404 commented 6 years ago

It doesn't appear that there is any documentation on how to manage data in RockNSM as far as disk utilization goes. For instance, in my lab, Kibana becomes unresponsive after about a week, probably because ES quit indexing new data. Is there a form of log rotation that can be set up, or data retention settings somewhere?

bndabbs commented 5 years ago

This has come up in a couple of conversations recently. The near-term fix is to use separate mount points for products that consume a lot of data, which fixes the issue with ES going read-only when Kafka fills up.
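
As a rough sketch, that just means giving the heavy writers their own partitions so a full Kafka volume can't take the Elasticsearch volume read-only with it. The device names and mount paths below are illustrative, not the project defaults:

```
# /etc/fstab (excerpt) - hypothetical dedicated partitions for the heavy writers
/dev/sdb1   /data/kafka              xfs   defaults,noatime   0 0
/dev/sdc1   /var/lib/elasticsearch   xfs   defaults,noatime   0 0
```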

Long term, we will probably implement disk quotas to help out with this.

yasser48 commented 5 years ago

when Kafka fills up will it overwrite the old data ?

dcode commented 5 years ago

Kafka defaults to aging off old data after 168 hours, which is 7 days. You can override this with a shorter period by setting kafka_retention in your config.yml; that value is consumed by the Kafka role. There are other manual ways to set retention (see the upstream Kafka docs), but specifying it in hours is the easiest and is the default method used by the upstream project.
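
For example, a minimal config.yml sketch (only the kafka_retention variable name comes from the role; the 72-hour value is just illustrative):

```yaml
# config.yml - hypothetical retention override
# kafka_retention is interpreted as hours by the Kafka role
kafka_retention: 72   # age off Kafka data after 3 days instead of the 168-hour default
```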

yasser48 commented 5 years ago

Thank you. I still have some questions to clarify about data retention. Example pipelines below:

Suricata --> Kafka --> ES
Bro --> Kafka --> ES

Which one is happening, 1 or 2?

1. Data log "A" moves from Suricata/Bro --> Kafka --> ES.
2. Data log "A" moves from Suricata/Bro --> Kafka and stays there, and ES reads directly from Kafka.

How is the data stored along the way? What happens to log "A" when it moves from Bro/Suricata to Kafka, etc.? Will the same data be duplicated in both Bro and Kafka, or kept in only one place?

We tested this in a heavy-traffic environment and had issues with disk usage, so we need to understand how log data is managed between components.

dcode commented 5 years ago

Data from Bro is written directly to Kafka in JSON format. Bro also writes to disk in the classic ASCII format. If you're running a higher-bandwidth sensor, I recommend disabling the ASCII logs, as they eat a lot of space and get too large to grep through. We have a script to do this for you; just add the following line to your local.bro:

@load ./rock/frameworks/logging/disable-ascii

Suricata logs are currently shipped to Kafka using Filebeat. We're working on a solution that won't write to disk at all, but it's not ready for the community yet. You can save some space by turning off the fast log and the unified2 log; we only ingest eve.json.
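
As a rough sketch, disabling those outputs in suricata.yaml looks something like the following. Check your own suricata.yaml for the exact output names; these are the stock Suricata ones, not necessarily what the RockNSM role templates:

```yaml
# suricata.yaml (excerpt) - only eve.json is needed for the Kafka/ES pipeline
outputs:
  - fast:
      enabled: no        # plain-text alert log, not ingested
  - unified2-alert:
      enabled: no        # binary alert log, not ingested
  - eve-log:
      enabled: yes       # the JSON log that Filebeat ships to Kafka
```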

Once everything is in Kafka, Logstash consumes it, transforms/enriches it, and ships it to Elasticsearch.
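
Conceptually, that stage looks something like the Logstash pipeline below. The topic, host, and index names are placeholders, not the actual RockNSM pipeline config:

```
# hypothetical Logstash pipeline: Kafka in, Elasticsearch out
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics            => ["bro-raw", "suricata-raw"]   # placeholder topic names
    codec             => "json"
  }
}

filter {
  # enrichment/transforms happen here (geoip, field renames, etc.)
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "rock-%{+YYYY.MM.dd}"   # placeholder index pattern
  }
}
```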

If you're using Stenographer, it will age off old pcap data by itself when the disk reaches 80%, I think. That's customizable too.
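
If memory serves, that's controlled by the DiskFreePercentage setting in Stenographer's config; something like the sketch below, where the paths and values are illustrative rather than the RockNSM defaults:

```json
{
  "Threads": [
    {
      "PacketsDirectory": "/data/stenographer/packets",
      "IndexDirectory": "/data/stenographer/index",
      "DiskFreePercentage": 20
    }
  ],
  "StenotypePath": "/usr/bin/stenotype",
  "Interface": "em1",
  "Port": 1234
}
```

Keeping 20% of the disk free works out to aging off pcap once usage hits roughly 80%, which matches the behavior described above.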