Open webhead404 opened 6 years ago
This has come up in a couple of conversations recently. The near-term fix is to use separate mount points for products that consume a lot of data, which prevents ES from going read-only when Kafka fills up.
Long term, we will probably implement disk quotas to help out with this.
When Kafka fills up, will it overwrite the old data?
Kafka defaults to aging off old data after 168 hours (7 days). You can override this with a shorter period by setting `kafka_retention` in your config.yml; the value is applied in the Kafka role. There are other manual ways to set retention (see the upstream Kafka docs), but hours is the easiest unit and the default method used by the upstream project.
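As a sketch, overriding the retention would look something like this in config.yml (the `kafka_retention` variable name comes from the comment above; the value shown is just an illustration):

```yaml
# config.yml (excerpt) — shorten Kafka retention from the 168-hour default
kafka_retention: 24   # hours; data older than this is aged off
```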
Thank you. I still have some questions to clarify about data retention. Example below:

suricata --> kafka --> ES
bro --> kafka --> ES

Which one is happening, 1 or 2?

1. Data log "A" moves from suricata/bro --> kafka --> ES.
2. Data log "A" moves from suricata/bro --> kafka and stays there; ES reads directly from kafka.
How is the data stored along the way? What happens to log "A" when it moves from bro/suricata to kafka? Will the same packet be duplicated in both bro and kafka, or stored only once? We tested this in a heavy-traffic environment and had issues with size, so we need to understand how log data is managed between components.
Data from bro is written directly to Kafka in JSON format. Bro also writes to disk in the classic ASCII format. If you're running a higher-bandwidth sensor, I recommend disabling the ASCII logs, as they eat a lot of space and are too large to grep through. We have a script to do this for you; just add the following line to your local.bro:

```
@load ./rock/frameworks/logging/disable-ascii
```
Suricata logs are currently shipped to Kafka using filebeat. We're working on a solution that won't write to disk at all, but it's not ready for the community yet. You can save some space by turning off the fast log and unified2 log. We only ingest the eve.json.
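To illustrate turning off the fast and unified2 logs while keeping eve.json, the relevant part of suricata.yaml would look roughly like this (a sketch; exact option layout varies by Suricata version, and the filename/path are the upstream defaults):

```yaml
# suricata.yaml (outputs excerpt) — keep only eve.json, which is what ROCK ingests
outputs:
  - fast:
      enabled: no          # disable fast.log
  - unified2-alert:
      enabled: no          # disable unified2 spool files
  - eve-log:
      enabled: yes
      filetype: regular
      filename: eve.json
```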
Once everything is in Kafka, Logstash consumes it, transforms/enriches it, and ships it to Elasticsearch.
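The Kafka-to-Elasticsearch leg can be sketched as a minimal Logstash pipeline like the one below. This is illustrative only: the broker address, topic names, and index pattern are assumptions, not the actual ROCK pipeline config.

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["bro-raw", "suricata-raw"]   # topic names are hypothetical
    codec => json
  }
}
filter {
  # transforms/enrichment (e.g. field renames, geoip) happen here
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```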
If you're using stenographer, it will age off old pcap data by itself when the disk reaches 80% full, I think. That's customizable too.
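For reference, upstream stenographer controls this with a free-space threshold in its config; a sketch might look like the following (paths are placeholders, and `"DiskFreePercentage": 20` would correspond to aging off data at roughly 80% disk usage — verify against your installed stenographer's config format):

```json
{
  "Threads": [
    {
      "PacketsDirectory": "/data/stenographer/packets",
      "IndexDirectory": "/data/stenographer/index",
      "DiskFreePercentage": 20
    }
  ]
}
```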
It doesn't appear that there is any documentation on how to manage data in RockNSM as far as disk utilization goes. For instance, in my lab after about a week, Kibana is unresponsive, probably because ES quit indexing new data. Is there a form of log rotation that can be accomplished, or data retention settings somewhere?