idaholab / Malcolm

Malcolm is a powerful, easily deployable network traffic analysis tool suite for full packet capture artifacts (PCAP files), Zeek logs and Suricata alerts.
https://idaholab.github.io/Malcolm/

determine difference in storage space based on enabling/disabling features #510

Open mmguero opened 3 months ago

mmguero commented 3 months ago

Certain features cause different fields to be indexed depending on whether they're enabled or disabled. These might include:

What I'd like to know is how storage space is affected by enabling or disabling these features. So basically set up Malcolm, ingest a bunch of PCAP, then run du on the opensearch directory with and without these features enabled, and report on the differences.
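A rough sketch of how the disk measurement could be scripted. The helper names `dir_bytes` and `pct_change` are made up for illustration, GNU `du` is assumed for `--block-size`, and the directory you pass in is whatever path your installation uses for OpenSearch data (not specified here).

```shell
#!/usr/bin/env bash

# Hypothetical helper: allocated disk usage of a directory, in bytes.
# Assumes GNU du (--block-size=1 reports usage in 1-byte units).
dir_bytes() {
  du -s --block-size=1 "$1" | cut -f1
}

# Hypothetical helper: percentage change from a baseline size to a new size,
# printed with one decimal place.
pct_change() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}
```

For each run, record `dir_bytes` on the OpenSearch data directory after ingesting the same PCAP, then call `pct_change "$baseline_bytes" "$feature_bytes"` to report the relative difference.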

In addition to measuring disk usage, I'd also like to profile the Logstash pipelines with these features enabled and disabled. So after ingesting the PCAP, take this measurement:

$ docker compose exec logstash curl -XGET http://localhost:9600/_node/stats/pipelines | jq -r '.. | .filters? // empty | .[] | objects | select (.events.in > 0) | [.id, .events.in, .events.out, .events.duration_in_millis] | join (";")' | sort -n -t ';' -k4

which will give you a breakdown of how much time was spent in each Logstash Filter.
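Since the filters see different event counts, it may help to normalize the totals. This is a small sketch that reads the `id;in;out;duration_in_millis` lines produced by the command above and prints the total duration plus milliseconds per input event. `per_event_cost` is a hypothetical name, not part of Malcolm or Logstash.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read "id;events_in;events_out;duration_ms" lines on
# stdin and print "id;duration_ms;ms_per_event" for filters that saw events.
per_event_cost() {
  awk -F';' '$2 > 0 { printf "%s;%d;%.3f\n", $1, $4, $4 / $2 }'
}
```

Save the output of the measurement to a file and run `per_event_cost < that_file` to get the normalized view.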

The one I'm most interested in is NetBox. With NetBox enabled and auto-populate turned on, test LOGSTASH_NETBOX_ENRICHMENT_DATASETS with the defaults, then test it again with all. I'd like to know how much more time is added and how much more disk space is used if we enrich ALL log types from NetBox.
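One possible way to compare the two NetBox runs, assuming the filter stats from each run were saved to a file in the `id;in;out;duration_in_millis` format shown earlier. `compare_runs` and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: compare per-filter durations between two saved runs.
# $1 = baseline stats file, $2 = stats file from the run being compared.
# Prints "id;baseline_ms;new_ms;delta_ms", largest increase first.
# Filters that appear only in the baseline file are not listed.
compare_runs() {
  awk -F';' '
    NR == FNR { base[$1] = $4; next }   # first file: remember baseline durations
    {
      b = ($1 in base) ? base[$1] : 0   # filter missing from baseline counts as 0
      printf "%s;%d;%d;%d\n", $1, b, $4, $4 - b
    }
  ' "$1" "$2" | sort -t ';' -k4,4nr
}
```

For example, `compare_runs defaults.txt all.txt` would list which filters gained the most time when enriching all log types.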