robcowart / elastiflow

Network flow analytics (Netflow, sFlow and IPFIX) with the Elastic Stack

Logstash not ingesting netflow events #477

Closed e-dennington closed 4 years ago

e-dennington commented 4 years ago

I am having an issue with my Elastiflow install where Logstash doesn't ingest any of the netflow data it is receiving, and thus doesn't pass it on to Elasticsearch. In the log, Logstash appears to start fine and the elastiflow pipeline is successfully started, but I never see anything in the log after that. I do see netflow traffic on the configured port (2055) if I run a tcpdump on the Logstash server. I have also verified that the firewall and SELinux are disabled. I have run this setup before with the entire stack on a single Ubuntu VM, which worked fine, but this time I have separated Logstash onto its own VM, with Elasticsearch and Kibana running on another, and both are now running RHEL 7.7. I feel there must be something config-wise that I'm missing. I've been staring at it for a while, so I'm sure it's right in front of me, but I'm not seeing it.
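(For reference, a check along those lines might look like the following; the interface name ens192 is an assumption, substitute the collector's actual interface.)

# confirm NetFlow packets are reaching the collector on UDP/2055, stopping after 10 packets
sudo tcpdump -ni ens192 udp port 2055 -c 10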

Here are the relevant bits of logstash log showing it starting up and connecting to Elasticsearch:

[2020-01-09T14:50:51,484][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.5.1"}
[2020-01-09T14:51:03,735][INFO ][org.reflections.Reflections] Reflections took 36 ms to scan 1 urls, producing 20 keys and 40 values
[2020-01-09T14:52:29,348][INFO ][logstash.outputs.elasticsearch][elastiflow] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://elastic:xxxxxx@10.12.104.162:9200/]}}
[2020-01-09T14:52:29,593][WARN ][logstash.outputs.elasticsearch][elastiflow] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@10.12.104.162:9200/"}
[2020-01-09T14:52:29,642][INFO ][logstash.outputs.elasticsearch][elastiflow] ES Output version determined {:es_version=>7}
[2020-01-09T14:52:29,647][WARN ][logstash.outputs.elasticsearch][elastiflow] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>7}
[2020-01-09T14:52:29,700][INFO ][logstash.outputs.elasticsearch][elastiflow] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//10.12.104.162:9200"]}
[2020-01-09T14:52:29,770][INFO ][logstash.outputs.elasticsearch][elastiflow] Using mapping template from {:path=>"/etc/logstash/elastiflow/templates/elastiflow.template.json"}
[2020-01-09T14:52:29,902][INFO ][logstash.outputs.elasticsearch][elastiflow] Attempting to install template {:manage_template=>{"order"=>0, "version"=>30502, "index_patterns"=>"elastiflow-3.5.3-*", "settings"=>{"index"=>{"number_of_shards"=>3, "number_of_replicas"=>1, "refresh_interval"=>"10s", "codec"=>"best_compression", ...
...
[2020-01-09T14:59:20,274][INFO ][logstash.outputs.elasticsearch][elastiflow] Installing elasticsearch template to _template/elastiflow-3.5.3
[2020-01-09T14:59:20,391][INFO ][logstash.filters.geoip ][elastiflow] Using geoip database {:path=>"/etc/logstash/elastiflow/geoipdbs/GeoLite2-ASN.mmdb"}
[2020-01-09T14:59:26,643][INFO ][logstash.filters.geoip ][elastiflow] Using geoip database {:path=>"/etc/logstash/elastiflow/geoipdbs/GeoLite2-City.mmdb"}
[2020-01-09T14:59:26,649][INFO ][logstash.filters.geoip ][elastiflow] Using geoip database {:path=>"/etc/logstash/elastiflow/geoipdbs/GeoLite2-City.mmdb"}
[2020-01-09T14:59:26,789][INFO ][logstash.filters.geoip ][elastiflow] Using geoip database {:path=>"/etc/logstash/elastiflow/geoipdbs/GeoLite2-ASN.mmdb"}
[2020-01-09T14:59:33,086][WARN ][org.logstash.instrument.metrics.gauge.LazyDelegatingGauge][elastiflow] A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been create for key: cluster_uuids. This may result in invalid serialization. It is recommended to log an issue to the responsible developer/development team.
[2020-01-09T14:59:33,093][INFO ][logstash.javapipeline ][elastiflow] Starting pipeline {:pipeline_id=>"elastiflow", "pipeline.workers"=>8, "pipeline.batch.size"=>512, "pipeline.batch.delay"=>250, "pipeline.max_inflight"=>4096, "pipeline.sources"=>["/etc/logstash/elastiflow/conf.d/10_input_ipfix_ipv4.logstash.conf", "/etc/logstash/elastiflow/conf.d/10_input_netflow_ipv4.logstash.conf", "/etc/logstash/elastiflow/conf.d/10_input_sflow_ipv4.logstash.conf", "/etc/logstash/elastiflow/conf.d/20_filter_10_begin.logstash.conf", "/etc/logstash/elastiflow/conf.d/20_filter_20_netflow.logstash.conf", "/etc/logstash/elastiflow/conf.d/20_filter_30_ipfix.logstash.conf", "/etc/logstash/elastiflow/conf.d/20_filter_40_sflow.logstash.conf", "/etc/logstash/elastiflow/conf.d/20_filter_90_post_process.logstash.conf", "/etc/logstash/elastiflow/conf.d/30_output_10_single.logstash.conf"], :thread=>"#"}
[2020-01-09T14:59:33,348][INFO ][logstash.javapipeline ][elastiflow] Pipeline started {"pipeline.id"=>"elastiflow"}
[2020-01-09T14:59:33,369][INFO ][logstash.inputs.tcp ][elastiflow] Starting tcp input listener {:address=>"10.12.104.161:4739", :ssl_enable=>"false"}
[2020-01-09T14:59:33,433][INFO ][logstash.inputs.udp ][elastiflow] Starting UDP listener {:address=>"10.12.104.161:2055"}
[2020-01-09T14:59:33,468][INFO ][logstash.inputs.udp ][elastiflow] Starting UDP listener {:address=>"10.12.104.161:6343"}
[2020-01-09T14:59:33,499][INFO ][logstash.inputs.udp ][elastiflow] Starting UDP listener {:address=>"10.12.104.161:4739"}
[2020-01-09T14:59:33,600][INFO ][logstash.inputs.udp ][elastiflow] UDP listener started {:address=>"10.12.104.161:6343", :receive_buffer_bytes=>"33554432", :queue_size=>"4096"}
[2020-01-09T14:59:33,600][INFO ][logstash.inputs.udp ][elastiflow] UDP listener started {:address=>"10.12.104.161:2055", :receive_buffer_bytes=>"33554432", :queue_size=>"16384"}
[2020-01-09T14:59:33,606][INFO ][logstash.inputs.udp ][elastiflow] UDP listener started {:address=>"10.12.104.161:4739", :receive_buffer_bytes=>"33554432", :queue_size=>"4096"}
[2020-01-09T14:59:33,642][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:elastiflow], :non_running_pipelines=>[]}
[2020-01-09T14:59:33,888][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

Thanks in advance for any help or ideas.

e-dennington commented 4 years ago

So I played around with this more, and Logstash doesn't appear to like netflow data from my Cisco Catalyst 6500 switches for some reason. It happily accepts it from a Nexus 7K, although it took FOREVER for it to see the template being sent from the 7K so it could start decoding. Same with an ASA I tested with: it never seemed to import the template, but it at least reported in the log that it was seeing netflow data, it just couldn't decode it (I let it go for about an hour).

robcowart commented 4 years ago

If you send me a PCAP of flows from the switches, I can take a look and try to figure out the issue.

Regarding the long wait for templates, I wonder if they might be missed due to dropped packets. This can happen if Logstash can’t process them fast enough. By default when Logstash is started using systemd, it is started with Nice=19. If you edit /etc/systemd/system/logstash.service and change this to Nice=0, you should get a good boost in throughput and reduce any dropped packets.
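For reference, a minimal sketch of that change using a systemd drop-in instead of editing the unit file directly (this is standard systemd usage, assuming the stock logstash.service unit, not an ElastiFlow-specific procedure):

# open an override file for the logstash unit and add the two lines shown in the comments
sudo systemctl edit logstash
#   [Service]
#   Nice=0
# reload systemd and restart Logstash so the new niceness takes effect
sudo systemctl daemon-reload
sudo systemctl restart logstash
# verify the running process priority (the NI column should now show 0)
ps -o pid,ni,cmd -C java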

robcowart commented 4 years ago

BTW, you can email PCAPs to elastiflow@gmail.com if you don't want to post them here.

e-dennington commented 4 years ago

Thanks. I'll work on getting a pcap to you today.

e-dennington commented 4 years ago

PCAP sent. I did test the Catalyst 6500 sending NetFlow v5 instead, and that looked to work fine; I saw netflow records showing up in Elasticsearch, and when I changed it back to v9 they stopped again.

robcowart commented 4 years ago

Extracting an example of each, I was able to replay and decode these flows fine. The problem is not the content of the flows.

In just the small window of time covered by this PCAP you are receiving 579 "packets" per second, with an average of approximately 30 flow records per packet, for a total of 17,370 flows per second from just this ONE device. You also mention other devices, so I can only assume that you have a significant volume of flows, which will require a substantial Elastic Stack implementation.

If you run netstat -su you should see a line reporting buffer-related drops. On my Ubuntu machine that line looks like 1235 receive buffer errors. Run that command a few times and determine the rate of packet drops. I suspect it will be significant, which would suggest that your system is so busy that the occasional template is lost among the drops. NetFlow v5 works because it has a static record structure, but you are probably only getting a small subset of the data due to the drops.
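(A rough way to turn that counter into a rate, as a sketch; the exact wording of the counter line can vary slightly between distros, so adjust the pattern if needed.)

# sample the UDP receive buffer error counter twice, 10 seconds apart,
# and print the approximate number of dropped packets per second
a=$(netstat -su | awk '/receive buffer errors/ {print $1}')
sleep 10
b=$(netstat -su | awk '/receive buffer errors/ {print $1}')
echo "approx. drop rate: $(( (b - a) / 10 )) packets/sec"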

e-dennington commented 4 years ago

Yeah, I did the same calculation you did once I took the capture, and realized I need many more Logstash instances. I'll probably have to look at load balancing on a per-packet basis, since this traffic is from a single source.
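(For what it's worth, one way to fan out a single exporter across multiple Logstash nodes on a per-packet basis is DNAT with the iptables statistic match. This is only a sketch under assumed addresses, with 10.12.104.163 as a hypothetical second collector, not something prescribed in this thread; note that with NetFlow v9 each collector still needs to receive the templates.)

# round-robin every other UDP/2055 packet to a second Logstash collector;
# the remaining packets fall through to the local listener
iptables -t nat -A PREROUTING -p udp --dport 2055 -m statistic --mode nth --every 2 --packet 0 -j DNAT --to-destination 10.12.104.163:2055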