Closed jlixfeld closed 4 years ago
Most likely you have devices that are sending the same template ID (in this case 260) but the content of the flow records is different. The codec is trying to decode the flows from one device with the template from the other.
This is an issue with the codec. https://github.com/logstash-plugins/logstash-codec-netflow/issues/9
I think the subtext in the issues you cite suggests a workaround could be to use a different dport on the exporter. Am I reading that correctly, or no?
Hi Rob,
So it seems the only workaround is to stand up multiple LS instances. To that end, I've duplicated the elastiflow directories and added them as additional pipelines in pipelines.yml per this link:
https://www.elastic.co/guide/en/logstash/6.2/multiple-pipelines.html
I changed the port in each instance of elastiflow${copy_number}/conf.d/10_input_netflow_ipv4.logstash.conf
from port => "${ELASTIFLOW_NETFLOW_IPV4_PORT:2055}"
to simply port => 205x
(The port is unique across each of the 5 config files).
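For reference, the multiple-pipelines setup is declared in pipelines.yml. A minimal sketch of the layout described above, assuming the duplicated directories are named elastiflow-1 through elastiflow-5 (the names and paths are hypothetical):

```yaml
# /etc/logstash/pipelines.yml
# One pipeline per copied ElastiFlow directory; each copy's
# 10_input_netflow_ipv4.logstash.conf listens on its own UDP port.
- pipeline.id: elastiflow-1
  path.config: "/etc/logstash/elastiflow-1/conf.d/*.conf"
- pipeline.id: elastiflow-2
  path.config: "/etc/logstash/elastiflow-2/conf.d/*.conf"
# ...repeat for the remaining copies...
```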
The first time I tried to restart, I was getting Java out-of-memory errors, even though I'd tripled the amount of memory allocated to the VM from 12GB to 32GB, so I then increased the heaps to
-Xms20g
-Xmx20g
in /etc/logstash/jvm.options
.
Now, logstash seems to be taking an hour to start, and it's still not done.
Last entry in the log was from an hour ago:
root@ls1:/etc/logstash# date
Mon Nov 5 11:33:25 EST 2018
root@ls1:/etc/logstash# tail -1 /var/log/logstash/logstash-plain.log
[2018-11-05T10:29:50,173][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.4.2"}
root@ls1:/etc/logstash#
Java is running:
root@ls1:/etc/logstash# ps xa|grep java
840 ? SNsl 70:51 /usr/bin/java -Xms20g -Xmx20g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 -XX:+HeapDumpOnOutOfMemoryError -Djava.security.egd=file:/dev/urandom -cp /usr/share/logstash/logstash-core/lib/jars/animal-sniffer-annotations-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/commons-codec-1.11.jar:/usr/share/logstash/logstash-core/lib/jars/commons-compiler-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/error_prone_annotations-2.0.18.jar:/usr/share/logstash/logstash-core/lib/jars/google-java-format-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/gradle-license-report-0.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/guava-22.0.jar:/usr/share/logstash/logstash-core/lib/jars/j2objc-annotations-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-annotations-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-core-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-databind-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-cbor-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/janino-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/jruby-complete-9.1.13.0.jar:/usr/share/logstash/logstash-core/lib/jars/jsr305-1.3.9.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-api-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-slf4j-impl-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/logstash-core.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.commands-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.contenttype-3.4.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.expressions-3.4.300.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.filesystem-1.3.100.jar:/usr/share/logstash/logstash
-core/lib/jars/org.eclipse.core.jobs-3.5.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.resources-3.7.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.runtime-3.7.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.app-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.common-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.preferences-3.4.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.registry-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.jdt.core-3.10.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.osgi-3.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.text-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/slf4j-api-1.7.25.jar org.logstash.Logstash --path.settings /etc/logstash
root@ls1:/etc/logstash#
top - 11:34:15 up 1:23, 1 user, load average: 1.04, 1.01, 1.00
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 25.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32951964 total, 22114460 free, 10656812 used, 180692 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 21934460 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
840 logstash 39 19 23.585g 0.010t 19716 S 100.3 31.9 70:26.07 java
I'm going to assume you've had to take similar measures in some of your other deployments for folks who haven't been running a homogeneous network with similar types of exports, similar SW versions, etc. Have you seen anything like this before, or do you just spin up entirely separate Logstash servers? Any pointers?
The way I would handle this is to break the pipeline into two parts: a "collector", which is basically the UDP input w/codec and a redis output, and a "processor", which uses a redis input and does all of the post-decoder work and sending to Elasticsearch. You can then have multiple instances of the simple "collector" pipeline, each listening on a different port, and a single instance of the "processor". It would look like this...
collector---\
collector-- redis -- processor -- Elasticsearch
collector---/
Redis is really easy to set up in this use case and can actually help reduce packet loss when receiving flow data via UDP (that is another article to write). I have often thought about adding Redis as a requirement for ElastiFlow, as it brings a lot of benefits.
You could also try the new pipeline input and output instead of using redis. It should also work for your use-case, but I haven't tested it fully yet. Redis on the other hand is a proven technology for this scenario.
Either way, you will need to do a little restitching of the pipelines. It will however be worth it, as you will not have to run multiple instances of the heavy processing logic.
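A rough sketch of that collector/processor split as two Logstash pipeline configs (untested; the port, Redis host, and key name are placeholders, and the existing ElastiFlow filters would slot into the processor):

```
# collector pipeline (run one instance per exporter, each on its own port)
input {
  udp {
    port  => 2055            # unique per collector instance
    codec => netflow
  }
}
output {
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "netflow"   # shared queue the processor reads from
  }
}

# processor pipeline (single instance)
input {
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "netflow"
  }
}
# ...all of the existing ElastiFlow filter logic goes here...
output {
  elasticsearch { hosts => ["127.0.0.1:9200"] }
}
```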
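The pipeline-to-pipeline input and output mentioned above would replace Redis like this (an untested sketch; the address name "processor" is arbitrary):

```
# collector pipeline: swap the redis output for a pipeline output
output {
  pipeline { send_to => ["processor"] }
}

# processor pipeline: swap the redis input for a pipeline input
input {
  pipeline { address => "processor" }
}
```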
Hi Rob, I am getting the same error with a single router configured. How else can Redis be configured? By default Redis listens for TCP traffic (on port 6379). My challenge is that logstash-codec-netflow can't be updated any further.
While using the following simple Logstash input & output configuration, I do get output:
input {
  udp {
    port => 2055
    codec => netflow
    type => "netflow"
    queue_size => 10000
    workers => 4
    tags => [ "cisco_logs" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test-%{+YYYY.MM.dd}"
    document_type => "netflow_logs"
  }
  stdout { codec => rubydebug }
}
"host" => "XX.XX.XX.XX",
"netflow" => {
"l4_src_port" => 53,
"direction" => 1,
"in_pkts" => 1,
"src_as" => 0,
"flow_seq_num" => 4798,
"in_bytes" => 88,
"protocol" => 17,
"flow_sampler_id" => 0,
"ipv4_src_addr" => "XX.XX.XX.XX",
"dst_as" => 0,
"first_switched" => "2019-01-14T09:11:27.999Z",
"ipv4_next_hop" => "XX.XX.XX.XX",
"flowset_id" => 256,
"l4_dst_port" => 54678,
"version" => 9,
"last_switched" => "2019-01-14T09:11:27.999Z",
"src_mask" => 24,
"output_snmp" => 1,
"src_tos" => 0,
"input_snmp" => 5,
"tcp_flags" => 16,
"dst_mask" => 26,
"ipv4_dst_addr" => "XX.XX.XX.XX"
},
"@timestamp" => 2019-01-14T09:11:42.000Z,
"type" => "netflow",
"tags" => [
[0] "cisco_logs"
],
"@version" => "1"
How do I get my ElastiFlow graphs to work?
Note: I used the following instructions to set up ElastiFlow: https://pandaways.com/elastiflow-with-mikrotik-and-centos-7/
If all that you have is one source of flows, and you are seeing this error, then something else is wrong. It is possible that the template the vendor is sending doesn't match the flow records. I would have to look at a PCAP which contains both the template and a flow record in order to investigate.
elastiflow.zip I hope the PCAP is sufficient (find attached)
You have two different devices sending flows. Both send a template with ID 256, but they contain different fields. This will cause issues, as one template will be used to decode both sources, making at least one of them always wrong. The only workaround for this at the moment is to customize ElastiFlow for multiple collectors, with a common processing instance and some form of message queue in between.
Thanks Rob.
Hi Rob, I've re-installed ElastiFlow and configured one device. The following error seems to persist (latest pcap file attached):
[2019-01-16T09:57:02,462][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
(the same warning repeats every 10-15 seconds)
[2019-01-16T09:58:43,387][WARN ][logstash.codecs.netflow ] Can't (yet) decode flowset id 256 from source id 0, because no template to decode it with has been received. This message will usually go away after 1 minute.
I'm using ElastiFlow v2.1.0
How long has it been running? Some devices are really slow at sending templates; Fortinet in particular can take 15-30 minutes.
Over an hour.
Issue was resolved. Thanks.
> The way I would handle this is to break the pipeline into two parts...
@robcowart Do you have any guides on how to implement this?
Out of interest, does the Elasticsearch ElastiFlow netflow plugin also have the same problem with overlapping templates?
The root cause is related to where input codecs (which decode the raw flow data) are located. So it is a Logstash-specific issue. Any solution based on Logstash will have the same challenge.
> Most likely you have devices that are sending the same template ID (in this case 260) but the content of the flow records is different...
@robcowart looks like there is a patch for this: https://github.com/logstash-plugins/logstash-codec-netflow/issues/9#issuecomment-552129326
But with Logstash 7.4.2 being broken with the netflow codec (https://github.com/robcowart/elastiflow/issues/427) I had to roll back to 7.3.2 (for some reason 7.4.2 works with collectd over UDP). I'm wondering if I should start looking at something else for processing netflow...
Filebeat also has its problems, especially around packets seemingly not being processed - I'll need to look at this tomorrow...
I'm assuming that if I get entries like this in my Logstash netflow codec logs, then my netflow data isn't being processed properly?
[2019-11-11T20:57:59,701][WARN ][logstash.codecs.netflow ] Template length exceeds flowset length, skipping {:template_id=>257, :template_length=>91, :record_length=>52}
This issue will be addressed once the following PRs are merged and released:
Logstash UDP Input: https://github.com/logstash-plugins/logstash-input-udp/pull/46
Logstash Netflow Codec: https://github.com/logstash-plugins/logstash-codec-netflow/pull/187
> The way I would handle this is to break the pipeline into two parts...
@robcowart Do you have any guides on how to implement this?
I am wondering the same.
Unfortunately the Elastic team declined to merge the UDP input changes (see logstash-plugins/logstash-input-udp#46). This leaves no option other than to continue recommending the workaround of multiple instances of the ElastiFlow pipeline.
Hi there,
I don't think this is an EF issue, but I'd like to poke the crowd to see if anyone has any ideas before I bug the LS folks.
I started seeing my Elastiflow graphs jump up across all metrics from, for example, tens of Gbps to hundreds of Tbps. Obviously bad data.
I correlated it with these logs starting to appear:
I've seen posts in various forums about bugs in netflow codecs, but those bugs are from months ago, and all of them seem to have been fixed(?). I just verified that I'm running the latest codec:
Does this look familiar to anyone?
Thanks!