robcowart / elastiflow

Network flow analytics (Netflow, sFlow and IPFIX) with the Elastic Stack

Strange Behaviour - Large data no info. #191

Closed: sarlacpit closed this issue 4 years ago

sarlacpit commented 5 years ago

Hi,

I am trying to get to the bottom of an issue with our NetFlow implementation of ElastiFlow. Basically, we are seeing a lot of data going to and from 0.0.0.0. An example flow received by Elasticsearch:

},
    "flow": {
      "src_hostname": "0.0.0.0",
      "direction": "undetermined",
      "dst_locality": "private",
      "packets": 8647398368571949000,
      "dst_port": 0,
      "dst_addr": "0.0.0.0",
      "server_locality": "private",
      "src_addr": "0.0.0.0",
      "service_locality": "private",
      "client_hostname": "0.0.0.0",
      "client_addr": "0.0.0.0",
      "dst_port_name": "HOPOPT/0",
      "ip_version": "IPv4",
      "input_snmp": "0",
      "geoip": {
        "autonomous_system": "private"
      },
      "geoip_client": {
        "autonomous_system": "private"
      },
      "ip_protocol_name": "HOPOPT",
      "bytes": 354472653607202600,
      "service_name": "HOPOPT/0",
      "geoip_src": {
        "autonomous_system": "private"
      },
      "geoip_dst": {
        "autonomous_system": "private"
      },
      "server_addr": "0.0.0.0",
      "src_port": 0,
      "src_port_name": "HOPOPT/0",
      "output_snmp": "0",
      "service_port": "0",
      "ip_protocol": 0,
      "dst_hostname": "0.0.0.0",
      "src_locality": "private",
      "traffic_locality": "private",
      "server_hostname": "0.0.0.0",
      "tos": "0",
      "client_locality": "private",
      "geoip_server": {
        "autonomous_system": "private"
      }

ElastiFlow is otherwise working normally. We use port 9995 for NetFlow v9 and port 9996 for IPFIX. We have approximately 100 Cisco routers sending a standard template. I suspect the problem could be with our Cisco Firepower firewalls (not sure if these are supported yet). Their templates are defaults with few options to configure them, and I am assuming they are causing the blank fields and the massive byte and packet counts.

Example template:

        Template (Id = 281, Count = 22)
            Template Id: 281
            Field Count: 22
            Field (1/22): flowId
                Type: flowId (148)
                Length: 4
            Field (2/22): IPV6_SRC_ADDR
                Type: IPV6_SRC_ADDR (27)
                Length: 16
            Field (3/22): L4_SRC_PORT
                Type: L4_SRC_PORT (7)
                Length: 2
            Field (4/22): INPUT_SNMP
                Type: INPUT_SNMP (10)
                Length: 2
            Field (5/22): IPV6_DST_ADDR
                Type: IPV6_DST_ADDR (28)
                Length: 16
            Field (6/22): L4_DST_PORT
                Type: L4_DST_PORT (11)
                Length: 2
            Field (7/22): OUTPUT_SNMP
                Type: OUTPUT_SNMP (14)
                Length: 2
            Field (8/22): PROTOCOL
                Type: PROTOCOL (4)
                Length: 1
            Field (9/22): ICMP_IPv6_TYPE
                Type: ICMP_IPv6_TYPE (178)
                Length: 1
            Field (10/22): ICMP_IPv6_CODE
                Type: ICMP_IPv6_CODE (179)
                Length: 1
            Field (11/22): postNATSourceIPv6Address
                Type: postNATSourceIPv6Address (281)
                Length: 16
            Field (12/22): postNATDestinationIPv4Address
                Type: postNATDestinationIPv4Address (226)
                Length: 4
            Field (13/22): postNAPTSourceTransportPort
                Type: postNAPTSourceTransportPort (227)
                Length: 2
            Field (14/22): postNAPTDestinationTransportPort
                Type: postNAPTDestinationTransportPort (228)
                Length: 2
            Field (15/22): firewallEvent
                Type: firewallEvent (233)
                Length: 1
            Field (16/22): FW_EXT_EVENT
                Type: FW_EXT_EVENT (33002)
                Length: 2
            Field (17/22): observationTimeMilliseconds
                Type: observationTimeMilliseconds (323)
                Length: 8
            Field (18/22): initiatorOctets
                Type: initiatorOctets (231)
                Length: 8
            Field (19/22): responderOctets
                Type: responderOctets (232)
                Length: 8
            Field (20/22): initiatorPackets
                Type: initiatorPackets (298)
                Length: 8
            Field (21/22): responderPackets
                Type: responderPackets (299)
                Length: 8
            Field (22/22): flowStartMilliseconds
                Type: flowStartMilliseconds (152)
                Length: 8

Have you seen similar behavior before? Or would you recommend sending this data to a different port and setting up a different index?

Any recommendations gratefully received.

Thanks

robcowart commented 5 years ago

This looks as if the incoming flows are being decoded incorrectly by the Logstash netflow codec. Due to where it sits in the input, the netflow codec does not maintain templates per device, but rather per flowset ID. This can cause an issue if devices use the same flowset ID but send a different combination or order of fields.

For example... Device A sends a template for flowset ID 257. The codec processes this and uses it to decode ALL flows with flowset ID 257. Device B is also sending flows with flowset ID 257, however it is sending a different combination of fields. The codec will decode Device B's flows with the template from Device A, and the results will be wrong.

You should be able to determine if you have conflicting flowset templates by doing a PCAP and inspecting it in Wireshark.
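
For example, something like this on the collector, where the interface name, output file and ports are only placeholders for your environment:

tcpdump -i eth0 -w flows.pcap 'udp port 9995 or udp port 9996'

In Wireshark, a display filter such as cflow.template_id == 281 (the field names come from Wireshark's "cflow" NetFlow/IPFIX dissector) should show the template records for a given ID, so you can check whether two different exporters define the same flowset ID with different fields.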

The only way to work around this would be to use multiple Logstash instances to eliminate conflicting flowset IDs being sent to the same instance.

sarlacpit commented 5 years ago

Thank you for your reply. I think you are right; the templates are conflicting. I'll set up a separate Logstash instance and see how it looks.

sarlacpit commented 5 years ago

The info is coming through correctly on a separate instance of Logstash. Reading between the lines, can I ask Logstash to open another port specifically for this data? I do this already for IPFIX and NetFlow.

robcowart commented 5 years ago

You can. Create a second input by simply copy-pasting the current udp input for netflow and changing the port number. Usually the port number is set by an environment variable like this...

port => "${ELASTIFLOW_NETFLOW_IPV4_PORT:2055}"

You will want to just hardcode this in the second input...

port => "9995"
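
For illustration, the input section might then look something like the following. This is only a minimal sketch using the standard Logstash udp input and netflow codec; the id values, the codec options and the hardcoded port (which must differ from the port the first input uses) are placeholders, and you would copy across whatever other settings your existing ElastiFlow input already has.

input {
  # existing input, port taken from the environment variable
  udp {
    id => "input_udp_netflow_ipv4"
    port => "${ELASTIFLOW_NETFLOW_IPV4_PORT:2055}"
    codec => netflow { versions => [5, 9] }
    type => "netflow"
  }

  # second input just for the exporters with conflicting templates
  udp {
    id => "input_udp_netflow_ipv4_alt"
    port => "9995"
    codec => netflow { versions => [9] }
    type => "netflow"
  }
}

Each input gets its own codec instance, and therefore its own set of templates, so the two groups of exporters should no longer overwrite each other's templates.
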
robcowart commented 4 years ago

This issue will be addressed once the following PRs are merged and released for the Logstash UDP input and netflow codec:

Logstash UDP Input: https://github.com/logstash-plugins/logstash-input-udp/pull/46
Logstash Netflow Codec: https://github.com/logstash-plugins/logstash-codec-netflow/pull/187

robcowart commented 4 years ago

Unfortunately the Elastic team declined to merge UDP input changes (see... logstash-plugins/logstash-input-udp#46). This leaves no other option than to continue to recommend the workaround of multiple instances of the ElastiFlow pipeline.
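
As a rough illustration of that workaround (the pipeline ids and paths below are placeholders), the instances can be defined side by side in Logstash's pipelines.yml, each pointing at its own copy of the ElastiFlow pipeline configuration:

- pipeline.id: elastiflow
  path.config: "/etc/logstash/elastiflow/conf.d/*.conf"
- pipeline.id: elastiflow_firepower
  path.config: "/etc/logstash/elastiflow_firepower/conf.d/*.conf"

Each copy then listens on its own UDP ports, and the exporters are split between them so that devices with conflicting flowset IDs never share a codec.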