logstash-plugins / logstash-codec-netflow

Some Juniper IPFIX flows are not parsed correctly #180

Open cvicente opened 4 years ago

cvicente commented 4 years ago

The configs are rather long. These are the relevant sections:

input {
    udp {
        host => "192.0.2.1"
        port => 9995
        workers => 1
        codec => netflow {
            versions => [10]
            target => "ipfix"
        }
        type => ipfix
        add_field => {
            "site" => "xyz"
        }
    }
}

    # Multiply by sampling interval to calculate the total bytes and packets the flow represents
    ruby {
        code => "event.set('[ipfix][octetDeltaCount]', event.get('[ipfix][octetDeltaCount]') * 1000)"
    }
    ruby {
        code => "event.set('[ipfix][packetDeltaCount]', event.get('[ipfix][packetDeltaCount]') * 1000)"
    }

    # add a bits field
    ruby {
        code => "event.set('[ipfix][bits]', event.get('[ipfix][octetDeltaCount]') * 8)"
    }

    # add fields to correctly calculate packets/sec, bits/sec
    ruby {
        code => "event.set('[ipfix][bits_1_60]', event.get('[ipfix][bits]') / 60)"
    }
    ruby {
        code => "event.set('[ipfix][packets_1_60]', event.get('[ipfix][packetDeltaCount]') / 60)"
    }
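
For reference, the same math could be consolidated into a single ruby filter. This is only a sketch, not part of the original config: the field names and the fixed ×1000 sampling rate come from the blocks above, and dividing by 60.0 is an assumption made here so Ruby integer division does not truncate the per-second rates.

    # Consolidated sketch (hypothetical): scaling plus derived fields in one ruby block
    ruby {
        code => "
            octets  = event.get('[ipfix][octetDeltaCount]')  * 1000
            packets = event.get('[ipfix][packetDeltaCount]') * 1000
            event.set('[ipfix][octetDeltaCount]', octets)
            event.set('[ipfix][packetDeltaCount]', packets)
            event.set('[ipfix][bits]', octets * 8)
            event.set('[ipfix][bits_1_60]', octets * 8 / 60.0)
            event.set('[ipfix][packets_1_60]', packets / 60.0)
        "
    }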

About 0.4% of flows saved to a file appear incomplete and look like the following:

{"host":"192.0.2.2","@version":"1","tags":["IPv6","_rubyexception"],"hostname":"router.example.net","site":"xyz","token":"xxx","ipfix":{"flowIdleTimeout":60,"systemInitTimeMilliseconds":1552313129000,"exporterIPv4Address":"192.0.2.2","version":10,"exporterIPv6Address":"::","samplingInterval":1000,"flowActiveTimeout":60,"exportProtocolVersion":10,"exportedMessageTotalCount":3111510,"exportingProcessId":2,"exportTransportProtocol":17,"exportedFlowRecordTotalCount":4830582},"type":"ipfix","@timestamp":"2019-07-11T18:25:03.000Z"}
s1sfa commented 4 years ago

I've just ignored these. This looks like the periodic update the router sends with stats about the netflow setup, such as samplingInterval: 1000. It would be interesting if the plugin could somehow use this to map the sampling interval, so we wouldn't need to add config like: code => "event.set('[ipfix][octetDeltaCount]', event.get('[ipfix][octetDeltaCount]') * 1000)"
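
A sketch of one way to "just ignore" these records, assuming the field layout shown above: the stats/option records carry samplingInterval but no octetDeltaCount, so a conditional can drop them before any ruby math runs.

filter {
    # Sketch: exporter option/stats records have no flow counters, so drop them
    if [type] == "ipfix" and ![ipfix][octetDeltaCount] {
        drop { }
    }
}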

cvicente commented 4 years ago

I should add that I noticed this while troubleshooting my installs. I am observing a significant mismatch between the packets per second calculated from the netflow data and the rate reported by SNMP. I see no evidence of UDP errors, so all the input messages appear to be processed.

Interestingly, I have older collectors using logstash 2.3.4 and logstash-codec-netflow-2.1.1 where there is no pps mismatch.

colinsurprenant commented 4 years ago

In your example above, the event is tagged with _rubyexception, which means that one of the ruby code blocks raised an exception. You probably want to verify, using conditionals in your filter section, that a received event is valid before applying ruby code to it.
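
For example, a guard along these lines (a sketch reusing the field names from the config posted above) only applies the scaling when the flow counters are actually present:

filter {
    # Only run the sampling/rate math on records that carry flow counters;
    # option/stats records fall through untouched instead of raising _rubyexception
    if [ipfix][octetDeltaCount] and [ipfix][packetDeltaCount] {
        ruby {
            code => "event.set('[ipfix][octetDeltaCount]', event.get('[ipfix][octetDeltaCount]') * 1000)"
        }
        ruby {
            code => "event.set('[ipfix][packetDeltaCount]', event.get('[ipfix][packetDeltaCount]') * 1000)"
        }
        # ... remaining ruby filters from above go inside the same conditional
    }
}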

colinsurprenant commented 4 years ago

It is quite hard to tell why you have this mismatch between your SNMP metrics and the netflow data. You could try enabling the persistent queue to prevent possible back pressure, which could otherwise result in dropped UDP packets. Monitoring the PQ will tell you whether there are indications of back pressure.
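
A minimal sketch of enabling it in logstash.yml; the size and path values here are illustrative, not from this thread.

# logstash.yml
queue.type: persistent                 # buffer events on disk instead of in memory
queue.max_bytes: 4gb                   # illustrative cap; size this for your burst volume
path.queue: /var/lib/logstash/queue    # optional; defaults to a queue dir under path.data

Queue depth and size on disk then show up in the node stats API (GET /_node/stats/pipelines), which makes sustained back pressure visible.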