ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

Flows from ASA shown incorrectly, do not update #1359

Closed. qm2k closed this issue 6 years ago.

qm2k commented 6 years ago

I configured the ASA to send flows to a host running ntopng 3.1.170715-3133 and nprobe 8.1.170715-5819 from the apt repository. I used trial versions, but the same problem was observed earlier on a licensed version 2 from apt-stable.

Then I started the following command: curl --resolve releases.ubuntu.com:80:91.189.88.148 --limit-rate 10k -r 0-3000000 http://releases.ubuntu.com/16.04.2/ubuntu-16.04.2-desktop-amd64.iso > /dev/null and monitored traffic to 91.189.88.148 in ntopng while capturing flows with tcpdump. This curl command creates one long flow at a 10 kB/s rate.

Then I converted the captured .pcap file to .txt in Wireshark and kept only the flows concerning 91.189.88.148 (since the full .pcap may contain sensitive data, I can only provide it to developers by email request). Here's the result: asa_filtered.txt . You can easily see that the ASA sends flow updates every minute with an updated number of octets transferred, which corresponds well to the requested 10 kB per second.

At the same time here's what was visible in ntopng interface: screenshot from 2017-07-15 17 22 25 screenshot from 2017-07-15 17 24 22 screenshot from 2017-07-15 17 24 35 screenshot from 2017-07-15 17 25 19

Note that as time goes on, the Duration, Actual Thpt and/or Total Bytes values are displayed incorrectly. Also, durations of 2, 3, etc. minutes are never shown, as would be expected.

pelhamw commented 6 years ago

I can confirm I'm seeing the same behaviour on nprobe dev build v.8.1.170717-5819

I am now seeing byte counts from long-lived flows being sent to ntop periodically (after the change made by Luca on 13/07/2017, so thanks for that!). That said, byte counts seem to be displayed incorrectly. Sometimes the active flows page shows no flows; other times it shows throughput far higher than it should be. The total byte count once the flow is torn down seems to be correct.

simonemainardi commented 6 years ago

The first thing we should do is to make sure nProbe timeouts are correctly configured.

In particular:

[--idle-timeout|-d] <timeout>       | It specifies the maximum (seconds) flow
                                    | idle lifetime [default=30]

This implies that every flow which has not received traffic (i.e., a flow update in your case) during the last 30 seconds is considered idle and exported.

As your ASA sends flow updates every 60 seconds, I expect nProbe to export a new flow every time a flow update is received.

The sequence of events should be this: a flow update arrives, no further traffic is seen for that flow during the next 30 seconds, so nProbe marks it idle and exports it; the next update, 60 seconds later, then starts a new flow.

For this reason, if you want to merge multiple flow updates into a single flow, then you have to increase --idle-timeout to be greater than one minute (the ASA flow-update interval).

Please, note that there's also another timeout that comes into play:

[--lifetime-timeout|-t] <timeout>   | It specifies the maximum (seconds) flow
                                    | lifetime [default=120]

lifetime-timeout causes every flow which has started more than 120 seconds ago to be exported. Therefore, if you want long-lived flows to be preserved for more than 120 seconds, you should tune this timeout as well.

In conclusion, may I encourage you to use nProbe to export flows to plain-text files (-D t -P ./) for your tests, to properly understand the behaviour with long-lived flows.

Once you are happy with your timeout settings, you can connect nProbe to ntopng. Don't forget that ntopng also has a timeout for flow expiration: image

Make sure this timeout is less than the nProbe --idle-timeout to prevent ntopng from doing unnecessary aggregation.
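
For reference, here is a minimal sketch of an nProbe invocation along these lines. The timeout values (90 s idle, 300 s lifetime) are only illustrative placeholders larger than the ASA's 60-second update interval, and the collection/export options (-i none, -3 2055, --zmq) are assumptions about a typical NetFlow-collection setup rather than values taken from this thread:

# collect NetFlow/NSEL on UDP 2055, dump flows as text for inspection,
# and export to a local ntopng over ZMQ; timeouts are illustrative only
nprobe -i none -3 2055 \
       --idle-timeout 90 --lifetime-timeout 300 \
       -D t -P ./ \
       --zmq "tcp://*:5556"

Once the text dumps written to ./ look sane, -D t -P ./ can be dropped and only the ZMQ export to ntopng kept.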

qm2k commented 6 years ago

lifetime-timeout causes every flow which has started more than 120 seconds ago to be exported. Therefore, if you want long-lived flows to be preserved for more than 120 seconds, you should tune this timeout as well.

You mean expired, right? How do I follow very long-lived flows, like internet radio streams? Is it OK to set lifetime-timeout to some very long value?

simonemainardi commented 6 years ago

The common tradeoff is to cut a long-lived flow into multiple shorter-lived flows. This will allow you to have (parts of the) flow exported and thus available earlier for visualization.

If you really can't afford cutting the flow into multiple parts, then it is OK to set a very long lifetime-timeout.

Actually, since ASA flow updates send cumulative values for the counters, you should make sure the timeouts won't cut a long-lived flow, or you will end up with wrong (too high) values.

Finally, the throughput shown in ntopng is not that meaningful when receiving remote flows as it is computed over the last 5 seconds.

qm2k commented 6 years ago

Finally, the throughput shown in ntopng is not that meaningful when receiving remote flows as it is computed over the last 5 seconds.

Yes, I noticed that too. I wonder if there's a way to change it to 1 minute?

qm2k commented 6 years ago

Unfortunately, I don't see long-lived flows at all if I set lifetime-timeout to some large value. Is there a working combination that would support long-lived flows from ASA?

simonemainardi commented 6 years ago

Finally, the throughput shown in ntopng is not that meaningful when receiving remote flows as it is computed over the last 5 seconds.

Yes, I noticed that too. I wonder if there's a way to change it to 1 minute?

Yes, you have a preference for that: image

qm2k commented 6 years ago

simonemainardi, thank you. Do you know the answer to the bigger question of how to make ntopng correctly show long-lived flows?

simonemainardi commented 6 years ago

Unfortunately, I don't see long-lived flows at all if I set lifetime-timeout to some large value.

Currently, this is the expected behaviour as nProbe won't export the long-lived flow until lifetime-timeout is reached.

Is there a working combination that would support long-lived flows from ASA?

To have early visibility of long-lived flows in ntopng we should cut the long-lived nProbe flow into multiple smaller flows -- actually we already use this approach. The fairly small default --lifetime-timeout in nProbe guarantees that ntopng won't wait longer than 2 minutes before seeing (part of) a flow monitored or collected by nProbe.

The issue with the ASA is that it sends cumulative counters in flow updates. So we must be very careful to avoid cutting any long-lived flow, as this would cause multiple cumulative counters to be summed up together on the ntopng side, yielding wrong traffic values.

In conclusion, in order to have early visibility of long-lived ASA flows in ntopng we have to change the implementation to deal with cumulative counters.

Anyway, keep in mind that the software works; here we are only discussing a limitation due to the long-lived flows reported by the ASA via flow updates.

simonemainardi commented 6 years ago

We have made some improvements to better handle ASA NSEL (including long-lived flows) both on the nProbe and on the ntopng side.

Can we invite you guys to try the latest dev packages of nProbe and ntopng?

As a guideline I suggest using the following values for timeouts and intervals:

qm2k commented 6 years ago

Tested it with a long flow using the following command: curl --resolve releases.ubuntu.com:80:91.189.88.148 --limit-rate 100k http://releases.ubuntu.com/16.04.3/ubuntu-16.04.3-desktop-amd64.iso > /dev/null . It's better, but still not there:

Typical result: screenshot from 2017-08-03 18 30 05 screenshot from 2017-08-03 18 30 46

Finished flow is shown correctly: screenshot from 2017-08-03 18 37 41

simonemainardi commented 6 years ago

Thanks for the feedback; are you able to generate a pcap of ASA NSEL with the stream of flow updates concerning a long-lived flow? With that pcap I can try to reproduce the issue in our lab.

qm2k commented 6 years ago

Here is the pcap: asa_2017-08-04.pcap.gz

The following traffic was created during capture:

marat@CM01:~$ curl --resolve releases.ubuntu.com:80:91.189.88.148 http://releases.ubuntu.com/16.04.3/ubuntu-16.04.3-desktop-amd64.iso > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1514M  100 1514M    0     0  5272k      0  0:04:54  0:04:54 --:--:-- 5033k

marat@CM01:~$ curl --resolve releases.ubuntu.com:80:91.189.88.148 --limit-rate 10k -r 0-3000000 http://releases.ubuntu.com/16.04.3/ubuntu-16.04.3-desktop-amd64.iso > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2929k  100 2929k    0     0  10263      0  0:04:52  0:04:52 --:--:-- 11455

marat@CM01:~$ watch -n 1 curl --resolve releases.ubuntu.com:80:91.189.88.148 -r 0-300000 http://releases.ubuntu.com/16.04.3/ubuntu-16.04.3-desktop-amd64.iso > /dev/null

(Last command was canceled manually after some time.)

ATTENTION ASA OWNERS: there seems to be a bug or limitation that causes changes to the access-list used in flow-export not to apply, at least until the flow-export rule is re-created. It could prevent you from including only the hosts you need in the exported flows, as I discovered while working on this problem.

simonemainardi commented 6 years ago

so, with the first flow (reduced pcap is CiscoASA-netflow-long-lived-flow.pcap.zip)

image

it looks like the information shown is correct.

However, I can't really understand why the client-to-server bytes, i.e., Initiator Octets, are only 123 even though there are more than 0.5M packets.

I could argue that empty ACKs are not counted as initiator octets, but this is just speculation.

Anyway, it is important to confirm that the 123 bytes reported by ntopng is correct, as it matches the initiator octets value that is also reported in the ASA flow-deleted event:

image

(to be continued)

simonemainardi commented 6 years ago

this is another flow (capture CiscoASA-netflow-long-lived-flow-2.pcap.zip).

Again ntopng shows values that are perfectly aligned with those shown in the ASA updates:

image

image

simonemainardi commented 6 years ago

I did a thorough investigation of the pcap and wasn't able to find any flow that is not properly shown by ntopng. Values always match those reported by ASA.

So it really doesn't look like an ntopng/nProbe issue.

Do you have an explanation for why the reported Octet values don't seem reasonable?

qm2k commented 6 years ago

I'm currently using ntopng v.3.1.170803 and nProbe v.8.1.170803, installed via http://apt.ntop.org/16.04/all/apt-ntop.deb . Are these the latest versions I should have tested?

simonemainardi commented 6 years ago

yes

qm2k commented 6 years ago

Studied your screenshots. Looks like you took them after the flow ended. Please check them while your flow is active.

simonemainardi commented 6 years ago

I debugged every single update of CiscoASA-netflow-long-lived-flow.pcap.zip and still can't find issues on the ntopng/nProbe side.

The real point to me is that Initiator and Responder Octets and Packets are not monotonically increasing across Cisco updates. By definition, these quantities should never decrease (as they are the totals exchanged) -- unless an out-of-sequence delivery occurs from the Cisco to nProbe.

However, if I look at the ASA NSEL pcap mentioned above, I can clearly see that, even though flow updates are delivered every minute (every 61 seconds, actually -- see the time delta column), Octets and Packets go up and down without any (to me) logical explanation.

image

As ntopng assumes that Responder Octets and Packets never decrease, it won't update flow counters if it receives values that are smaller than the previously received ones. So you may have the impression that ntopng doesn't update at all, but actually it doesn't update simply because the received counters are smaller.

Do you have an explanation of why the ASA is behaving like this?

Also look at the Initiator octets and packets in the latest update: I still have to understand why half a million initiator packets total just 123 bytes.
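
(For anyone who wants to reproduce this table from the capture: assuming Wireshark's cflow dissector decodes the NSEL records, something along these lines dumps the per-update counters; the exact field names printed may vary with the Wireshark version.)

# list octet/packet counters carried in each exported NetFlow/NSEL record
tshark -r asa_2017-08-04.pcap -Y cflow -V | grep -iE 'octets|packets'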


For the sake of completeness, here's the whole sequence of flow updates as seen from within ntopng.

image

image

image

image

image

image

image

image

These are the corresponding 8 ZMQ messages sent from nProbe to ntopng

06/Aug/2017 18:26:39 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"10.101.2.94","IPV4_DST_ADDR":"91.189.88.148","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":15,"OUTPUT_SNMP":3,"INITIATOR_PKTS":92807, "INITIATOR_OCTETS":123,"FIRST_SWITCHED":1501824820,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":46966,"L4_DST_PORT":80,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":0,"DST_AS":41231,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

06/Aug/2017 18:26:39 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"91.189.88.148","IPV4_DST_ADDR":"10.101.2.94","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":3,"OUTPUT_SNMP":15,"INITIATOR_PKTS":204907,"INITIATOR_OCTETS":280310040,"FIRST_SWITCHED":1501824820,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":80,"L4_DST_PORT":46966,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":41231,"DST_AS":0,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

06/Aug/2017 18:26:40 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"91.189.88.148","IPV4_DST_ADDR":"10.101.2.94","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":3,"OUTPUT_SNMP":15,"INITIATOR_PKTS":136638,"INITIATOR_OCTETS":186920784,"FIRST_SWITCHED":1501824882,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":80,"L4_DST_PORT":46966,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":41231,"DST_AS":0,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

06/Aug/2017 18:26:40 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"91.189.88.148","IPV4_DST_ADDR":"10.101.2.94","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":3,"OUTPUT_SNMP":15,"INITIATOR_PKTS":306443,"INITIATOR_OCTETS":419214024,"FIRST_SWITCHED":1501824943,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":80,"L4_DST_PORT":46966,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":41231,"DST_AS":0,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

06/Aug/2017 18:26:40 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"91.189.88.148","IPV4_DST_ADDR":"10.101.2.94","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":3,"OUTPUT_SNMP":15,"INITIATOR_PKTS":288312,"INITIATOR_OCTETS":394410816,"FIRST_SWITCHED":1501825004,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":80,"L4_DST_PORT":46966,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":41231,"DST_AS":0,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

06/Aug/2017 18:26:40 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"91.189.88.148","IPV4_DST_ADDR":"10.101.2.94","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":3,"OUTPUT_SNMP":15,"INITIATOR_PKTS":224237,"INITIATOR_OCTETS":306754195,"FIRST_SWITCHED":1501825065,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":80,"L4_DST_PORT":46966,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":41231,"DST_AS":0,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

06/Aug/2017 18:26:40 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"10.101.2.94","IPV4_DST_ADDR":"91.189.88.148","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":15,"OUTPUT_SNMP":3,"INITIATOR_PKTS":525396,"INITIATOR_OCTETS":123,"FIRST_SWITCHED":1501824820,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":46966,"L4_DST_PORT":80,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":0,"DST_AS":41231,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

06/Aug/2017 18:26:40 [export.c:1095] [ZMQ] {"IPV4_SRC_ADDR":"91.189.88.148","IPV4_DST_ADDR":"10.101.2.94","IPV4_NEXT_HOP":"0.0.0.0","INPUT_SNMP":3,"OUTPUT_SNMP":15,"INITIATOR_PKTS":1160537,"INITIATOR_OCTETS":1587609859,"FIRST_SWITCHED":1501824820,"LAST_SWITCHED":1502036799,"L4_SRC_PORT":80,"L4_DST_PORT":46966,"TCP_FLAGS":0,"PROTOCOL":6,"SRC_TOS":0,"SRC_AS":41231,"DST_AS":0,"IPV4_SRC_MASK":0,"IPV4_DST_MASK":0,"EXPORTER_IPV4_ADDRESS":"127.0.0.1", "57943":"192.168.2.222"}

qm2k commented 6 years ago

Do you have an explanation of why the ASA is behaving like this?

Clearly, the figures in all update flows are for the elapsed period only. If you add up the values from the update flows, you should obtain the value in the end flow, which covers the full period.

qm2k commented 6 years ago

By elapsed period above I mean last 1 minute.
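
As a quick sanity check (using the ZMQ messages above, and assuming the first five 91.189.88.148 -> 10.101.2.94 messages are the per-minute updates while the last one, which reuses the original FIRST_SWITCHED, is the final record), the per-interval octet values do add up to the final total:

$ echo $((280310040 + 186920784 + 419214024 + 394410816 + 306754195))
1587609859

which is exactly the INITIATOR_OCTETS value of the last message, and roughly the 1514M curl reported for the full download.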

simonemainardi commented 6 years ago

That is true, but it is even more confusing, as the same field has a semantically different meaning depending on the message it is in: in the periodic flow-update events it carries only the traffic of the last interval, whereas in the final (flow-teardown) event it carries the totals for the whole flow.

Moreover, the same fields always represent totals in other cisco implementations. See https://www.cisco.com/c/en/us/td/docs/routers/access/ISRG2/AVC/api/guide/AVC_Metric_Definition_Guide/avc_app_exported_fields.html

Let me try and find a solution for this.

In the meanwhile, do you have an explanation of the unreasonable number of initiator octets as well (123)?

qm2k commented 6 years ago

Moreover, the same fields always represent totals in other cisco implementations.

The ASA is not originally a Cisco product; it was acquired. When dealing with it, documentation referring to IOS-based products is usually misleading.

In the meanwhile, do you have an explanation of the unreasonable number of initiator octets as well (123)?

I'll do more captures to find out, but my working hypothesis is that it doesn't include service traffic at all. This is possible since the ASA reassembles all TCP (and UDP) packets and thus probably counts only the L4 payload.

Also, I frankly don't know how much service traffic is even needed these days to receive 1.5GB of data on a fast network. Receive windows up to 1GB are said to be possible.

simonemainardi commented 6 years ago

Here we are.

A new nProbe build is in progress. Please, wait a couple of hours, update it, and try again.

image image image image image image image image

simonemainardi commented 6 years ago

As concerns the issue with initiator octets, I can confirm that the HTTP request to download the Ubuntu image is exactly 123 bytes if we only consider the TCP payload. So it turns out that the ASA only counts L4 payload bytes.

image
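
(A way to double-check this on a capture of the actual client traffic, rather than of the NSEL export, would be to sum the TCP payload bytes sent by the client; the capture file name below is hypothetical:)

# sum TCP payload bytes sent from the client toward port 80
$ tshark -r client_side_capture.pcap \
    -Y 'ip.src==10.101.2.94 && tcp.dstport==80 && tcp.len>0' \
    -T fields -e tcp.len | awk '{sum+=$1} END {print sum}'

If the ASA really counts only the L4 payload, this should print 123 for the flow above.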

qm2k commented 6 years ago

It is indeed much better now. Charts on the dashboard were previously very spiky, but now they mostly resemble the information I pull via ssh using home-grown tools: screenshot from 2017-08-08 11 08 42 cropped 2 screenshot from 2017-08-08 11 08 45 cropped Congratulations!

However, I noticed one place that is still very spiky: the traffic chart for the interface: screenshot from 2017-08-08 11 07 23 Is it possible that it still doesn't account for partial updates?

simonemainardi commented 6 years ago

alright, I am glad to know it's working now.

Interface speed spikes are to be expected and are due to the quantized nature of flows, which are reported periodically.

This becomes more evident with large long-lived flows (e.g., file transfers). What happens is that during a particular second ntopng can receive several hundred MBs' worth of data for a single flow, because long-lived flows are reported by the ASA every minute and an active flow can carry a lot of traffic in a minute. Speed-chart spikes are a direct consequence of huge traffic increases over very short periods of time.

Note that this is not a limitation of ntopng. This behaviour is intrinsic to the quantized nature of the ASA traffic flows. We could decide to average the speed over the minute, but that would introduce other arbitrary approximations.
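
To give an idea of the magnitude: at the roughly 5272 kB/s average of the unthrottled download above (taking curl's k as 1024 bytes), a single one-minute flow update carries about

$ echo $((5272 * 1024 * 60))
323911680

bytes, i.e., a bit over 300 MB that ntopng accounts for within a single second, which is exactly what shows up as a spike on the interface chart.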

Please, let me know if it is safe to close this issue.

qm2k commented 6 years ago

I don't understand why you cannot use partial updates in the interface view in the same way they are now used in the dashboard, host and flows views. But since I currently don't use the interface view much, I'll close this issue. Now that I can finally use ntopng with my ASA, I've discovered several other issues, but I'll open them separately.

qm2k commented 6 years ago

And, thank you for the fix!

simonemainardi commented 6 years ago

I do use flow updates in the interface view as well, but as flow updates occur only once every minute, the spikes still appear for the reasons described above.