pavel-odintsov / fastnetmon

FastNetMon - very fast DDoS sensor with sFlow/Netflow/IPFIX/SPAN support
https://fastnetmon.com
GNU General Public License v2.0

sFlow capture engine skipping samples #812

Closed zpc-ar closed 2 years ago

zpc-ar commented 4 years ago

I am running FastNetMon with netmap but am planning to migrate to sFlow for convenience. However, some attacks are not detected at all because the sFlow capture engine seems to skip samples or entire frames. An attack that is reported at 4796 Mbps and 451761 pps via netmap peaks at roughly 500 Mbps and 55000 pps when using the sFlow capture engine. Pmacct/sfacct sees 3832 Mbps and 343517 pps for the same time frame. The machine supplying the sFlow data is an Arista 7280R.

I can provide a 1 min pcap replay and a dump.txt from the netmap backend.

pavel-odintsov commented 4 years ago

Hello!

sFlow depends on a properly configured sampling rate. Only with a suitable sampling rate will you receive reliable bandwidth data: https://blog.sflow.com/2009/06/sampling-rates.html

Netmap is being deprecated and I can recommend AF_PACKET as another option.

zpc-ar commented 4 years ago

Hey, I realize that the sampling rate may not be optimal, but the discrepancy between the actual data rate (or what netmap calculates) and the sFlow data from the switch is not much of an issue. My problem is that the same sFlow data returns vastly different results when run through FastNetMon (where it peaks at only 500 Mbps and 55000 pps) and through pmacct/sfacct (where I get an average of 3832 Mbps and 343517 pps, which is much closer to the real data rate). The particular attack had lots of fragmented UDP packets, so I suspect that the FastNetMon sFlow backend somehow does not parse these correctly.

pavel-odintsov commented 4 years ago

Hello!

What is your sampling rate? Do you have any errors from the parser in /var/log/fastnetmon.log?

Can you share a pcap file covering 3-5 minutes with me, please? Just send it to: pavel.odintsov@gmail.com

zpc-ar commented 4 years ago

Hey, the sampling rate is set to 32768. I realize that this will not produce super accurate results but it is good enough for my purpose. There were no errors logged in /var/log/fastnetmon.log. I have sent you an e-mail with further information.
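
To illustrate why skipped samples matter so much at this rate: with 1-in-32768 sampling, a 451761 pps stream yields only about 14 samples per second (451761 / 32768), and every sample that a collector drops removes roughly 32768 packets from the estimate. The sketch below shows the usual scaling step under that assumption. The struct and function names are hypothetical and are not taken from FastNetMon or pmacct.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record for one sFlow flow sample; field names are illustrative.
struct SampledPacket {
    uint32_t length_bytes;  // size of the original frame on the wire
    uint32_t sampling_rate; // 1-in-N rate reported by the sFlow agent
};

// Running totals extrapolated from the samples seen so far.
struct TrafficEstimate {
    uint64_t packets = 0;
    uint64_t bytes = 0;
};

// Each sample stands for roughly `sampling_rate` packets of the same size,
// so both counters are scaled by the sampling rate.
inline void add_sample(TrafficEstimate& est, const SampledPacket& s) {
    assert(s.sampling_rate > 0);
    est.packets += s.sampling_rate;
    est.bytes += static_cast<uint64_t>(s.length_bytes) * s.sampling_rate;
}
```

A single 1500-byte sample at this rate therefore accounts for 32768 packets and about 49 MB of estimated traffic, so a parser that silently discards one class of samples understates the totals heavily.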

pavel-odintsov commented 4 years ago

Replied via email.

zpc-ar commented 4 years ago

The attack that provoked this issue consisted of a large number of packets with a fragmentation offset > 0. In "sflow_collector.cpp" line 925, however, such packets are not processed any further:

if (sample->ip_fragmentOffset > 0) {
    // printf("IPFragmentOffset %u\n", sample->ip_fragmentOffset);
} else {
    /* advance the pointer to the next protocol layer */
    /* ip headerLen is expressed as a number of quads */
    ptr += (ip.version_and_headerLen & 0x0f) * 4;
    decodeIPLayer4(sample, ptr);
}

For sFlow, the "simple_packet" passed to FastNetMon's "process_packet" is only created in the function "decodeIPLayer4", which is never called for packets with a non-zero fragment offset.
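
For reference, here is a minimal sketch of how the 16-bit flags/fragment-offset field of an IPv4 header breaks down, assuming the value has already been converted to host byte order (for example with ntohs). Only the fragment at offset 0 carries the TCP/UDP header, which is why later fragments cannot go through layer 4 decoding but still need to be counted. The names "FragmentInfo" and "decode_fragment_field" are hypothetical and do not exist in the FastNetMon sources.

```cpp
#include <cstdint>

// Decoded view of the 16-bit flags/fragment-offset field of an IPv4 header.
struct FragmentInfo {
    bool more_fragments;   // MF flag: more fragments follow this one
    uint16_t offset_units; // fragment offset, expressed in 8-byte units

    // A packet belongs to a fragmented datagram if MF is set or the offset is non-zero.
    bool is_fragment() const { return more_fragments || offset_units > 0; }

    // Only the fragment at offset 0 contains the layer 4 (TCP/UDP) header.
    bool has_l4_header() const { return offset_units == 0; }
};

// `frag_off_host` must already be in host byte order.
inline FragmentInfo decode_fragment_field(uint16_t frag_off_host) {
    FragmentInfo info;
    info.more_fragments = (frag_off_host & 0x2000) != 0;                 // MF bit
    info.offset_units = static_cast<uint16_t>(frag_off_host & 0x1FFF);   // low 13 bits
    return info;
}
```

With this split, a collector could still account for the packet length and IP addresses of a non-first fragment and skip only the port and TCP flag parsing.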

Dirty patch to address this:

--- fastnetmon-1.1.4.orig/src/sflow_plugin/sflow_collector.cpp
+++ fastnetmon-1.1.4/src/sflow_plugin/sflow_collector.cpp
@@ -927,6 +927,43 @@ void decode_ipv4_protocol(SFSample* samp
     sample->ip_fragmentOffset = ntohs(ip.frag_off) & 0x1FFF;
     if (sample->ip_fragmentOffset > 0) {
         // printf("IPFragmentOffset %u\n", sample->ip_fragmentOffset);
+        simple_packet current_packet;
+
+        if (sample->gotIPV6) {
+            current_packet.ip_protocol_version = 6;
+            memcpy(current_packet.src_ipv6.s6_addr, sample->ipsrc.address.ip_v6.addr, 16);
+            memcpy(current_packet.dst_ipv6.s6_addr, sample->ipdst.address.ip_v6.addr, 16);
+        } else {
+            current_packet.ip_protocol_version = 4;
+            current_packet.src_ip = sample->ipsrc.address.ip_v4.addr;
+            current_packet.dst_ip = sample->ipdst.address.ip_v4.addr;
+        }
+
+        current_packet.flags = 0;
+        current_packet.ip_fragmented = 1;
+        current_packet.number_of_packets = 1;
+        current_packet.length = sample->sampledPacketSize;
+        current_packet.sample_ratio = sample->meanSkipCount;
+
+        switch (sample->dcd_ipProtocol) {
+        case 1: {
+            current_packet.protocol = IPPROTO_ICMP;
+        } break;
+        case 6: {
+            current_packet.protocol = IPPROTO_TCP;
+        } break;
+        case 17: {
+            current_packet.protocol = IPPROTO_UDP;
+        } break;
+        }
+
+        current_packet.flags = 0;
+        current_packet.ip_fragmented = 1;
+        current_packet.number_of_packets = 1;
+        current_packet.length = sample->sampledPacketSize;
+        current_packet.sample_ratio = sample->meanSkipCount;
+
+        sflow_process_func_ptr(current_packet);
     } else {
         /* advance the pointer to the next protocol layer */
         /* ip headerLen is expressed as a number of quads */

pavel-odintsov commented 4 years ago

Thank you so much for such a detailed report! Yes, it's definitely a serious issue. Your patch is perfectly fine and I think it will work, but we need a little more time to implement such a check properly.

Would you mind sharing a pcap with such traffic with me personally? We will need a test to confirm that the logic works as expected.

zpc-ar commented 4 years ago

Mail sent.

pavel-odintsov commented 2 years ago

We're planning to fix it in the sFlow plugin rewrite, which is scheduled for the coming months.

pavel-odintsov commented 2 years ago

Addressed in https://github.com/pavel-odintsov/fastnetmon/commit/22d480d5b65f7e2a700dd74c072cc6fb369b839f and https://github.com/pavel-odintsov/fastnetmon/commit/34c648aadc5ee70fd70b0ca3c55a1d8fc5872823