ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

Malformed traffic from nprobes over ZMQ #505

Closed Queeq closed 8 years ago

Queeq commented 8 years ago

Running latest nightlies on VMWare virtual machine Ubuntu 14.04, kernel 4.4.6.

There are 6 nprobe instances receiving IPFIX from Juniper routers and 1 ntopng instance which connects to nprobes over TCP (everything's within the same VM). It had been working fine on the same version before I restarted it several times today (both nprobes and ntopng).

Nprobes are run with these parameters:

nprobe --collector-port 2055 --zmq tcp://127.0.0.1:5556 --daemon-mode -n none -i none --aggregation 0/1/1/1/0/0 --sample-rate @4096:1

Ntopng verbose mode shows many messages like these:

19/Apr/2016 16:37:23 [CollectorInterface.cpp:183] [225] �4�
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 15/15
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 10/10
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 14/14
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 5/5
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 16/16
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 17/17
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 9/9
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 13/13
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 42/42
19/Apr/2016 16:37:23 [CollectorInterface.cpp:183] [222] 0�4�
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 15/15
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 10/10
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 14/14
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 5/5
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 16/16
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 17/17
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 9/9
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 13/13
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 42/42
19/Apr/2016 16:37:23 [CollectorInterface.cpp:183] [223] P�*�
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 15/15
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 10/10
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 14/14
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 5/5
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 16/16
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 17/17
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 9/9
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 13/13
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 42/42
19/Apr/2016 16:37:23 [CollectorInterface.cpp:183] [223]  .5�
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 15/15
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 10/10
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 14/14
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 5/5
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 16/16
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 17/17
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 9/9
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 13/13
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 42/42
19/Apr/2016 16:37:23 [CollectorInterface.cpp:183] [222] �h5�
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 15/15
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 10/10
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 14/14
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 5/5
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 16/16
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 17/17
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 9/9
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 13/13
19/Apr/2016 16:37:23 [ParserInterface.cpp:671] Not handled ZMQ field 42/42
19/Apr/2016 16:37:23 [CollectorInterface.cpp:183] [223] �5�

Sniffing traffic on loopback interface gives this:

➜ tcpdump -ni lo -c 10 port 5558 -vvv
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
16:40:58.327107 IP (tos 0x0, ttl 64, id 6300, offset 0, flags [DF], proto TCP (6), length 213)
    127.0.0.1.5558 > 127.0.0.1.44092: Flags [P.], cksum 0xfec9 (incorrect -> 0x6d32), seq 3570775324:3570775485, ack 863861716, win 342, options [nop,nop,TS val 236039850 ecr 236039600], length 161
16:40:58.327116 IP (tos 0x0, ttl 64, id 53387, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.44092 > 127.0.0.1.5558: Flags [.], cksum 0xfe28 (incorrect -> 0x01fd), seq 1, ack 161, win 367, options [nop,nop,TS val 236039850 ecr 236039850], length 0
16:40:58.900747 IP (tos 0x0, ttl 64, id 6301, offset 0, flags [DF], proto TCP (6), length 8244)
    127.0.0.1.5558 > 127.0.0.1.44092: Flags [P.], cksum 0x1e29 (incorrect -> 0xdf11), seq 161:8353, ack 1, win 342, options [nop,nop,TS val 236039993 ecr 236039850], length 8192
16:40:58.900779 IP (tos 0x0, ttl 64, id 53388, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.44092 > 127.0.0.1.5558: Flags [.], cksum 0xfe28 (incorrect -> 0xdcdf), seq 1, ack 8353, win 1390, options [nop,nop,TS val 236039993 ecr 236039993], length 0
16:40:58.900812 IP (tos 0x0, ttl 64, id 6302, offset 0, flags [DF], proto TCP (6), length 8244)
    127.0.0.1.5558 > 127.0.0.1.44092: Flags [P.], cksum 0x1e29 (incorrect -> 0x92a4), seq 8353:16545, ack 1, win 342, options [nop,nop,TS val 236039993 ecr 236039993], length 8192
16:40:58.900817 IP (tos 0x0, ttl 64, id 53389, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.44092 > 127.0.0.1.5558: Flags [.], cksum 0xfe28 (incorrect -> 0xb8e0), seq 1, ack 16545, win 2413, options [nop,nop,TS val 236039993 ecr 236039993], length 0
16:40:58.900839 IP (tos 0x0, ttl 64, id 6303, offset 0, flags [DF], proto TCP (6), length 8244)
    127.0.0.1.5558 > 127.0.0.1.44092: Flags [P.], cksum 0x1e29 (incorrect -> 0xa0a3), seq 16545:24737, ack 1, win 342, options [nop,nop,TS val 236039993 ecr 236039993], length 8192
16:40:58.900846 IP (tos 0x0, ttl 64, id 53390, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.44092 > 127.0.0.1.5558: Flags [.], cksum 0xfe28 (incorrect -> 0x94e1), seq 1, ack 24737, win 3436, options [nop,nop,TS val 236039993 ecr 236039993], length 0
16:40:58.900867 IP (tos 0x0, ttl 64, id 6304, offset 0, flags [DF], proto TCP (6), length 8244)
    127.0.0.1.5558 > 127.0.0.1.44092: Flags [P.], cksum 0x1e29 (incorrect -> 0xe6f5), seq 24737:32929, ack 1, win 342, options [nop,nop,TS val 236039993 ecr 236039993], length 8192
16:40:58.900872 IP (tos 0x0, ttl 64, id 53391, offset 0, flags [DF], proto TCP (6), length 52)
    127.0.0.1.44092 > 127.0.0.1.5558: Flags [.], cksum 0xfe28 (incorrect -> 0x741e), seq 1, ack 32929, win 3631, options [nop,nop,TS val 236039993 ecr 236039993], length 0
10 packets captured

Note the 8192-byte packet length from nprobe. During normal operation, I noticed it was sending packets of 150-200 bytes.

lucaderi commented 8 years ago

We have added compression and (optional) encryption some time ago. Can you please confirm the versions of ntopng and nprobe you are using?
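As a rough illustration of why a compression mismatch would show up as unreadable bytes in the collector log, here is a minimal Python sketch. It is not the actual nprobe/ntopng wire format: zlib, JSON, and the function name `decode_flow_payload` are assumptions made for the example only.

```python
import json
import zlib


def decode_flow_payload(payload: bytes):
    """Try plain JSON first, then zlib-compressed JSON.

    Hypothetical helper: the real nprobe/ntopng framing may differ.
    Returns the decoded object, or None if neither attempt works.
    """
    for transform in (lambda raw: raw, zlib.decompress):
        try:
            return json.loads(transform(payload).decode("utf-8"))
        except (zlib.error, UnicodeDecodeError, json.JSONDecodeError):
            continue
    return None


if __name__ == "__main__":
    # Made-up flow record keyed by numeric field IDs, for illustration only.
    record = {"1": 1500, "2": 3}
    plain = json.dumps(record).encode("utf-8")
    packed = zlib.compress(plain)
    print(decode_flow_payload(plain))
    print(decode_flow_payload(packed))
    # A receiver that only expects plain text would see bytes like these:
    print(packed[:8])
```

A receiver that skips the decompression step would try to treat the compressed bytes as text, which is one plausible way to end up with output like the log lines above.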

Queeq commented 8 years ago

nprobe 7.3.160419-5042 ntopng 2.3.160419-1164

lucaderi commented 8 years ago

Please send a pcap file (full packet size) with flows sent by your routers to nProbe as I need to debug the issue. Note that I need both flows and templates.

Queeq commented 8 years ago

We use the default ipv4-template by Juniper. It is described here. I'll send sample flows by e-mail.

Queeq commented 8 years ago

I tested it yesterday and it worked. I think we may close this issue now. I wonder if the latest commits resolved #507 too.

Queeq commented 8 years ago

Seems like I was too fast closing it. It had been working for some time but stopped working after the first restart. I'm running nprobe 7.3.160422-5045 and ntopng 2.3.160422-1178. Symptoms are the same.

lucaderi commented 8 years ago

Without a pcap I can't help much

Queeq commented 8 years ago

@lucaderi Please check your e-mail, I sent it to the e-mail in your Github profile on the 19th of April, 15:10 UTC. The subject was "ntopng issue #505 flows pcap".

lucaderi commented 8 years ago

@Queeq Looks like the pcap you sent me contains flows but not templates. Please try again.

lucaderi commented 8 years ago

Closing for inactivity. Will reopen if necessary.

Queeq commented 8 years ago

@lucaderi Sorry, I was on vacation for the last few days. Just minutes ago I managed to get templates included in the Netflow packets from our routers, and I will send you the traffic dump by e-mail shortly.

For now, just for reference for other people exporting IPFIX from Juniper routers, the following is the configuration that is necessary to include templates together with flows:

show configuration services
flow-monitoring {
    version-ipfix {
        template NAME {
            flow-active-timeout 30;
            flow-inactive-timeout 60;
            template-refresh-rate {
                packets 1;
                seconds 10;
            }
            ipv4-template;
        }
    }
}

Note that you need both packets and seconds options enabled within template-refresh-rate for it to work.

Queeq commented 8 years ago

@lucaderi Did you receive the pcap file I sent on April 28th? If so, please reopen the issue.