phaag / nfdump

Netflow processing tools

Decreased nfdump performance after upgrading from 1.6.17 #512

Closed SanderDelden closed 6 months ago

SanderDelden commented 7 months ago

Hi,

I've done some comparison tests between nfdump 1.7.4 (latest master 9c1021d) and 1.6.17 and noticed that 1.6.17 is faster. I actually expected the opposite, since nfdump has been multi-threaded since 1.7.0.

Here are the outputs of some of the tests I've run (the first run is always nfdump 1.7.4):

svd@flow-collector01:/tmp/nfdump-1.7.4/bin$ time ./nfdump -M /flows/router1:router2:router3:router4:router5:router6 -R nfcapd.202401220600:nfcapd.202401221000 "ip x.x.x.x or ip y.y.y.y" -w /dev/null

real    0m43.785s
user    1m5.166s
sys     0m3.931s
svd@flow-collector01:~$ time nfdump -M /flows/router1:router2:router3:router4:router5:router6 -R nfcapd.202401220600:nfcapd.202401221000 "ip x.x.x.x or ip y.y.y.y" -w /dev/null

real    0m34.772s
user    0m31.894s
sys     0m2.876s
svd@flow-collector01:/tmp/nfdump-1.7.4/bin$ time ./nfdump -M /flows/router1:router2:router3:router4:router5:router6 -R nfcapd.202401220600:nfcapd.202401221000 -w /dev/null

real    0m44.081s
user    1m5.980s
sys     0m4.183s
svd@flow-collector01:~$ time nfdump -M /flows/router1:router2:router3:router4:router5:router6 -R nfcapd.202401220600:nfcapd.202401221000 -w /dev/null

real    0m35.374s
user    0m33.765s
sys     0m1.608s

All of the above tests were run on sFlow data collected by sfcapd 1.6.17, as nfdump 1.6.17 is not compatible with data collected by sfcapd 1.7.4. Running the same test on sFlow data collected by sfcapd 1.7.4 shows little improvement, if any:

svd@flow-collector01:/tmp/nfdump-1.7.4/bin$ time ./nfdump -M /flows/router1:router2:router3:router4:router5:router6 -R nfcapd.202403060600:nfcapd.202403061000 -w /dev/null

real    0m55.732s
user    1m17.255s
sys     0m9.048s

Old dataset (sfcapd 1.6.17):

svd@flow-collector01:/tmp/nfdump-1.7.4/bin$ ./nfdump -M /flows/router1:router2:router3:router4:router5:router6 -R nfcapd.202401220600:nfcapd.202401221000 -I
Ident: router1
Flows: 192844868
Flows_tcp: 131923510
Flows_udp: 58796650
Flows_icmp: 118663
Flows_other: 2006045
Packets: 1928448680000
Packets_tcp: 1319235100000
Packets_udp: 587966500000
Packets_icmp: 1186630000
Packets_other: 20060450000
Bytes: 1871894845150000
Bytes_tcp: 1311245309480000
Bytes_udp: 542121861070000
Bytes_icmp: 544437370000
Bytes_other: 17983237230000
First: 1705899600
Last: 1705914059
msec_first: 23
msec_last: 997
Sequence failures: 0

New dataset (sfcapd 1.7.4):

svd@flow-collector01:/tmp/nfdump-1.7.4/bin$ ./nfdump -M /flows/router1:router2:router3:router4:router5:router6 -R nfcapd.202403060600:nfcapd.202403061000 -I
Ident: router1
Flows: 241245242
Flows_tcp: 155806290
Flows_udp: 82497530
Flows_icmp: 147872
Flows_other: 2793550
Packets: 2412452420000
Packets_tcp: 1558062900000
Packets_udp: 824975300000
Packets_icmp: 1478720000
Packets_other: 27935500000
Bytes: 2287000989730000
Bytes_tcp: 1501135138460000
Bytes_udp: 761364665050000
Bytes_icmp: 599253580000
Bytes_other: 23901932640000
First: 1709701200
Last: 1709715659
msec_first: 27
msec_last: 999
Sequence failures: 0

phaag commented 6 months ago

Hi

Well, performance is not always that simple. The answer is both yes and no.

First of all, I have always focused on performance in nfdump. Over the years (nfdump celebrates its twentieth anniversary this year) the software has grown and become more complicated as more and more features were implemented.

However, it all depends on the number of CPUs, the disk I/O subsystem (SSD or HDD) and the job to be done.

There was a rather big change from 1.6.x to 1.7.x, driven by ever-growing demands. At some point a few internal structures reached their limits and needed to be rewritten. The file backend was replaced by a more flexible one, which can store more varied data, specifically the variable-length data that more and more exporters send: Network-Based Application Recognition (NBAR), interface names, user names, and pcap data merged with netflow, which provides payload data for decoding DNS, SSL, ja3 and ja4 hashes, etc. The feature set has been drastically improved over the last few years.

So let's focus on performance and threading. Nfdump 1.6.x has a more basic data structure and is single-threaded only. Nfdump 1.7.x moved to multi-threading but got a more complex data structure. As of now, nfdump 1.7.x gains from multiple cores when reading and writing data, while records are still processed in a single thread. There are plans (hopefully to come true in 2024) to gain further from more cores by having multiple data processing threads. This will be the next boost for speed. However, all threading is useless if the I/O system cannot cope with the load of the data processing threads.
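One rough way to see this threading model in `time` output is to compare CPU time (user + sys) with wall-clock time (real). The sketch below uses a hypothetical helper, `cpu_ratio`, which is not part of nfdump, and feeds it the timings of the first filter run from the original report. A ratio near 1 suggests a single busy core; a higher ratio suggests that threads overlapped. It is only an indicator, not a count of threads.

```shell
#!/bin/sh
# cpu_ratio REAL USER SYS: print (user + sys) / real, all values in seconds.
# Hypothetical helper for illustration only.
cpu_ratio() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

# Timings of the first filter run in the original report, in seconds.
cpu_ratio 43.785 65.166 3.931   # nfdump 1.7.4  -> 1.58
cpu_ratio 34.772 31.894 2.876   # nfdump 1.6.17 -> 1.00
```

In that run 1.7.4 spent roughly 1.6 cores' worth of CPU time yet still took longer in wall-clock terms, which fits a single processing thread being the limiting step.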

Nfdump 1.7.0 to 1.7.4 are releases which successively removed old code while keeping compatibility with old files. The current master branch is another step forward from 1.7.4: it implements a new filter engine and removes old, bulky data structures. Furthermore, all nfdump 1.7.x releases pay a time penalty for converting old 1.6.x files while reading and processing them.

Below are some examples that show the differences between versions. There are different use cases, and I took yours as one of them.

All my tests were done on a single flow file:

% ls -al
-rw-r--r--@ 1 peter  staff  4496296845 Mar 10 14:29 flows.nf
-rw-r--r--@ 1 peter  staff  4101204914 Mar 10 14:27 flows1.nf

The file flows1.nf is a version 1 file for nfdump-1.6.x; flows.nf is a version 2 file for nfdump 1.7.x containing identical flow data. Both files are lz4 compressed. Version 2 files are ~5-10% larger, depending on the flow records. Whether you use a single file or a series of files (-R .. -M ..) does not make a difference. The total size is > 4GB.
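As a quick check of the size difference for this particular pair of files, the byte counts from the `ls` output above can be compared directly. The helper name `size_overhead` is made up for this sketch.

```shell
#!/bin/sh
# size_overhead V1_BYTES V2_BYTES: print how much larger the v2 file is, in percent.
# Hypothetical helper for illustration only.
size_overhead() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) * 100 / a }'
}

# flows1.nf (version 1) vs flows.nf (version 2), sizes in bytes from ls.
size_overhead 4101204914 4496296845   # -> 9.6
```

That puts this data set near the upper end of the 5-10% range.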

Versions:

% nfdump -v flows1.nf
File       : flows1.nf
Version    : 1 - lz4 compressed
Blocks     : 13484
Checking data blocks

Total
Type 2 blocks : 13484
Records       : 196367493

% nfdump -v flows.nf
File       : flows.nf
Version    : 2 - lz4 compressed
Created    : 2024-03-10 14:27:00
Created by : nfdump
nfdump     : f1070200
encryption : no
Appdx blks : 1
Data blks  : 9739
Checking data blocks
Checking appendix blocks

Total
Type 3 blocks : 9740
Records       : 196367494

and the statistics:

% nfdump -r flows.nf -I
Ident: none
Flows: 196367492
Flows_tcp: 196367261
Flows_udp: 216
Flows_icmp: 15
Flows_other: 0
Packets: 715186246
Packets_tcp: 715185820
Packets_udp: 411
Packets_icmp: 15
Packets_other: 0
Bytes: 618685994298
Bytes_tcp: 618685973038
Bytes_udp: 20870
Bytes_icmp: 390
Bytes_other: 0
First: 1574347152
Last: 1577796527
msec_first: 111
msec_last: 833
Sequence failures: 0

So the file is roughly comparable to your data.

For all tests I use nfdump-1.6.x (latest 1.6.x branch), nfdump-1.7.4-release and nfdump-1.7.4-b1bf5a1 from the master branch.

Test 1:

I copied your test. (Btw, -w /dev/null does not work when placed after the filter!)

% time -h nfdump-1.6.x -r flows1.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
    4.85s real      4.41s user      0.29s sys

% time -h nfdump-1.7.4 -r flows1.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
    6.51s real      8.70s user      1.08s sys

% time -h nfdump-master -r flows1.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
    7.76s real       9.88s user     1.10s sys

This basically confirms your findings. However, as explained above, there is a penalty for converting file versions. Therefore, here are the same tests with native version 2 files:

% time -h nfdump-1.7.4 -r flows.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
    5.33s real        8.54s user      1.20s sys
% time -h nfdump-master -r flows.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
    4.77s real      7.97s user      1.22s sys

The 1.7.4 release is still a bit slower than good old 1.6.x; however, given that 196 million flows and > 4GB of data were processed, it's not so bad. With native v2 files the master branch is on par with 1.6.x, even a bit faster. If you have a large flow base in v1 files and you regularly process them, then yes, you would benefit from converting them to the v2 format. As the current master removed limiting 1.6.x structures, reading v1 files became even slower. (Maybe I could improve that a bit??)

Btw, you can convert v1 files to v2 files by simply reading and writing them with a 1.7.x binary.

For example:

% nfdump -r <v1-file> -w <v2-file> -z=lz4
% mv <v2-file> <v1-file>
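For a whole flow directory, the same read/write step could be wrapped in a loop. This is a minimal sketch, not a tested migration tool. It assumes the files follow the `nfcapd.*` naming seen in this thread, that the collector is not writing into the directory while it runs, and that `nfdump` on the PATH is a 1.7.x binary. The function name `convert_v1_dir`, the `NFDUMP` variable and the `.v2tmp` suffix are invented for the example. Files that are already v2 are simply rewritten.

```shell
#!/bin/sh
# convert_v1_dir DIR: rewrite every nfcapd.* file below DIR with nfdump 1.7.x,
# which writes the output in the v2 format (lz4 compressed here).
# NFDUMP may name a specific binary; it defaults to "nfdump" on the PATH.
convert_v1_dir() {
    dir=$1
    find "$dir" -type f -name 'nfcapd.*' ! -name '*.v2tmp' |
    while IFS= read -r f; do
        tmp="$f.v2tmp"
        if "${NFDUMP:-nfdump}" -r "$f" -w "$tmp" -z=lz4; then
            # Replace the original only after a successful conversion.
            mv "$tmp" "$f"
        else
            echo "conversion failed, keeping original: $f" >&2
            rm -f "$tmp"
        fi
    done
}
```

A call such as `convert_v1_dir /flows/router1` would then convert one router's directory; trying it on a copy of the data first seems prudent.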

In all other tests, I use the v2 file for nfdump 1.7.x, with identical data.

Test 2:

This test uses a filter with 10 elements (ip a or ip b or ip ...) and writes the result into a file with lz4 compression. The 10 IPs are the top 10 talkers (result from -s ip):

% time -h nfdump-1.6.x -r flows1.nf -w f1.nf -y -f filter3.txt
    25.47s real     24.22s user     0.70s sys
% time -h nfdump-1.7.4 -r flows.nf -w f2.nf -z=lz4 -f filter3.txt
    22.83s real     38.51s user     2.85s sys
% time -h nfdump-master -r flows.nf -w f3.nf -z=lz4 -f filter3.txt
    19.46s real     34.94s user     2.92s sys

Now things already look different. Nfdump-1.6.x is the slowest, while the master branch is the fastest.
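The contents of filter3.txt are not shown in this thread. A filter of the same shape (`ip a or ip b or ...`) could be generated from a list of addresses roughly as follows. The helper `build_ip_filter` is hypothetical, and the addresses are placeholders from the documentation ranges, not the real top talkers.

```shell
#!/bin/sh
# build_ip_filter IP...: print "ip A or ip B or ..." on a single line.
# Hypothetical helper for illustration only.
build_ip_filter() {
    sep=""
    for ip in "$@"; do
        printf '%sip %s' "$sep" "$ip"
        sep=" or "
    done
    printf '\n'
}

# Placeholder addresses; substitute the addresses reported by -s ip.
build_ip_filter 192.0.2.1 192.0.2.2 198.51.100.7
# -> ip 192.0.2.1 or ip 192.0.2.2 or ip 198.51.100.7
```

Redirecting the output to a file gives something that can be passed to nfdump with -f, as in the runs above.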

Test 3:

To test the filter engine, I use a complex filter that I received from a user. It has > 900 filter elements and the filter .txt file is ~16k:

-rw-r--r--@ 1 peter  staff  16034 Mar 10 12:52 filter2.txt
% time -h nfdump-1.6.x -r flows1.nf -f filter2.txt > /dev/null
    7m42.40s real       7m32.14s user       0.69s sys
% time -h nfdump-1.7.4 -r flows.nf -f filter2.txt > /dev/null
    7m35.74s real       7m37.33s user       2.20s sys
% time -h nfdump-master -r flows.nf -f filter2.txt > /dev/null
    4m57.53s real       5m0.66s user        2.09s sys

OK, complex filtering with a 16k filter on a 4GB data file is not an everyday use case. Here the master branch is clearly the winner.

Test 4:

This is an I/O and CPU test that simply changes the compression method of the data from lz4 to bz2:

% time -h nfdump-1.6.x -r flows1.nf -w f1.nf -J 2
File flows1.nf compression changed
    7m1.20s real    6m51.42s user     2.84s sys
% time -h nfdump-1.7.4 -r flows.nf -J2
File flows.nf compression changed
    3m22.06s real       13m19.02s user      9.48s sys
% time -h nfdump-master -r flows.nf -J2
File flows.nf compression changed
    3m19.78s real   13m9.66s user     9.71s sys

The 1.7.x versions are more than twice as fast as 1.6.x.

These tests may give different results on other systems, depending on the hardware. All tests above were done on an M2 MacBook, which has really fast CPUs and I/O.
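To put numbers on the wall-clock differences, the "real" values reported by `time -h` can be converted to seconds and divided. The helpers `to_seconds` and `speedup` are hypothetical names for this sketch, and they only handle the `7m1.20s` and `4.85s` forms that appear in this thread.

```shell
#!/bin/sh
# to_seconds TIME: convert a value such as "7m1.20s" or "4.85s" to seconds.
to_seconds() {
    echo "$1" | awk '{
        sub(/s$/, "")                 # drop the trailing "s"
        n = split($0, p, "m")         # split minutes from seconds, if present
        printf "%.2f\n", (n == 2) ? p[1] * 60 + p[2] : p[1]
    }'
}

# speedup OLD NEW: how many times faster NEW is than OLD (wall-clock time).
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 7m1.20s 3m22.06s     # Test 4, 1.6.x vs 1.7.4  -> 2.08
speedup 7m42.40s 4m57.53s    # Test 3, 1.6.x vs master -> 1.55
```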

Finally

Sorry that this answer got a bit longer than usual, but things need to be clear. Bottom line: yes, some runs may be slower, as explained above; others are equally fast or much faster. It depends on the setup and usage. As I already tried to keep performance high in the 1.6.x version, with 1.7.x I am competing against my own software. If you are happy with nfdump-1.6.x and its features, feel free to use it, but upgrade to the latest 1.6.x branch. I always work with the latest master :) On average there is no performance penalty with 1.7.x; sometimes one is visible in specific scenarios such as yours. The current master branch mostly outperforms 1.6.x in daily usage, and future improvements will speed up processing even more.

Furthermore, if you have ideas for improvement, feel free to contact me.

Hope this helps.

SanderDelden commented 6 months ago

Thank you very much for the in-depth reply, much appreciated!

I'll definitely look into converting the old files to the new format.