Hi
Well, performance is not always that simple. The answer is both yes and no.
First of all, I have always focused on performance for nfdump. Over the years - nfdump celebrates its twentieth anniversary this year - the software has grown and become more complicated as more and more features were implemented.
However, it all depends - on the number of CPUs, the disk I/O subsystem, SSD/HD and the job to be done.
There was a rather big change from 1.6.x to 1.7.x, due to ever-growing demands. At some point, a few internal structures reached their limits and needed to be rewritten. The file backend was replaced by a more flexible one, which can store more flexible data, specifically variable-length data, which more and more exporters send. Network-Based Application Recognition (NBAR), interface names, user names, merging pcap data with netflow and therefore having payload data to decode DNS, SSL, JA3 and JA4 hashes, etc.: the features were drastically improved over the last few years.
So let's focus on performance and threading. Nfdump 1.6.x has a more basic data structure and is single-threaded only. Nfdump 1.7.x moved to multi-threading but got a more complex data structure. As of now, nfdump 1.7.x gains from multiple cores when reading and writing data, while records are processed in a single thread. There are plans (hopefully to come true in 2024) to gain further from more cores by having multiple data processing threads. This will be the next boost for speed. However, all threading is useless if the I/O system cannot cope with the load of the data processing threads.
Nfdump 1.7.0 - 1.7.4 are releases which successively removed old code while keeping compatibility with old files. The current master branch is another step forward from 1.7.4, implementing a new filter engine and removing old bulky data structures. Furthermore, all nfdump 1.7.x releases incur a time penalty for converting old 1.6.x files while reading and processing them.
Below are some examples which show the differences between versions. There are different use cases and I took yours as one of them.
All my tests were done on a single flow file:
% ls -al
-rw-r--r--@ 1 peter staff 4496296845 Mar 10 14:29 flows.nf
-rw-r--r--@ 1 peter staff 4101204914 Mar 10 14:27 flows1.nf
The file flows1.nf is a version 1 file for nfdump-1.6.x; flows.nf is a version 2 file for nfdump 1.7.x with identical flow data. Both files are lz4 compressed. Version 2 files are ~5-10% larger, depending on the flow records. Whether you use a single file or a series of files (-R .. -M ..) does not make a difference. The total size is > 4GB.
Versions:
% nfdump -v flows1.nf
File : flows1.nf
Version : 1 - lz4 compressed
Blocks : 13484
Checking data blocks
Total
Type 2 blocks : 13484
Records : 196367493
% nfdump -v flows.nf
File : flows.nf
Version : 2 - lz4 compressed
Created : 2024-03-10 14:27:00
Created by : nfdump
nfdump : f1070200
encryption : no
Appdx blks : 1
Data blks : 9739
Checking data blocks
Checking appendix blocks
Total
Type 3 blocks : 9740
Records : 196367494
and the statistics:
% nfdump -r flows.nf -I
Ident: none
Flows: 196367492
Flows_tcp: 196367261
Flows_udp: 216
Flows_icmp: 15
Flows_other: 0
Packets: 715186246
Packets_tcp: 715185820
Packets_udp: 411
Packets_icmp: 15
Packets_other: 0
Bytes: 618685994298
Bytes_tcp: 618685973038
Bytes_udp: 20870
Bytes_icmp: 390
Bytes_other: 0
First: 1574347152
Last: 1577796527
msec_first: 111
msec_last: 833
Sequence failures: 0
So the file is roughly comparable to your data.
For all tests I use nfdump-1.6.x (latest 1.6.x branch), nfdump-1.7.4-release and nfdump-1.7.4-b1bf5a1 from the master branch.
I copied your test. (Btw. -w /dev/null does not work after the filter!)
% time -h nfdump-1.6.x -r flows1.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
4.85s real 4.41s user 0.29s sys
% time -h nfdump-1.7.4 -r flows1.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
6.51s real 8.70s user 1.08s sys
% time -h nfdump-master -r flows1.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
7.76s real 9.88s user 1.10s sys
This basically confirms your findings. However, as explained above, there is a penalty for converting file versions. Therefore, here are the same tests with native version 2 files:
% time -h nfdump-1.7.4 -r flows.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
5.33s real 8.54s user 1.20s sys
% time -h nfdump-master -r flows.nf -w /dev/null 'ip a.b.c.d or ip u.v.w.x'
4.77s real 7.97s user 1.22s sys
The 1.7.4 release is still a bit slower than good old 1.6.x; however, given that 196 million flows and > 4GB of data were processed, it's not so bad. The master branch is on par with 1.6.x, even a bit faster, with native v2 files. If you have a large flow base in v1 files and you process them regularly - yes - you would benefit from converting them to the v2 format. As the current master removed the limiting 1.6.x structures, it became even slower at reading v1 files. (Maybe I could improve that a bit??)
Btw. you can convert v1 files to v2 files by simply reading and writing them with a 1.7.x binary.
For example:
% nfdump -r <v1-file> -w <v2-file> -z=lz4
% mv <v2-file> <v1-file>
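For a whole directory of v1 files, the same read/write step can be wrapped in a loop. The following is only a sketch under assumptions: the `nfcapd.*` file naming, the `NFDUMP` variable and the `convert_dir` function name are illustrative and not part of nfdump. It uses only the `-r`, `-w` and `-z=lz4` options shown above and replaces the original file only if the rewrite succeeded.

```shell
# Sketch: rewrite every flow file in a directory with a 1.7.x binary.
# NFDUMP and the nfcapd.* naming pattern are assumptions; adjust to your setup.
NFDUMP="${NFDUMP:-nfdump}"   # must point to a 1.7.x binary

convert_dir() {
    dir="$1"
    for f in "$dir"/nfcapd.*; do
        [ -f "$f" ] || continue                 # glob matched nothing
        case "$f" in *.tmp) continue ;; esac    # skip leftovers of a failed run
        tmp="$f.tmp"
        # Replace the original only when the rewrite succeeded.
        if "$NFDUMP" -r "$f" -w "$tmp" -z=lz4; then
            mv "$tmp" "$f"
        else
            rm -f "$tmp"
            echo "conversion failed: $f" >&2
            return 1
        fi
    done
}

# Usage (hypothetical path): convert_dir /data/flows/2024/03/10
```

Files that are already in v2 format are simply rewritten, so it is worth testing this on a copy of the data first.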
In all other tests, I use the v2 file for nfdump 1.7.x with the identical data.
This test uses a filter with 10 elements (ip a or ip b or ip ...) and writes the result into a file with lz4 compression. The 10 IPs are the top 10 talkers (result from -s ip):
% time -h nfdump-1.6.x -r flows1.nf -w f1.nf -y -f filter3.txt
25.47s real 24.22s user 0.70s sys
% time -h nfdump-1.7.4 -r flows.nf -w f2.nf -z=lz4 -f filter3.txt
22.83s real 38.51s user 2.85s sys
% time -h nfdump-master -r flows.nf -w f3.nf -z=lz4 -f filter3.txt
19.46s real 34.94s user 2.92s sys
Now things already look different. Nfdump-1.6.x is the slowest, while the master branch is the fastest.
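A filter of the form "ip a or ip b or ip ..." can be generated from a list of addresses. This is a small sketch, not how the filter file in this test was actually produced; the function name `build_ip_filter` and the example addresses (taken from the RFC 5737 documentation ranges) are assumptions.

```shell
# Build "ip A or ip B or ..." from a list of addresses, one per argument.
build_ip_filter() {
    filter=""
    for ip in "$@"; do
        if [ -z "$filter" ]; then
            filter="ip $ip"
        else
            filter="$filter or ip $ip"
        fi
    done
    printf '%s\n' "$filter"
}

# Example with documentation addresses; replace them with your top talkers.
build_ip_filter 192.0.2.1 192.0.2.2 198.51.100.7
# prints: ip 192.0.2.1 or ip 192.0.2.2 or ip 198.51.100.7
```

Redirecting the output into a file gives a filter file that can be passed with -f, as in the commands above.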
To test the filter engine, I use a complex filter which I received from a user. It has > 900 filter elements and the filter .txt file is ~16k:
-rw-r--r--@ 1 peter staff 16034 Mar 10 12:52 filter2.txt
% time -h nfdump-1.6.x -r flows1.nf -f filter2.txt > /dev/null
7m42.40s real 7m32.14s user 0.69s sys
% time -h nfdump-1.7.4 -r flows.nf -f filter2.txt > /dev/null
7m35.74s real 7m37.33s user 2.20s sys
% time -h nfdump-master -r flows.nf -f filter2.txt > /dev/null
4m57.53s real 5m0.66s user 2.09s sys
Ok - complex filtering with a 16k filter on a 4GB data file is not an everyday use case. Here the master branch is clearly the winner.
This is an I/O and CPU test which simply changes the compression method of the data from lz4 to bz2:
% time -h nfdump-1.6.x -r flows1.nf -w f1.nf -J 2
File flows1.nf compression changed
7m1.20s real 6m51.42s user 2.84s sys
% time -h nfdump-1.7.4 -r flows.nf -J2
File flows.nf compression changed
3m22.06s real 13m19.02s user 9.48s sys
% time -h nfdump-master -r flows.nf -J2
File flows.nf compression changed
3m19.78s real 13m9.66s user 9.71s sys
The 1.7.x versions are more than twice as fast as 1.6.x.
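The threading effect can be read from the timings themselves: dividing the CPU time (user + sys) by the wall-clock time (real) gives roughly the number of cores kept busy. The helper names `to_seconds` and `cpu_ratio` below are made up for this sketch; the input values are the bz2 timings listed above.

```shell
# Convert a `time -h` value such as "7m1.20s" or "4.85s" into seconds.
to_seconds() {
    echo "$1" | awk '{
        s = $0; sub(/s$/, "", s)         # drop the trailing "s"
        n = split(s, p, "m")             # "7m1.20" -> p[1]=7, p[2]=1.20
        printf "%.2f\n", (n == 2 ? p[1] * 60 + p[2] : p[1])
    }'
}

# CPU utilisation = (user + sys) / real.
# Values well above 1 mean that several cores were busy.
cpu_ratio() {
    real=$(to_seconds "$1"); user=$(to_seconds "$2"); sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.2f\n", (u + s) / r }'
}

cpu_ratio 7m1.20s 6m51.42s 2.84s      # nfdump-1.6.x -> 0.98
cpu_ratio 3m22.06s 13m19.02s 9.48s    # nfdump-1.7.4 -> 4.00
```

In this test nfdump-1.6.x keeps about one core busy, while 1.7.4 keeps about four cores busy and finishes in less than half the wall-clock time.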
Depending on the hardware, these results may differ on other systems. All tests above were done on an M2 MacBook, which has really fast CPUs and I/O.
Sorry if this answer got a bit longer than usual, but things need to be clear. Bottom line - yes - some runs may be slower, as explained above; others are equally fast or much faster. It depends on the setup and usage. As I already tried to keep performance high with the 1.6.x version, with 1.7.x I compete against my own software. If you are happy with nfdump-1.6.x and its features, feel free to use it, but upgrade to the latest 1.6.x branch. I always work with the latest master :) On average there is no performance penalty with 1.7.x; sometimes some is visible in specific scenarios such as yours. The current master branch mostly outperforms 1.6.x in daily usage, and future improvements will speed up processing even more.
Furthermore, if you have ideas for improvement, feel free to contact me.
Hope this helps.
Thank you very much for the in-depth reply, much appreciated!
I'll definitely look into converting the old files to the new format.
Hi,
I've done some comparison tests between nfdump 1.7.4 (latest master 9c1021d) and 1.6.17 and noticed that 1.6.17 is faster. I actually expected the opposite due to nfdump being multi-threaded since 1.7.0.
Here are the outputs of some of the tests I've run (the first run is always nfdump 1.7.4):
All above tests were run on sFlow data collected by sfcapd 1.6.17, as nfdump 1.6.17 is not compatible with data collected by sfcapd 1.7.4. When running the same test on sFlow data collected by sfcapd 1.7.4, not much improvement is observed (if any):
Old dataset (sfcapd 1.6.17):
New dataset (sfcapd 1.7.4):