shundhammer / qdirstat

QDirStat - Qt-based directory statistics (KDirStat without any KDE - from the original KDirStat author)
GNU General Public License v2.0

Linux Disk Usage Tools Compared: QDirStat vs. K4DirStat vs. Baobab vs. Filelight vs. ncdu #97

Closed shundhammer closed 5 years ago

shundhammer commented 5 years ago

Screenshots

QDirStat

[Screenshot: QDirStat]

K4DirStat

[Screenshot: K4DirStat]

Baobab

[Screenshot: Baobab]

Filelight

[Screenshot: Filelight]

ncdu

[Screenshot: ncdu]

Performance

All the programs had to scan my /work directory on my (normal rotational, non-SSD) disk, both with cleared kernel directory caches and with caches filled from previous program runs.

To clear those caches on Linux, start a root shell and then enter

echo 3 >/proc/sys/vm/drop_caches

In all cases, the test procedure was:

QDirStat and K4DirStat both display the elapsed time for directory reading once it is complete. For the others, a stopwatch was used (which makes the timing somewhat less accurate, of course).

Benchmark Results: /work

/work is an ext4 filesystem with 230 GB / 216k items on a Samsung 1 TB 7200 rpm disk.

Update 2019-04-13: Added results from latest QDirStat with performance improvement

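| Program | Version | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 |
|---|---|---|---|---|---|---|---|
| Cache | | cold | hot | hot | hot | cold | cold |
| qdirstat | 07235ec | 25.8 | 1.5 | 1.5 | 1.6 | 25.3 | 25.2 |
| qdirstat -s | 07235ec | 24.9 | 1.5 | 1.5 | 1.5 | 24.7 | 24.8 |
| qdirstat | 1.5 | 32.3 | 1.9 | 2.0 | 2.0 | 33.0 | 33.0 |
| qdirstat -s | 1.5 | 31.4 | 1.5 | 1.8 | 1.8 | 32.0 | 31.7 |
| k4dirstat | 3.1.3 | 35.2 | 2.4 | 2.3 | 2.4 | 34.3 | 34.2 |
| baobab | 3.28.0 | 24.6 | 3.5 | 3.5 | 3.4 | 24.7 | 24.1 |
| filelight | 4.17 | 19.8 | 1.1 | 1.2 | 1.1 | 19.2 | 19.5 |
| ncdu | 1.12 | 18.6 | 1.5 | 1.2 | 1.3 | 19.0 | 19.0 |
| du -hs | 8.28 | 17.9 | 0.5 | 0.5 | 0.5 | 18.0 | 18.0 |

All times are in seconds.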

The exact command for du -hs was time du -hs /work, so the timing was more accurate than with a stopwatch.

Don't get hung up on fractions of a second in the results for baobab, filelight and ncdu: operating a manual stopwatch isn't all that accurate with one hand on the keyboard and the other on the stopwatch.
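For any program that exits on its own once the scan is done (as du does), the timing can be scripted instead of taken by stopwatch. The following is only a minimal sketch: the helper name time_command is made up for illustration, it does not clear any caches, and it is of no use for the GUI tools, which keep running after they have finished reading.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds.

    Hypothetical helper for illustration; output is discarded so that
    terminal I/O does not distort the measurement.
    """
    start = time.perf_counter()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit status (e.g. unreadable dirs) is not fatal here
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Roughly equivalent to "time du -hs /work"; adjust the path as needed.
    print(f"{time_command(['du', '-hs', '/work']):.1f} s")
```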

Benchmark Conclusions

du -hs doesn't do much; it doesn't have a user interface. So it can safely be considered the practical lower bound for how fast a scan can possibly get: traverse an entire directory tree, open each directory in sequence (using the opendir() / readdir() / closedir() library calls) and obtain detailed information from the filesystem for each file or directory encountered (using the stat() or lstat() syscalls).
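That minimal workload can be sketched in a few lines of Python. This is not how du is implemented, just an illustration of the same loop: os.scandir() lists one directory at a time and an lstat-style call is made for every entry. For simplicity the sketch (with the made-up name tree_size) sums the apparent size st_size, whereas du counts allocated disk blocks by default, and it neither handles hard links nor stops at mount points.

```python
import os


def tree_size(root):
    """Return (total_bytes, item_count) for the tree rooted at root.

    Illustrative only: sums st_size of every entry, including directories.
    """
    total = os.lstat(root).st_size
    items = 1
    stack = [root]  # directories still to be read
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    try:
                        # lstat() semantics: do not follow symlinks
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is not accessible
                    total += st.st_size
                    items += 1
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # directory not readable
    return total, items
```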

ncdu comes close. Since it uses a text-based (ncurses) user interface, it doesn't have much overhead for GUI stuff. On the other hand, it also can't do very much (but it can delete the selected file).

Filelight is really fast. In particular, re-reading the same directory appears to be faster than with ncdu (but this might be attributed to the inaccuracies of using a manual stopwatch). According to the output of ps, it uses 3 threads (and thus potentially up to 3 CPU cores); however, since this is largely I/O bound, it is not obvious how this helps.

Baobab is fast for uncached reads, but surprisingly slow for the cached ones. But it doesn't keep any information about individual files in memory, only the directories with the sums. That's why it only offers to delete entire subdirectories, but not individual files (duh!).

K4DirStat is a little slower than QDirStat 1.5, even though both use the same directory reading code (inherited from KDirStat). This might be because of more display updates and because of re-sorting the displayed tree all the time during reading. It is significantly slower than the latest QDirStat with the performance improvements, though.

QDirStat becomes a little faster with the -s (--slow-updates) command line option, which was designed for remote X connections that have become very slow with Qt5 (due to always using a pixel buffer that has to be transferred over the network connection instead of X protocol draw primitives like XDrawString()). But the difference is really negligible.

The latest QDirStat from Git master got a noticeable performance improvement from using fstatat() instead of lstat() and from sorting the directory entries by i-node number before that call, so the corresponding i-nodes can be read sequentially with minimized disk seek times (which has no effect on SSDs, though).
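The idea can be illustrated for a single directory as follows. This is a Python sketch of the technique, not QDirStat's actual C++ code, and the function name is made up. On Linux, os.stat() with dir_fd and follow_symlinks=False maps to fstatat() with AT_SYMLINK_NOFOLLOW, and DirEntry.inode() returns the i-node number that came with the directory listing, so the sort itself costs no extra stat calls.

```python
import os


def stat_dir_sorted_by_inode(dir_path):
    """Return {name: os.stat_result} for one directory's entries,
    issuing the stat calls in ascending i-node order.

    Hypothetical helper for illustration (Linux / POSIX only).
    """
    with os.scandir(dir_path) as it:
        # (inode, name) pairs sort by i-node number first
        entries = sorted((entry.inode(), entry.name) for entry in it)

    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Names are resolved relative to dir_fd, like fstatat() does
        return {
            name: os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
            for _inode, name in entries
        }
    finally:
        os.close(dir_fd)
```

Any gain from the sort can only be expected on rotational disks with cold caches; with hot caches or on SSDs the order of the calls should not matter much.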

Features

Feature QDirStat K4DirStat Baobab Filelight ncdu du
Show tree total size + + + + + +
Show subtree size + + + + + +
Show size of individual files + + + + +
Stop at mounted filesystems + + + + +
Exclude rules + + + +
Show treemap + +
Show some other graph + +
Delete a file + + + +
Delete a directory / subtree + + + + +
Open directory in filemanager + + + +
Custom cleanup actions + +
File type view +
File size histogram view +
Package manager support +
Proper Btrfs subvol handling + ? ? ? ? ?

Quirks and Oddities

Size Units

Baobab shows all sizes in 1000-based units, not 1024-based like all the others. That's why the sizes appear to be different (but they really are not).

| Unit | 1000-based | 1024-based |
|---|---|---|
| 1 KB | 1000 B | 1024 B |
| 1 MB | 1,000,000 B | 1,048,576 B |
| 1 GB | 1,000,000,000 B | 1,073,741,824 B |
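To see how much the two conventions diverge for larger numbers, here is a small sketch that formats the same byte count both ways. The function name format_size and the byte count (exactly 230 × 1024³ bytes) are chosen purely for illustration; none of the tools necessarily formats sizes this way.

```python
def format_size(num_bytes, base=1024):
    """Format a byte count with either 1024-based or 1000-based units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below the base,
        # or at the largest unit available
        if value < base or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= base


size = 230 * 1024**3            # illustrative byte count
print(format_size(size, 1024))  # 230.0 GB
print(format_size(size, 1000))  # 247.0 GB
```

The same number of bytes shows up as about 7% more "GB" in the 1000-based display.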

Versions Used

All running on Xubuntu 18.04.2 LTS with all the latest updates.

| Program | Version |
|---|---|
| QDirStat | Git master (07235ec6b658, post-1.5) |
| QDirStat | 1.5 |
| K4DirStat | 3.1.3-1 |
| Baobab | 3.28.0-1 |
| Filelight | 4:17.12.3-0 |
| ncdu | 1.12-1 |
| du | coreutils-8.28 |

Hardware

shundhammer commented 5 years ago

Feel free to comment on this issue. That is the main reason why I used this issue tracker instead of the GitHub wiki (the other being that the wiki does not support the convenient uploading of screenshots that the issue tracker has).