Hi @sahib,
could you share why your superfast tool reports a different count than other tools?
All the tools compared are on GitHub and downloadable.
The attached scriptlet shows the differences between 'rmlint' and 'DIFFTREE' on the latest Linux kernel tree.
Bottom line: the first one gives 26+391=417 duplicates, whereas my script gives 434. Who knows what causes the discrepancy?! My email: sanmayce@sanmayce.com
First, it is good to run more such tools, the more the merrier.
The tool below scans only files of 1 byte or larger, while there are 26 files of 0 bytes (see further below), which means 25 more duplicates.
Adding those to what it reported gives 409+25=434 duplicates, thus DIFFTREE is kinda closer to the right count.
[root@djudjeto2 tree_bench]# echo 3 > /proc/sys/vm/drop_caches
[root@djudjeto2 tree_bench]# ./linux_czkawka_cli dup -m 1 -d TreeUnderDeduplication/
Results of searching ["/home/sanmayce/WorkTemp/tree_bench/TreeUnderDeduplication"] with excluded directories [] and excluded items []
-------------------------------------------------Files with same hashes-------------------------------------------------
Found 409 duplicated files which in 274 groups which takes 2.06 MiB.
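The zero-byte adjustment used above (26 empty files count as 25 duplicates, added to the 409 reported) can be double-checked independently of any dedup tool. This is only a minimal sketch, assuming a POSIX shell with `find` and `wc`, and no newlines in file names; the function name `count_empty_dups` is hypothetical and is not part of SpeedShowdown.sh.

```shell
#!/bin/sh
# Hypothetical helper (not from SpeedShowdown.sh): counts zero-byte regular
# files under a directory and prints how many duplicates they contribute.
# All empty files are identical, so n empty files contribute n - 1 duplicates.
count_empty_dups() {
    dir=$1
    # -size 0 matches only files whose size is exactly 0 bytes.
    empty=$(find "$dir" -type f -size 0 | wc -l)
    empty=$((empty + 0))    # strip any padding that wc may add
    if [ "$empty" -gt 0 ]; then
        echo $((empty - 1))
    else
        echo 0
    fi
}

# Usage (directory name taken from the run above):
#   extra=$(count_empty_dups TreeUnderDeduplication/)
#   echo $((409 + extra))
```

If the tree really holds 26 zero-byte files, the helper should print 25, and adding that to the 409 reported above gives 434.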
Test dataset: linux-6.6.1 tree (archive untarred to TreeUnderDeduplication/)
OS: Fedora release 38 (Thirty Eight) x86_64
Host: 20LRS04700 ThinkPad 11e 5th Gen
Kernel: 6.2.12-300.fc38.x86_64
CPU: Intel Celeron N4100 (4) @ 2.400GHz
SSD: nvme Transcend 1TB bufferless
Filesystem: ext4
The actual scriptlet in use:
The full script 'SpeedShowdown.sh' is attached. SpeedShowdown.sh.tar.gz