danieldjewell closed this 7 years ago
Good timing, as just yesterday I committed a first step towards allowing different hash algorithms to be selected.
I've experimented with this a bit in the past, and the results ranged from no difference (with murmurhash2) to slower (with xxhash), so I don't particularly expect that allowing different hash choices will yield an improvement. Most of the hashing is done in a separate thread and isn't the bottleneck, even on a very slow CPU (Atom).
Nonetheless, I'm planning on making it selectable just to make experimenting easier.
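To illustrate what "selectable" could mean, here is a minimal Python sketch of a hash chosen by name, built on `hashlib`. The names `HASH_ALGORITHMS`, `hash_stream` and `hash_file` are hypothetical. This is not dupd's actual code. The table simply mirrors the `-F` values used in the timings.

```python
import hashlib

# Hypothetical name-to-constructor table mirroring the -F values above.
HASH_ALGORITHMS = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha512": hashlib.sha512,
}


def hash_stream(stream, algorithm="md5", block_size=64 * 1024):
    """Hash a binary file-like object with the selected algorithm."""
    try:
        h = HASH_ALGORITHMS[algorithm]()
    except KeyError:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}") from None
    # Read fixed-size blocks so large files never sit in memory all at once.
    for block in iter(lambda: stream.read(block_size), b""):
        h.update(block)
    return h.hexdigest()


def hash_file(path, algorithm="md5"):
    """Open a file in binary mode and return its hex digest."""
    with open(path, "rb") as f:
        return hash_stream(f, algorithm)
```

With a lookup table like this, adding a new algorithm means adding one entry. The rest of the code never needs to know which digest is in use.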
I've added support for sha1 and sha512 (in addition to the current md5). Some timings:
```
% repeat 5 time dupd scan -p $HOME -q -F md5
dupd scan -p $HOME -q -F md5  4.02s user 6.02s system 143% cpu 7.000 total
dupd scan -p $HOME -q -F md5  3.75s user 5.86s system 138% cpu 6.935 total
dupd scan -p $HOME -q -F md5  3.60s user 6.04s system 138% cpu 6.948 total
dupd scan -p $HOME -q -F md5  3.73s user 5.91s system 138% cpu 6.962 total
dupd scan -p $HOME -q -F md5  3.66s user 6.00s system 138% cpu 6.965 total

% repeat 5 time dupd scan -p $HOME -q -F sha1
dupd scan -p $HOME -q -F sha1  4.67s user 5.80s system 140% cpu 7.433 total
dupd scan -p $HOME -q -F sha1  4.60s user 5.88s system 141% cpu 7.429 total
dupd scan -p $HOME -q -F sha1  4.56s user 5.96s system 140% cpu 7.473 total
dupd scan -p $HOME -q -F sha1  5.24s user 6.00s system 147% cpu 7.615 total
dupd scan -p $HOME -q -F sha1  4.51s user 6.02s system 141% cpu 7.444 total

% repeat 5 time dupd scan -p $HOME -q -F sha512
dupd scan -p $HOME -q -F sha512  6.20s user 5.93s system 140% cpu 8.653 total
dupd scan -p $HOME -q -F sha512  6.17s user 5.96s system 140% cpu 8.616 total
dupd scan -p $HOME -q -F sha512  6.08s user 6.12s system 140% cpu 8.662 total
dupd scan -p $HOME -q -F sha512  6.06s user 6.10s system 140% cpu 8.627 total
dupd scan -p $HOME -q -F sha512  6.06s user 6.05s system 140% cpu 8.626 total
```
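To compare the runs at a glance, this small Python sketch averages the five "total" (wall-clock) and "user" CPU figures per algorithm. It also expresses each average relative to md5. The numbers are copied from the timings above. Nothing else is assumed.

```python
from statistics import mean

# "total" (wall-clock) seconds from the five runs of each algorithm above.
TOTAL = {
    "md5": [7.000, 6.935, 6.948, 6.962, 6.965],
    "sha1": [7.433, 7.429, 7.473, 7.615, 7.444],
    "sha512": [8.653, 8.616, 8.662, 8.627, 8.626],
}
# "user" CPU seconds from the same runs.
USER = {
    "md5": [4.02, 3.75, 3.60, 3.73, 3.66],
    "sha1": [4.67, 4.60, 4.56, 5.24, 4.51],
    "sha512": [6.20, 6.17, 6.08, 6.06, 6.06],
}


def relative_to(samples, baseline="md5"):
    """Map each algorithm to (mean seconds, mean / baseline mean)."""
    base = mean(samples[baseline])
    return {name: (mean(v), mean(v) / base) for name, v in samples.items()}


if __name__ == "__main__":
    for label, samples in (("total", TOTAL), ("user", USER)):
        for name, (avg, ratio) in relative_to(samples).items():
            print(f"{label:5s} {name:7s} {avg:6.3f}s  {ratio:.2f}x md5")
```

The mean wall-clock totals work out to about 6.96 s (md5), 7.48 s (sha1) and 8.64 s (sha512). That makes sha1 roughly 7% slower than md5 and sha512 roughly 24% slower. User CPU time grows more steeply (about 3.75 s, 4.72 s and 6.11 s), while system time stays at roughly 6 s in every run.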
After playing with the software a bit and reviewing some of the code, I noticed that MD5 is used to compare files (other than in cases where a direct comparison is used).
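As a rough illustration of the two paths described above, here is a generic Python sketch of duplicate detection. It first buckets files by size. It then either compares a lone pair directly or groups larger sets by MD5 digest. The rule that only a pair is compared directly is an assumption made for illustration. dupd's actual thresholds and steps may differ. `md5_of` and `find_duplicates` are hypothetical names.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, block_size=64 * 1024):
    """Return the MD5 hex digest of a file, read in fixed-size blocks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents are identical."""
    # Only files of equal size can be duplicates, so bucket by size first.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        if len(candidates) == 2:
            # Assumed rule: a lone pair is compared byte by byte, no hashing.
            a, b = candidates
            if filecmp.cmp(a, b, shallow=False):
                groups.append([a, b])
            continue
        # Larger sets: equal MD5 digests are treated as duplicates.
        by_digest = defaultdict(list)
        for p in candidates:
            by_digest[md5_of(p)].append(p)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```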
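Although MD5 is considered badly compromised for security purposes, it's probably fine here, since the concern is accidental collisions between unrelated files rather than deliberately crafted ones.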
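To put a number on "probably fine", the birthday-style union bound gives the chance that any two of n distinct files share a b-bit digest: P ≤ n(n−1)/2 / 2^b. This assumes the digests behave like uniform random values. That model is reasonable for accidental collisions, but it says nothing about collisions someone engineers on purpose. The function name below is hypothetical.

```python
def collision_probability_bound(n_files, hash_bits=128):
    """Union-bound estimate of P(any two of n distinct files share a digest).

    Assumes digests behave like independent uniform random values, which
    models accidental collisions only, not deliberately crafted ones.
    """
    pairs = n_files * (n_files - 1) // 2
    return pairs / 2 ** hash_bits


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(f"{n:>13,} files: <= {collision_probability_bound(n):.1e}")
```

For a million files and MD5's 128-bit output, the bound is on the order of 10^-27. Accidental collisions are therefore not a practical concern at that scale.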
What I found interesting is that, according to "openssl speed" (see below), SHA1 was actually faster than MD5 in single-threaded performance.
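For a rough cross-check of the same single-threaded comparison outside of `openssl speed md5 sha1 sha512`, the following Python sketch times `hashlib` on repeated fixed-size blocks. CPython's `hashlib` is commonly backed by OpenSSL, but that depends on the build. Absolute numbers will also include some interpreter overhead. The function name and the block and total sizes are arbitrary choices, not values taken from the thread.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm, block_size=8192, total_bytes=64 * 1024 * 1024):
    """Hash total_bytes of zeros in block_size chunks on one thread; return MB/s."""
    data = b"\0" * block_size
    h = hashlib.new(algorithm)
    blocks = total_bytes // block_size
    start = time.perf_counter()
    for _ in range(blocks):
        h.update(data)
    elapsed = time.perf_counter() - start
    return blocks * block_size / elapsed / 1e6


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha512"):
        print(f"{name:7s} {throughput_mb_per_s(name):8.1f} MB/s")
```

Results vary by CPU. On hardware with SHA extensions, for example, SHA1 can overtake MD5.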
The system tested was running Ubuntu 16.10 (4 cores):