This is quite likely caused by the fact that the internal sequence of commands is generated in a single thread, while the commands are processed in parallel. Generating the stream of commands becomes a bottleneck, and the deduplication threads are simply fighting for work, actively spinning (this is how rayon works).
Generating commands was made single-threaded due to another feature request about making the stream of commands match the order of the input files. I need to find a different way.
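A minimal sketch of the suspected pattern (not fclones' actual code; the command strings and timings are made up): a single-threaded producer feeding rayon workers through `par_bridge()`. When the producer cannot keep up, the starved worker threads spin while waiting for items, which shows up as high CPU usage:

```rust
// Minimal sketch of the suspected pattern (not fclones' actual code):
// a single-threaded producer feeding rayon workers via par_bridge().
use rayon::iter::{ParallelBridge, ParallelIterator};
use std::{thread, time::Duration};

fn main() {
    // Hypothetical stand-in for the sequential command generator.
    let commands = (0..100u32).map(|i| {
        thread::sleep(Duration::from_millis(1)); // slow, single-threaded producer
        format!("hardlink group {i}")
    });

    // Workers pull commands in parallel, but can only go as fast as the
    // producer; while starved, rayon worker threads actively spin for work.
    commands.par_bridge().for_each(|cmd| {
        let _ = cmd.len(); // hypothetical stand-in for executing one command
    });
}
```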
Two ideas here:
This seems to be the matching rayon issue.
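As a hedged mitigation sketch (using only plain rayon API; whether fclones exposes such a knob is not assumed here), capping the global thread pool reduces the number of idle workers that spin on the scarce work items:

```rust
// Hedged sketch: cap rayon's global thread pool so fewer idle workers
// busy-wait while the single-threaded producer is the bottleneck.
fn main() {
    rayon::ThreadPoolBuilder::new()
        .num_threads(2) // fewer workers -> less spinning on scarce work
        .build_global()
        .expect("global pool can only be initialized once");

    // ... run the parallel work as usual; it now uses at most 2 threads ...
}
```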
Hello,
While fclones was one of the few duplicate finders that managed to find all duplicates in my backup in a reasonable amount of time (below 2 days), hardlinking is very slow.
`top -H -c` shows 3 threads with high CPU usage. Connecting `strace -p <PID>` to those threads shows that one thread is making syscalls for different versions of a file, while all the other threads sit in a loop repeatedly issuing a syscall. So there seems to be some active waiting while files are being deleted.
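For context, replacing a duplicate with a hard link generally boils down to an unlink followed by a link, which matches the syscall pattern seen in the working thread. A minimal sketch in Rust of that general technique (the function name is made up; this is not fclones' exact code):

```rust
// Hedged sketch of what one "hardlink" command generally amounts to:
// remove the duplicate, then recreate it as a hard link to the original.
use std::{fs, io, path::Path};

fn replace_with_hardlink(original: &Path, duplicate: &Path) -> io::Result<()> {
    fs::remove_file(duplicate)?;       // unlink the duplicate first...
    fs::hard_link(original, duplicate) // ...then link it to the kept original
}
```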
Environment
The backup was created with `cp -al` and then rsynced. `dupes.txt` is about 4 GB.