Closed the8472 closed 10 months ago
I think this is caused by this code since realpath uses readlink internally:
I don't quite understand why the sanitizing is necessary on every file. Canonicalizing only the starting points should do the job. Once a starting point is an absolute, symlink-free path, every descendant path built from it stays that way, since the directory traversal isn't following symlinks either.
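To make the suggestion concrete, here is a minimal sketch in C. It is not duperemove's actual code: `scan_root`, `walk`, and the `files_seen` counter are hypothetical names made up for illustration. It calls realpath(3) once per starting point, then builds each descendant path by plain concatenation and checks entries with lstat(2), so symlinks are never followed and no per-file canonicalization is needed.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-in for "register this file with the scanner". */
static unsigned long files_seen;

/* Recursive walk. `dir` must already be absolute and symlink-free. */
static int walk(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;

    struct dirent *e;
    int ret = 0;
    while ((e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;

        /* Canonical parent + entry name is still canonical: no realpath(). */
        char path[PATH_MAX];
        int n = snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue; /* path too long, skip it */

        struct stat st;
        if (lstat(path, &st) != 0) /* lstat() does not follow symlinks */
            continue;

        if (S_ISDIR(st.st_mode)) {
            if (walk(path) != 0)
                ret = -1;
        } else if (S_ISREG(st.st_mode)) {
            files_seen++;
        }
        /* Symlinks show up as S_IFLNK here, so they are simply skipped. */
    }
    closedir(d);
    return ret;
}

/* Canonicalize the starting point exactly once, then walk below it. */
static int scan_root(const char *start)
{
    char *root = realpath(start, NULL); /* the only realpath() call */
    if (!root)
        return -1;
    int ret = walk(root);
    free(root);
    return ret;
}
```

With this shape, the number of realpath(3) calls scales with the number of starting points given on the command line rather than with the number of files found.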
You are correct: the call from walk_dir() should not use realpath(3).
I am currently reworking the whole scan phase to fix many of its issues:
0.27 [jack:~/git/duperemove] git diff --stat origin/master | tail -1
21 files changed, 1103 insertions(+), 1285 deletions(-)
I will work on this issue after that WIP is merged
Cool, will the refactoring also reduce the memory usage (previous versions didn't use as much) or should I file a separate issue for that?
Yes, it does. Last time I checked, in some extreme cases (for instance, the "large number of identical small files" case), I saw up to a 90% memory reduction in the scan phase.
Hello @the8472, I believe your issue has been fixed in the latest release.
Feel free to reopen if you still face the issue.
Thank you for your contribution!
The "Gathering file list..." phase seems to take a lot of time, and peeking at it with strace indicates that it's doing a ton of readlink syscalls for each file. I'm not sure what the purpose is, but that can probably be optimized?
duperemove 0.13