trapexit / mergerfs-tools

Optional tools to help manage data in a mergerfs pool
ISC License

mergerfs.dedup - doesn't validate if duplicate is the same file (all copies might be deleted if symlinks are involved) #124

Open fedy-cz opened 2 years ago

fedy-cz commented 2 years ago

I have a mergerfs filesystem where some branches contain symlinks to a directory at the same path within a different branch. Example:

/mnt/branch1/dir - symlink to /mnt/branch2/dir
/mnt/branch2/dir - a directory

The reasons for doing such a thing are historical (there are existing processes that work with certain branch paths directly, the data had to be moved to a different volume, ...). Maybe I'm using mergerfs horribly wrong, but I didn't find any warning about this in the docs, and so far it seems to work fine.

The issue: It seems that when mergerfs.dedup is run on the merged filesystem, it incorrectly identifies the files reached through both paths as different files (different copies) and would attempt to delete them. Because both paths resolve to the same underlying file, that would result in 0 copies of the affected files.

Suggested solution: There should be an option (or even better it might be the default), where:

Basically the idea is that a tool made to delete duplicates should make especially sure that everything it deletes are truly duplicates.
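The idea above can be illustrated with a minimal sketch. This is not the code from mergerfs.dedup or from any fork. The function names `distinct_copies`, `safe_to_dedup`, and `demo`, and the extra `branch3` directory, are hypothetical and exist only for illustration. The sketch assumes that comparing the device and inode numbers returned by `os.stat` (which follows symlinks) is enough to tell whether two branch paths point at the same underlying file.

```python
import os
import tempfile


def distinct_copies(paths):
    """Collapse paths that resolve to the same underlying file.

    Returns one path per distinct (device, inode) pair, in input order.
    """
    seen = {}
    for path in paths:
        st = os.stat(path)  # os.stat follows symlinks
        seen.setdefault((st.st_dev, st.st_ino), path)
    return list(seen.values())


def safe_to_dedup(paths):
    """True only if at least two physically distinct copies exist."""
    return len(distinct_copies(paths)) > 1


def demo():
    """Recreate the reported layout in a temp dir (branch3 is hypothetical)."""
    with tempfile.TemporaryDirectory() as root:
        branch1 = os.path.join(root, "branch1")
        branch2 = os.path.join(root, "branch2")
        branch3 = os.path.join(root, "branch3")
        os.makedirs(branch1)
        os.makedirs(os.path.join(branch2, "dir"))
        os.makedirs(os.path.join(branch3, "dir"))
        # branch1/dir is a symlink to branch2/dir, as in the report
        os.symlink(os.path.join(branch2, "dir"), os.path.join(branch1, "dir"))
        for branch in (branch2, branch3):
            with open(os.path.join(branch, "dir", "file"), "w") as fh:
                fh.write("data")
        via_link = os.path.join(branch1, "dir", "file")
        real = os.path.join(branch2, "dir", "file")
        other = os.path.join(branch3, "dir", "file")
        return safe_to_dedup([via_link, real]), safe_to_dedup([real, other])


if __name__ == "__main__":
    print(demo())
```

Under these assumptions, the symlinked pair is reported as a single copy and would be skipped, while the pair of separately stored files is still eligible for deduplication. Hard links on the same device also share an inode, so this check treats them as one copy too, which errs on the side of not deleting.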

trapexit commented 2 years ago

As the docs say, if your overlapping files are not in sync you will have some issues. It's on the user to take on the responsibility for such things.

These tools are just random things I purpose-built for other people as templates for their own tooling. If you have a non-standard setup where overlapping files are in fact not the same, or not even the same type, then the tool will have to be modified for that setup. This really isn't much different from situations where people "move" files from a mount into a bind mount of the same.

fedy-cz commented 2 years ago

Implemented the proposed additional checks in my fork: https://github.com/fedy-cz/mergerfs-tools

mergerfs is awesome, thanks

trapexit commented 2 years ago

:+1:

If you think it is generally valuable and won't interfere with the general use cases then feel free to submit a PR.