leuchtraketen / fdupes

fdupes is a program for identifying duplicate files residing within specified directories

fdupes misbehaves when hard link limit is reached #1

Open · jlherren opened this issue 10 years ago

jlherren commented 10 years ago

I have a folder structure with many duplicate files. I used "fdupes -L" to hard-link identical files, but this failed with many "Too many links" errors. I have a few files that exist more than 65000 times, and 65000 is the maximum number of links a single file can have on ext4. Other filesystems may have other limits.
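
As an aside, the limit can be queried at runtime rather than hard-coded. A minimal sketch (not fdupes code, just an illustration) using POSIX pathconf():

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Print the maximum hard link count for the filesystem containing PATH.
 * pathconf() returns -1 with errno unchanged if the limit is indeterminate. */
int main(int argc, char *argv[])
{
    const char *path = argc > 1 ? argv[1] : ".";
    errno = 0;
    long link_max = pathconf(path, _PC_LINK_MAX);
    if (link_max == -1 && errno != 0)
        perror("pathconf");
    else if (link_max == -1)
        printf("%s: no fixed hard link limit\n", path);
    else
        printf("%s: at most %ld hard links per file\n", path, link_max);
    return 0;
}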

The real problem is that fdupes appears to delete the original file first and then attempt to create a hard link under the same name. If the link fails, no file is left at that location at all. Here's a simple test to verify this:

$ mkdir test; cd test
$ for X in `seq 1 100000`; do touch $X; done
$ ls | wc -l
100000
$ fdupes -L .
[h] ./1
[h] ./2
[...]
[!] ./74190 -- unable to create a hardlink for the file: Too many links
[!] ./74491 -- unable to create a hardlink for the file: Too many links
[...]
$ ls | wc -l
65000

So it "eats" some of the files. That's very bad behavior. Maybe fdupes could first rename the files and restore them in case the hard linking fails, and delete them when it succeeded. Ideally it could hard link files in bunches of 65000.

NOTE: The test above assumes a filesystem without a 32000-entries-per-directory limit. Even with such a limit the problem persists, since the duplicates could be scattered across subfolders.