sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

"ln" hardlink sometimes failing with "invalid argument" - NTFS, W10, WSL(1) #566

Closed james-cook closed 2 years ago

james-cook commented 2 years ago

This is probably a glaringly obvious error on my side, but I could use some hints.

rmlint is a recent dev version (2 weeks or so old):

version 2.10.1 compiled: Mar 13 2022 at [10:25:11] "Ludicrous Lemur" (rev bdb591f4)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl +xattr +btrfs-support

The underlying filesystem is NTFS, accessed from WSL1 on Windows 10.

The initial run was:

sudo rmlint --progress \
       -c sh:hardlink \
       --xattr \
       -T "df -emptyfiles -emptydirs -dd" \
       -S mpda \
       '/mnt/d/A/20160505-MRd' \
       '/mnt/d/A/20160622-WIB' \
       '/mnt/d/A/20160622-MR' \
       '/mnt/d/A/20200509-MR/'  \
       '/mnt/d/A/20210705-MR/' \
       '/mnt/d/C/' \
       '/mnt/d/C2/' 

(all files are on the same device)
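Hard links can only be created within a single filesystem (across filesystems ln fails with "Invalid cross-device link"), which is why it matters that all files are on the same device. A minimal sketch of that check, assuming GNU stat (its %d format prints the numeric device ID); the function name same_device is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if both paths are on the same device.
# Hard links cannot span filesystems, so this is a precondition for ln.
# Returns 2 if either path cannot be stat'ed.
same_device() {
    dev_a=$(stat -c %d -- "$1") || return 2   # numeric device ID (GNU stat)
    dev_b=$(stat -c %d -- "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}
```

For example: same_device /mnt/d/A /mnt/d/C && echo "same device". On the WSL1 mounts above, stat reports "Device: fh/15d" for both files, i.e. device 15.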

Running the generated script with -pxd looks fine, but I have noticed that hardlinking sometimes fails with an error:

Hardlinking to original: /mnt/d/A/20200509-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license
ln: failed to create hard link '/mnt/d/A/20200509-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license' => '/mnt/d/A/20160505-MRd/Users/xxx/AppData/Roaming/npm-cache/ansi-regex/2.0.0/package/license': Invalid argument

If I run ls, stat and file on these files, I get:

ls:

$ ls -ali '/mnt/d/A/20210705-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license'
281474980796376 -rwxrwxrwx 1 xxx xxx 1119 Sep 16  2017 /mnt/d/A/20210705-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license
$ ls -ali '/mnt/d/A/20160505-MRd/Users/xxx/AppData/Roaming/npm-cache/ansi-regex/2.0.0/package/license'
562949954086350 -rwxrwxrwx 1024 xxx xxx 1119 Sep  1  2015 /mnt/d/A/20160505-MRd/Users/xxx/AppData/Roaming/npm-cache/ansi-regex/2.0.0/package/license

stat:

$ stat '/mnt/d/A/20210705-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license'
  File: /mnt/d/A/20210705-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license
  Size: 1119            Blocks: 8          IO Block: 512    regular file
Device: fh/15d  Inode: 281474980796376  Links: 1
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/ michael)   Gid: ( 1000/ michael)
Access: 2022-03-28 22:47:38.027165400 +0100
Modify: 2017-09-16 19:05:51.774167700 +0100
Change: 2022-03-28 22:47:38.065161100 +0100
 Birth: -
$ stat '/mnt/d/A/20160505-MRd/Users/xxx/AppData/Roaming/npm-cache/ansi-regex/2.0.0/package/license'
  File: /mnt/d/A/20160505-MRd/Users/xxx/AppData/Roaming/npm-cache/ansi-regex/2.0.0/package/license
  Size: 1119            Blocks: 8          IO Block: 512    regular file
Device: fh/15d  Inode: 562949954086350  Links: 1024
Access: (0777/-rwxrwxrwx)  Uid: ( 1000/ michael)   Gid: ( 1000/ michael)
Access: 2022-03-28 23:07:04.212449600 +0100
Modify: 2015-09-01 21:59:44.000000000 +0100
Change: 2022-03-22 09:07:08.459610600 +0000
 Birth: -

file:

$ file '/mnt/d/A/20210705-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license'
/mnt/d/A/20210705-MR/Users/xxx/WfW2/node_modules/path-is-absolute/license: ASCII text
$ file '/mnt/d/A/20160505-MRd/Users/xxx/AppData/Roaming/npm-cache/ansi-regex/2.0.0/package/license'
/mnt/d/A/20160505-MRd/Users/xxx/AppData/Roaming/npm-cache/ansi-regex/2.0.0/package/license: ASCII text

cebtenzzre commented 2 years ago

One of those files shows "Links: 1024", i.e. 1023 hard links in addition to the original name. From the Win32 docs (https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createhardlinka):

The maximum number of hard links that can be created with this function is 1023 per file. If more than 1023 links are created for a file, an error results.

I don't know whether this is a limitation of Windows or NTFS. Do you think this should be better documented, or that the script should try to detect this situation and provide a more specific error message?
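The arithmetic behind this: the count stat reports includes the original name, so "Links: 1024" is 1 original + 1023 created links, and one more link would exceed the 1023 ceiling quoted from the Win32 docs. A minimal sketch of a headroom check, assuming GNU stat (%h is the hard link count) and taking 1024 as the limit because that is the count at which ln failed here; the function name links_remaining is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print how many more hard links a file can take before
# reaching a per-file limit ($2, default 1024). Never prints a negative number.
links_remaining() {
    max=${2:-1024}
    nlink=$(stat -c %h -- "$1") || return 1   # current hard link count
    if [ "$nlink" -ge "$max" ]; then
        echo 0
    else
        echo $((max - nlink))
    fi
}
```

For a freshly created file (link count 1) this prints 1023; for the npm-cache license file above it would print 0.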

james-cook commented 2 years ago

That's a great find! I mentioned the underlying fs because I thought this might be system-specific.

The "Invalid argument" error is the same when running ln directly on the command line: not very useful, but not rmlint's fault.

It would be excellent if rmlint could check the link count and suggest what to do.
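A script cannot change the errno the driver returns, but a wrapper around ln can add context after the fact. A minimal sketch, assuming GNU ln and stat; hardlink_with_hint, its message, and the optional third argument (the assumed per-file limit, default 1024) are all hypothetical and not part of rmlint.sh:

```shell
#!/bin/sh
# Hypothetical wrapper (not rmlint's code): replace $2 with a hard link to $1
# and, if ln fails, print a hint when the link limit looks responsible.
# $3 is the assumed per-file limit (default 1024).
hardlink_with_hint() {
    if ln -f -- "$1" "$2"; then
        return 0
    fi
    nlink=$(stat -c %h -- "$1" 2>/dev/null) || return 1
    if [ "$nlink" -ge "${3:-1024}" ]; then
        echo "hint: '$1' already has $nlink hard links; the filesystem may not allow more" >&2
    fi
    return 1
}
```

ln's own "Invalid argument" line is still printed; the hint only adds a likely cause when the original is at or above the assumed limit.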

james-cook commented 2 years ago

One suggestion could be to move to WSL2 and ext4, which allows up to 65000 hard links per file (https://unix.stackexchange.com/questions/5629/is-there-a-limit-of-hardlinks-for-one-file).
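Before and after such a move, one way to ask the system for a path's per-file link limit is POSIX pathconf(_PC_LINK_MAX), exposed on the command line as getconf LINK_MAX <path>. This is a sketch under an important assumption: the C library derives the value from the filesystem type and may return a generic fallback for filesystems it does not recognise, so on WSL1's /mnt/d mounts it may not reflect NTFS's real limit. Treat it as a hint; the wrapper name is hypothetical:

```shell
#!/bin/sh
# Print the per-file hard link limit the C library reports for the filesystem
# holding path $1, via POSIX pathconf(_PC_LINK_MAX) as exposed by getconf.
# Caveat: for unrecognised filesystems this may be a generic fallback value.
link_max_for() {
    getconf LINK_MAX "$1"
}
```

For example, compare link_max_for /mnt/d/A with link_max_for / inside a WSL2 distribution.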

james-cook commented 2 years ago

I haven't had a chance to move to WSL2 yet. When I do, I'll check whether mounting ext4 makes a difference (I expect it will). So I'm closing this for now and will report back once WSL2/ext4 is up.

For now, on Windows/NTFS, cp_hardlink (in rmlint.sh) could be sped up by checking the original's hard link count before doing anything else.

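Roughly, in shell ($original is a placeholder for the original file's path; cp_hardlink is a function, so return skips the rest of it):

   number_of_links=$(stat -c %h -- "$original")  # hard link count of the original
   if [ "$number_of_links" -ge 1024 ]; then return 0; fi  # NTFS limit reached: skip hardlinking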

I mostly run rmlint.sh with -p, so the idea is to skip the resource-heavy comparison when no hard link can be made anyway.
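The ordering argument (cheap metadata checks first, full content comparison last) can be sketched as a standalone pre-check. This assumes GNU stat and cmp; should_hardlink and its optional third argument (the assumed link limit, default 1024) are hypothetical, and cmp only stands in for whatever comparison rmlint.sh -p actually performs:

```shell
#!/bin/sh
# Hypothetical pre-check: decide whether hard-linking $2 to $1 is worth
# attempting. Cheap stat calls run first; the full byte-by-byte read runs last.
# $3 is the assumed per-file link limit (default 1024).
should_hardlink() {
    nlink=$(stat -c %h -- "$1") || return 1
    # Limit reached: no link can be made, so skip the comparison entirely.
    [ "$nlink" -lt "${3:-1024}" ] || return 1
    # Same device and inode already: the files are linked, nothing to do.
    [ "$(stat -c %d:%i -- "$1")" != "$(stat -c %d:%i -- "$2")" ] || return 1
    # Only now read both files in full.
    cmp -s -- "$1" "$2"
}
```

On the npm-cache license file above (1024 links), this returns immediately without reading either file.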

cebtenzzre commented 2 years ago

@james-cook AFAIK the 1024-link limit does not apply on WSL2 with ext4, since that is a virtual machine running the actual Linux kernel (ext4's own per-file limit of 65000 still applies). I tested ntfs-3g and it does not have this limitation, but I am still curious whether Ext2Fsd/Ext4Fsd or OpenZFS on Windows does; if not, it's just the Windows NTFS driver in the way.
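To compare drivers like this empirically, one option is to keep linking a single scratch file until ln fails. A minimal sketch, assuming GNU mktemp and stat; probe_link_limit and its default cap of 2000 (chosen only to clear the 1024 seen here) are hypothetical. It creates up to cap-1 links, so raise the cap only when probing larger limits such as ext4's 65000:

```shell
#!/bin/sh
# Hypothetical helper: probe how many hard links one file can have on the
# filesystem containing directory $1. Stops at the first failed ln or at the
# cap ($2, default 2000), prints the link count reached, then cleans up.
probe_link_limit() {
    cap=${2:-2000}
    work=$(mktemp -d "$1/linkprobe.XXXXXX") || return 1
    : > "$work/probe"                       # the file that gets linked
    n=1
    while [ "$n" -lt "$cap" ]; do
        ln -- "$work/probe" "$work/link.$n" 2>/dev/null || break
        n=$((n + 1))
    done
    stat -c %h -- "$work/probe"             # final hard link count
    rm -rf -- "$work"
}
```

If the limit matches this report, probe_link_limit /mnt/d/A should stop at 1024 under WSL1, while a driver without the limit reaches the cap.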