trapexit / mergerfs

a featureful union filesystem
http://spawn.link

Add support for "cp --reflink" #1296

Closed by BujiasBitwise 7 months ago

BujiasBitwise commented 8 months ago

Is your feature request related to a problem? Please describe.

I'm using xfs with reflink support. I'd like this feature so I can run "cp --reflink -R" and build a backup history, basically file snapshots on xfs. I'd like reflinks instead of the full copies made by link_cow, so I can rsync only the changed blocks in system images and save space.

On a mergerfs mount, cp fails with "Operation not supported". I guess it also doesn't work with zfs/btrfs, since the FICLONE ioctl (shown by strace as "BTRFS_IOC_CLONE or FICLONE") is returning EOPNOTSUPP. Reflinks are correctly enabled on my xfs filesystems.
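For readers who want to reproduce the failure without cp, here is a minimal Python sketch that issues the same ioctl directly. The function name `try_reflink` and the set of errnos treated as "reflink unavailable" are choices made for this illustration, and the FICLONE value is the Linux `_IOW(0x94, 9, int)` encoding.

```python
import errno
import fcntl
import os

# Linux ioctl request number for FICLONE: _IOW(0x94, 9, int).
FICLONE = 0x40049409

# Errnos this sketch treats as "cannot reflink here" (an assumption made
# for the example, not an exhaustive list).
UNAVAILABLE = {errno.EOPNOTSUPP, errno.ENOTTY, errno.ENOSYS,
               errno.EINVAL, errno.EXDEV}


def try_reflink(src_path: str, dst_path: str) -> bool:
    """Try to make dst_path a reflink clone of src_path.

    Returns True if the clone succeeded and False if the filesystem (or a
    layer such as FUSE) refused it. On False, dst_path is left as an empty
    file. Other errors are re-raised.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Same call cp --reflink makes: ioctl(dst_fd, FICLONE, src_fd).
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        except OSError as exc:
            if exc.errno in UNAVAILABLE:
                return False
            raise
    return True
```

Calling `try_reflink("a.img", "b.img")` directly on a reflink-enabled xfs or btrfs branch should return True, while the same call on paths under the mergerfs mount is expected to return False, matching the EOPNOTSUPP seen from cp.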

This is not related to the feature in the TODO to automatically convert hardlinks into reflinks.

Describe the solution you'd like

The same way hardlinks are supported, I'd like "cp --reflink" to work with mergerfs and at least xfs. Otherwise I'll have to run it on the individual disks used by mergerfs.

Describe alternatives you've considered

I thought I could use link_cow as a workaround, because trapexit said on reddit that the copy it makes to break the hardlinks will use reflink if the underlying filesystem supports them. However that's not the case in xfs. It makes a full copy of the files.

If that's not the behaviour in btrfs/zfs, maybe this is a bug/oversight and requires a different issue.

trapexit commented 8 months ago

FICLONE got disabled when I had to blacklist all BTRFS ioctl calls a while back. They are incompatible with how ioctl works via FUSE, and under my earlier open policy they could cause crashes. I can whitelist that one call because it is safe; leaving it out was just an oversight at the time. I've been meaning to go through a bunch of popular ioctl calls to see if I can support them.

As for ficlone not working with link_cow... are you positive? I just tried it and it works fine. Here is a trace of mergerfs with a linked file opened for write:

[pid 2339892] newfstatat(AT_FDCWD, "/tmp/xfs/foo", {st_mode=S_IFREG|0644, st_size=1048576, ...}, AT_SYMLINK_NOFOLLOW) = 0                                     
[pid 2339892] openat(AT_FDCWD, "/tmp/xfs/foo", O_RDONLY|O_NOFOLLOW) = 6                                                                                       
[pid 2339892] openat(AT_FDCWD, "/tmp/xfs/.MFtGOyCkOTYiEnZl", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0200) = 7                                                       
[pid 2339892] newfstatat(6, "", {st_mode=S_IFREG|0644, st_size=1048576, ...}, AT_EMPTY_PATH) = 0                                                              
[pid 2339892] ftruncate(7, 1048576)     = 0                                                                                                                   
[pid 2339892] ioctl(7, BTRFS_IOC_CLONE or FICLONE, 6) = 0                                                                                                     
[pid 2339892] ioctl(6, FS_IOC_GETFLAGS, [0]) = 0                                                                                                              
[pid 2339892] ioctl(7, FS_IOC_SETFLAGS, [0]) = 0                                                                                                              
[pid 2339892] flistxattr(6, NULL, 0)    = 0                                                                                                                   
[pid 2339892] futex(0x7fa8c1a29788, FUTEX_WAKE_PRIVATE, 2147483647) = 0                                                                                       
[pid 2339892] fchown(7, 0, 0)           = 0                                                                                                                   
[pid 2339892] fchmod(7, 0100644)        = 0                                                                                                                   
[pid 2339892] utimensat(7, NULL, [{tv_sec=1706395088, tv_nsec=16974695} /* 2024-01-27T16:38:08.016974695-0600 */, {tv_sec=1706395084, tv_nsec=408948479} /* 2024-01-27T16:38:04.408948479-0600 */], 0) = 0
[pid 2339892] rename("/tmp/xfs/.MFtGOyCkOTYiEnZl", "/tmp/xfs/foo") = 0                                                                                        
[pid 2339892] close(6)                  = 0                                                                                                                   
[pid 2339892] close(7)                  = 0                                                            

I could reorder the ftruncate to be after the ficlone as I don't think it is necessary but afaict it is working as expected.
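The sequence in the trace can be re-enacted in a few lines of Python. This is only a sketch of the syscalls shown above, not mergerfs's actual code: the name `break_link`, the temp-file naming, and the fallback to a plain byte copy when FICLONE fails are assumptions made for illustration, and the FS_IOC_GETFLAGS/SETFLAGS and xattr steps are left out.

```python
import contextlib
import fcntl
import os
import secrets
import stat

FICLONE = 0x40049409  # Linux _IOW(0x94, 9, int)


def break_link(path: str) -> bool:
    """Replace `path` with a private copy of itself, so writes no longer
    affect its other hard links. Returns True if FICLONE did the copy."""
    src = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    # Hidden temp file next to the original (naming is hypothetical).
    tmp = os.path.join(os.path.dirname(path) or ".", "." + secrets.token_hex(8))
    dst = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_TRUNC, 0o200)
    try:
        st = os.fstat(src)
        os.ftruncate(dst, st.st_size)
        try:
            fcntl.ioctl(dst, FICLONE, src)  # share extents instead of copying
            cloned = True
        except OSError:
            # Assumed fallback for this sketch: copy the bytes instead.
            cloned = False
            while True:
                chunk = os.read(src, 1 << 20)
                if not chunk:
                    break
                view = memoryview(chunk)
                while view:
                    view = view[os.write(dst, view):]
        # Carry over ownership, mode and timestamps, as in the trace.
        with contextlib.suppress(PermissionError):
            os.fchown(dst, st.st_uid, st.st_gid)
        os.fchmod(dst, stat.S_IMODE(st.st_mode))
        os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
        # Atomically swap the new inode into place.
        os.rename(tmp, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    finally:
        os.close(src)
        os.close(dst)
    return cloned
```

After `break_link("/tmp/xfs/foo")`, the path points at a new inode with a link count of 1, while the other hard links keep the old inode. On a reflink-capable filesystem the two inodes share extents until one of them is written.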

trapexit commented 8 months ago

Oh, right. I forgot that FUSE does not support FICLONE. I don't even get the request from the kernel. As I understand it, when FICLONE is issued the kernel calls the clone_file_range callback in "file_operations" (remap_file_range in newer kernels), which FUSE does not define.

Until the kernel supports this, FICLONE and FICLONERANGE just aren't going to work. I can fire off an email to the FUSE mailing list and see what they think.

Also... even if the kernel just gave me the source file descriptor, it wouldn't be sufficient. That value comes from the client app and would have no meaning to mergerfs. Maybe I could work around it by looking up the FD in /proc of the calling app. Not elegant, but it might work.
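The /proc lookup mentioned here could look roughly like the sketch below. It assumes the server knows the caller's PID and the raw descriptor number; the helper name `resolve_client_fd` is hypothetical and nothing like it is claimed to exist in mergerfs.

```python
import os


def resolve_client_fd(pid: int, fd: int) -> str:
    """Return the path that descriptor `fd` refers to inside process `pid`.

    Linux exposes each open descriptor as a symlink under /proc/<pid>/fd/.
    Caveats: the result is a point-in-time answer (the client can close or
    reuse the descriptor right after), deleted files are reported with a
    " (deleted)" suffix, and reading another user's /proc entries needs
    sufficient privileges.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

For a file opened through the mergerfs mount the returned path would be the path under the mount point, which the server would still have to map to a branch before cloning on the underlying filesystem.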

BujiasBitwise commented 8 months ago

I confirm that FICLONE is working with link_cow just fine, and my trace looks like yours. The problem in my previous test was that I truncated the file by mistake, and when I saw there were no shared extents I thought they hadn't been reflinked.

Regarding FICLONE, it's unfortunate FUSE doesn't support it. Maybe they have a good reason for that. Reflinks are a valuable feature, so it would be nice if they worked.

Only you can determine if going through the effort of ugly hacks is worth it. At least people can now take a look at this issue to see why it's not implemented. Or maybe add a simple comment to the readme, where you mention hardlink support.

I'll use link_cow for now, because I don't need hardlinks in my use case. Or I may just create them out of band, since it's much faster (0.5s vs 12s for 52k files)

Thanks for looking into this.

chapmanjacobd commented 4 months ago

I've also needed this at times and I wrote a small wrapper around cp to resolve paths to the original drive. Might not fit every use-case but thought that I would share:

$ pip install xklb xattrs
$ lb mergerfs-cp --dry-run d/files* d/folder2/
cp --interactive --reflink=always /mnt/d9/files1.txt /mnt/d9/folder2/files1.txt
cp --interactive --reflink=always /mnt/d3/files1.txt /mnt/d3/folder2/files1.txt
...
$ btrfs fi du /mnt/d3/files1.txt /mnt/d3/folder2/files1.txt
     Total   Exclusive  Set shared  Filename
  12.57GiB       0.00B    12.57GiB  /mnt/d3/files1.txt
  12.57GiB       0.00B    12.57GiB  /mnt/d3/folder2/files1.txt
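The idea behind such a wrapper can be sketched as follows. This is not how xklb is implemented; it is a simplified illustration in which the branch directories are passed in explicitly, and the function name `plan_reflink_copies` is made up for the example. For each branch that holds the source file it builds a cp command that stays on that branch, so the reflink happens on the underlying filesystem rather than through FUSE.

```python
import os
import shlex


def plan_reflink_copies(branches, rel_src, rel_dst):
    """Build one `cp --reflink=always` command per branch containing rel_src.

    branches: list of underlying branch directories (e.g. /mnt/d3, /mnt/d9).
    rel_src, rel_dst: paths relative to the mergerfs mount root.
    Returns a list of argv lists; nothing is executed.
    """
    commands = []
    for branch in branches:
        src = os.path.join(branch, rel_src)
        if os.path.lexists(src):
            dst = os.path.join(branch, rel_dst)
            commands.append(["cp", "--interactive", "--reflink=always", src, dst])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of running them.
    for argv in plan_reflink_copies(["/mnt/d3", "/mnt/d9"],
                                    "files1.txt", "folder2/files1.txt"):
        print(shlex.join(argv))
```

Keeping source and destination on the same branch is the design point: a reflink cannot cross filesystems, so a copy that mergerfs's create policy would place on a different disk could never share extents.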