trapexit / mergerfs-tools

Optional tools to help manage data in a mergerfs pool
ISC License

[Feature request] Support "balance" with hard links #135

Closed hilts-vaughan closed 9 months ago

hilts-vaughan commented 1 year ago

Not sure if mergerfs can solve this problem but here goes nothing:

I have two disks, and one of them, which holds snapshots of my main NAS, recently filled up, so I added another to the set. I needed to offload some of the files onto the other disk, so I wanted to run a balance. It did start to move some of the files, but because it only moves them from "daily.0" by default, it does not actually free up any space. It merely copies them.

It also does not appear to be preserving the links.

Is there a way to balance on hard linked rsnapshot drives?

trapexit commented 1 year ago

It is possible but requires a full rewrite. There is no reverse lookup for inodes. The whole system would have to be scanned and then files moved vs the current code that just looks for a file and moves it as found.
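To make the "whole system would have to be scanned" point concrete, here is a minimal sketch of such a scan. It is not code from mergerfs-tools; `group_hardlinks` is a hypothetical helper, and it assumes it is pointed at a single branch directory (one underlying disk) rather than at the pool. Because a filesystem offers no way to go from an inode back to its names, the only option is to walk every file and group the paths by device and inode number.

```python
import os
import stat
from collections import defaultdict


def group_hardlinks(root):
    """Return {(st_dev, st_ino): [paths]} for regular files with more than one link."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            # Only regular files that have more than one name matter here.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    return dict(groups)
```

A balance tool that wanted to keep links intact would have to move every path in a group to the same destination branch and recreate the links there, which is why this is a different design from "find a file and move it".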

hilts-vaughan commented 1 year ago

That sounds like way more work than I'd feel comfortable asking about for a niche case.

If I understand the way it works, would the following work?

  1. Delete every other hardlinked snapshot folder and only leave "daily.0"
  2. Run balance, which will now delete each file after moving it to the new destination. That removes the only remaining hard link, thus freeing the space

Hopefully this combined with mfs will make rsnapshot work. I'm not sure if the hard linking across devices works.

trapexit commented 1 year ago

It's not an unreasonable feature request but the truth is this script was intentionally simple. I never intended it to be something people use regularly. Just an example of what can be done. mergerfs wasn't designed to work like DrivePool, which does placement after the fact. mergerfs does it on creation. This script was just for people who meant to use mfs but used epmfs instead.

I'd just run the balancer or move files by hand then run a dedup tool that will replace with links.

> I'm not sure if the hard linking across devices works.

No, it doesn't. A link is just another reference to a file. File names are attached to inodes. Inodes are files. Not the names. You can have many names for the same inode/file. So you can't have a name that references an inode from a totally different filesystem.
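A small sketch of the point about names and inodes, using only the Python standard library. `try_link` is a hypothetical helper name. Everything here happens inside one temporary directory, so the link succeeds; the `EXDEV` branch shows the error the kernel returns when source and destination are on different filesystems.

```python
import errno
import os
import tempfile


def try_link(src, dst):
    """Create a hard link; return False if src and dst are on different filesystems."""
    try:
        os.link(src, dst)
        return True
    except OSError as exc:
        if exc.errno == errno.EXDEV:  # "Invalid cross-device link"
            return False
        raise


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "foo")
    second_name = os.path.join(tmp, "foo.link")
    with open(original, "w") as fh:
        fh.write("snapshot data")

    try_link(original, second_name)
    a, b = os.stat(original), os.stat(second_name)
    # Two names, one inode: same device, same inode number, link count of 2.
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    link_count = a.st_nlink

    # Removing one name does not free the data while another name remains.
    os.unlink(original)
    with open(second_name) as fh:
        survived = fh.read()
```

This is also why moving only the "daily.0" copy frees nothing: the other snapshot directories still hold names for the same inode on the old disk.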

hilts-vaughan commented 1 year ago

> No, they don't. A link is just another reference to a file. File names are attached to inodes. Inodes are files. Not the names. You can have many names for the same inode/file. So you can't have a name that is referencing an inode from a totally different filesystem

Right, so this will use far more space than needed at times. It makes sense that it wouldn't work though. My system is mostly append only, so this may not be an issue.

trapexit commented 1 year ago

Why would it use more space? Only before the dedup.

You run dedup on mergerfs. The dedup tool finds all the duplicate files and replaces all but one with a link. You use a policy that doesn't restrict creation to branches with existing paths.
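For illustration only, a rough sketch of what "replace all but one with a link" means. This is not how rdfind or any particular dedup tool is implemented; `dedup_with_hardlinks` and the `.dedup-tmp` suffix are made up for the example. It compares content only and ignores ownership, permissions and timestamps, which a real tool would need to consider.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_with_hardlinks(root):
    """Replace duplicate regular files under root with hard links; return the count replaced."""
    by_content = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_content[(os.path.getsize(path), file_digest(path))].append(path)

    replaced = 0
    for paths in by_content.values():
        keeper = paths[0]
        keeper_stat = os.stat(keeper)
        for dup in paths[1:]:
            st = os.stat(dup)
            if (st.st_dev, st.st_ino) == (keeper_stat.st_dev, keeper_stat.st_ino):
                continue  # already another name for the keeper
            tmp = dup + ".dedup-tmp"  # hypothetical temporary name
            try:
                os.link(keeper, tmp)
            except OSError:
                continue  # e.g. EXDEV: leave the duplicate untouched
            os.replace(tmp, dup)  # atomically swap the copy for the link
            replaced += 1
    return replaced
```

The link is created under a temporary name first and then renamed over the duplicate, so a failed link never leaves the duplicate deleted.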

hilts-vaughan commented 1 year ago

> Why would it use more space? Only before the dedup.
>
> You run dedupe on mergerfs. The dedup tool finds all the duplicate files and replaces all but one with a link. You use a policy that doesn't restrict creation to branches with existing paths.

It may be a misunderstanding but I'm talking about in a post balance world. Consider the following:

Disk A has 10GB free
Disk B has 11GB free

Disk A now contains /daily.0/foo. Disk B does not have a copy.

rsnapshot will notice /daily.1/foo does not exist yet (but does exist in daily.0) and attempt to make a hard link. Does the filesystem entry end up on B or A?

If it's B, then I suppose there is no problem but I figured since it's a new file entry under MFS it would be on B, and be a full copy of the file. Actually, reading this back to myself now it's a link syscall.. so it's never going to copy a file and it will either make a link or fail.. :)

trapexit commented 1 year ago

links ONLY can exist on the same device. Period.

trapexit commented 1 year ago

https://github.com/trapexit/mergerfs#rename--link

The link will fail if the policy doesn't allow path creation. If you use mfs that isn't a problem. When the destination path doesn't exist, the only way a link call can succeed is for the path to be created first; once it exists (or if it already did), link(src,dst) is called. There is no other possible outcome besides an error. Running dedup on mergerfs is perfectly normal and, assuming a non-path-preserving create policy, will work exactly as intended.
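A simplified model of that behaviour, written as a userspace sketch rather than taken from mergerfs itself. `link_in_pool`, `branches` and `allow_path_creation` are hypothetical names, and the real policy handling in mergerfs is more involved (see the linked README section). The point it illustrates is that the new name always lands on the branch that already holds the source file, so in the Disk A / Disk B example the entry ends up on A.

```python
import errno
import os


def link_in_pool(branches, src_rel, dst_rel, allow_path_creation=True):
    """Link src_rel to dst_rel on whichever branch already holds src_rel."""
    for branch in branches:
        src = os.path.join(branch, src_rel)
        if not os.path.lexists(src):
            continue  # the source file is not on this branch
        dst = os.path.join(branch, dst_rel)
        dst_dir = os.path.dirname(dst)
        if not os.path.isdir(dst_dir):
            if not allow_path_creation:
                # Modelled on a path-preserving policy: no path, no link.
                raise OSError(errno.EXDEV, "destination path missing on source branch")
            os.makedirs(dst_dir)
        os.link(src, dst)  # same branch, so same filesystem
        return dst
    raise FileNotFoundError(src_rel)
```

With `allow_path_creation=False` the call fails instead of silently copying, which matches the "link or error, nothing else" outcome described above.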

trapexit commented 1 year ago

You can easily test this by running rsnapshot tool targeting the pool in some random directory and then calling

rdfind -makehardlinks true /mergerfs/path/

elboletaire commented 9 months ago

> It is possible but requires a full rewrite. There is no reverse lookup for inodes. The whole system would have to be scanned and then files moved vs the current code that just looks for a file and moves it as found.

What about adding a hardlinks-root-folder param to the combine cli? Most of us have all the hard links in the same location, because we're using *arr tools, downloading everything to a single folder (the "hardlinks root folder").

When specified, the combine should just check if the file to be moved exists linked in that "hardlinks root folder", and move it along with the original file.

trapexit commented 9 months ago

I don't really understand your request. To handle links you have to scrape the whole filesystem regardless. There is no need or point to having it scan a subsection of the filesystem. And doing a full scan is really a whole new tool. I'm already considering building a rebalancing tool as part of mergerfs to mimic behavior from DrivePool and unraid. I'm really not sure what use cases people have for such things but I guess people want it. So I don't plan on modifying this tool, given it was originally intended just for people who screwed up their initial setup and left epmfs enabled when they didn't want it. It was never meant to be a regularly used tool.