trapexit / mergerfs

a featureful union filesystem
http://spawn.link

Not enough space when using path preserving policy #760

Closed DominicMe closed 4 years ago

DominicMe commented 4 years ago

General description

I have a total of 10 disks, most of them at or close to the minfreespace limit of 50 GB. I am using a path-preserving policy, as outlined below, because I want to minimize disk access, which wakes drives up and delays data retrieval while they spin up.

Expected behavior

I was under the impression that, even with a path-preserving policy, the path would be created on a disk with free space once the drive holding the existing path is full. Is this not how it should behave?

Actual behavior

If I copy a file to directory x, it lands on disk2 even though disk2 is at or below the minfreespace limit of 50 GB. disk0, disk1, disk3 and others also contain directory x, and disk3 has 300 GB+ of free space. I can copy directly to disk3, so it should not be a permission issue. I also ran the mergerfs.dedup and mergerfs.fsck scripts with no change.

Precise steps to reproduce the behavior

Copy a file to an existing directory x with the config below. Note that I also tried the epff and epall policies, and some files still get copied to disks that are past the 50 GB limit until those disks are completely full. What am I doing wrong?

System information

Please provide as much of the following information as possible:

LABEL=disk0 /mnt/disk0 ext4 defaults,nofail 0 2
LABEL=disk1 /mnt/disk1 ext4 defaults,nofail 0 2
LABEL=disk2 /mnt/disk2 ext4 defaults,nofail 0 2
LABEL=disk3 /mnt/disk3 ext4 defaults,nofail 0 2
LABEL=disk4 /mnt/disk4 ext4 defaults,nofail 0 2
LABEL=disk5 /mnt/disk5 ext4 defaults,nofail 0 2
LABEL=disk6 /mnt/disk6 ext4 defaults,nofail 0 2
LABEL=disk7 /mnt/disk7 ext4 defaults,nofail 0 2
LABEL=disk8 /mnt/disk8 ext4 defaults,nofail 0 2
LABEL=disk9 /mnt/disk9 ext4 defaults,nofail 0 2

/mnt/disk* /mnt/pool fuse.mergerfs defaults,allow_other,direct_io,use_ino,fsname=mergerfs,minfreespace=50G,category.create=eplfs,moveonenospc=true 0 0

* [  ] Linux version: `uname -a`
Linux Server 5.6.13-200.fc31.x86_64 #1 SMP Thu May 14 23:26:14 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

* [ ] Versions of any additional software being used
* [ ] List of drives, filesystems, & sizes:
[dom@Server ~]$ df -h

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.4G     0  3.4G   0% /dev
tmpfs           3.4G   20K  3.4G   1% /dev/shm
tmpfs           3.4G  1.5M  3.4G   1% /run
/dev/nvme0n1p3  901G   84G  771G  10% /
tmpfs           3.4G   44K  3.4G   1% /tmp
/dev/nvme0n1p2  488M  220M  233M  49% /boot
mergerfs        105T   97T  8.1T  93% /mnt/pool
backing         105T   97T  8.1T  93% /mnt/backing
/dev/nvme0n1p1  256M  8.4M  248M   4% /boot/efi
/dev/sda1       962G  702G  261G  73% /mnt/downloads
/dev/sdi1       7.3T  7.2T   61G 100% /mnt/disk3
/dev/sdh1        11T   11T   23G 100% /mnt/disk1
/dev/sdf1       7.3T  7.2T   54G 100% /mnt/disk9
/dev/sdk1        11T   11T   51G 100% /mnt/disk2
/dev/sdc1        15T   14T  639G  96% /mnt/disk4
/dev/sdj1        13T   13T  277G  98% /mnt/disk0
/dev/sdd1        15T  7.7T  6.8T  54% /mnt/disk6
tmpfs           694M     0  694M   0% /run/user/1000
tmpfs           694M     0  694M   0% /run/user/420
/dev/sdg1       7.3T  7.2T   48G 100% /mnt/disk8
/dev/sdb1        13T   13T  108G 100% /mnt/disk5
/dev/sde1       7.3T  7.2T   49G 100% /mnt/disk7
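For readers following along, here is a rough Python model of how an "existing path, least free space" create policy would treat these numbers. It is only a sketch, not mergerfs code: the `Avail` values are copied from the `df` output above, while the `has_dir_x` flags and the exact comparison against minfreespace are assumptions made for illustration.

```python
# Simplified model of an "existing path, least free space" (eplfs) policy.
# Illustration only; this is not taken from the mergerfs source.

MINFREESPACE_GB = 50

# "avail_gb" comes from the df output in this issue (df -h rounds values).
# "has_dir_x" is an assumption about where directory x exists.
branches = {
    "disk0": {"avail_gb": 277, "has_dir_x": True},
    "disk1": {"avail_gb": 23, "has_dir_x": True},
    "disk2": {"avail_gb": 51, "has_dir_x": True},
    "disk3": {"avail_gb": 61, "has_dir_x": True},
    "disk6": {"avail_gb": 6.8 * 1024, "has_dir_x": False},
}


def eplfs(branches, minfreespace_gb):
    """Pick the branch that has the path, meets minfreespace, and has the
    least free space. Returns None when no branch qualifies."""
    candidates = {
        name: info["avail_gb"]
        for name, info in branches.items()
        if info["has_dir_x"] and info["avail_gb"] >= minfreespace_gb
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


chosen = eplfs(branches, MINFREESPACE_GB)
print(chosen)
```

Under these assumptions disk2 (51G available) is still just above the 50G threshold, so a "least free space" rule would keep choosing it until it drops below the limit, and disk6 is never considered because it lacks the directory.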

trapexit commented 4 years ago

Ideally you could even add more than 2 policies so it would continue to the next one if previous resulted in an error.

Unless you're imagining new policies there wouldn't ever be any need for that. If an ep policy failed, so too would any other ep policy. A non-ep policy will always succeed unless minfreespace is hit on all drives or the failure is due to RO/NC.
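A small sketch of that point, assuming every create policy first filters branches the same way (writable mode, at least minfreespace available, and for ep policies an existing path) and only then applies its own ordering. The branch names, sizes, and the `eligible` helper are hypothetical; this is a model of the reasoning, not mergerfs internals.

```python
# Why all ep* policies fail together while non-ep policies still succeed.
# Hypothetical data and helper names; not mergerfs source code.

MINFREESPACE_GB = 50

branches = [
    {"name": "diskA", "has_path": True, "avail_gb": 23, "mode": "RW"},
    {"name": "diskB", "has_path": True, "avail_gb": 48, "mode": "RW"},
    {"name": "diskC", "has_path": False, "avail_gb": 639, "mode": "RW"},
    {"name": "diskD", "has_path": False, "avail_gb": 6963, "mode": "RW"},
    {"name": "diskE", "has_path": True, "avail_gb": 900, "mode": "RO"},
]


def eligible(branches, existing_path_only):
    """Shared filter: drop RO/NC branches, branches under minfreespace,
    and (for ep policies) branches that lack the path."""
    return [
        b for b in branches
        if b["mode"] == "RW"
        and b["avail_gb"] >= MINFREESPACE_GB
        and (b["has_path"] or not existing_path_only)
    ]


# Each policy only differs in how it orders the surviving candidates.
orderings = {
    "lfs": lambda c: min(c, key=lambda b: b["avail_gb"]),
    "mfs": lambda c: max(c, key=lambda b: b["avail_gb"]),
    "ff": lambda c: c[0],
}

results = {}
for name, pick in orderings.items():
    for prefix, ep_only in (("ep", True), ("", False)):
        candidates = eligible(branches, ep_only)
        results[prefix + name] = pick(candidates)["name"] if candidates else None

print(results)
```

Because the filter runs before the ordering, swapping eplfs for epmfs or epff cannot help once no path-holding branch passes the filter.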

DominicMe commented 4 years ago

Yes. But like I said the alternative is a lot more work. Hence trying to understand what the actual need is and not just taking implementation suggestions.

Maybe just have all ep* policy variations work with the fallback option, then? Lots of extra policies but no limitations.

Path preservation means paths are preserved. If they aren't preserved it's not path preservation. What you want is something different.

Don't get hung up on my terminology; it's not important what it ends up being called, at least not for me. I would call it "path-preserving with fallback" or something similar, because it attempts to preserve the path if possible and, if not, falls back to non-pp.

What you want will happen now'ish with ff or lfs. There is more to your ask than just what you described. Maybe you don't see it that way but there is. There is a big difference in end behavior between having a fallback non-pp policy attached to a pp policy and the walk back strategy that @malventano suggests.

I am not familiar with the internals, which is why I wanted to avoid making specific suggestions; I don't know what works best programmatically. You are in the best position to make these decisions. I think you know what functionality I want. How it gets implemented is not important to me, and I am not claiming my suggestion is best or even technically possible.

That's what I was talking about. A Domain Specific Language. That's non-trivial. And am I really going to expect users to become programmers? Do people really want to make everything slower so they can make their own in Lua or something? Having a fallback isn't "building your own". It's an option to an existing policy or it's an option to mergerfs where on error it tries another. Either way it's drastically different from "make-your-own".

I don't understand how my example of "ep+lus" or even "ep+lus+lfs" is so much more complicated for the end user than just "eplus". You still have to understand what ep, lus and lfs mean to use them. Maybe the code is complex, I don't know, but to the end user it's not significantly more complicated. I feel like you assume that I know how mergerfs internals work. I don't. I have no clue whether a certain feature or change will make mergerfs slower or not; I can only make suggestions.

my TV shows are always 4 levels deep or movies 2 or music 3 so I want to keep music limited to these 2 drives and tv to those 4, etc.

That's the sort of thing I want to achieve: keep related files in the same dir. It doesn't have to be perfect. If a few files get split up in exchange for not needing manual intervention, that's an acceptable compromise for me. This can also be fixed by running a consolidation script periodically. The bottom/deepest directories are the ones that are most important to try to consolidate. Files are also more important to keep consolidated than dirs.

Some people do this already by using different mkdir policies but that's not the same as what a walk back does.

What do you mean by "mkdir policies"? Isn't mkdir part of the create category? I thought you could only apply a policy to the action, create and search categories..?

trapexit commented 4 years ago

Don't get hung up on my terminology; it's not important what it ends up being called, at least not for me. I would call it "path-preserving with fallback" or something similar, because it attempts to preserve the path if possible and, if not, falls back to non-pp.

It's not about the name. It's about me understanding what you mean. If you tell me a story about a dog and you mean a cat and then tell me to ignore the fact you're calling it a dog... it's just confusing. That's why I'm asking for behavior. Not proper nouns. Not implementations.

I am not familiar with the internals, which is why I wanted to avoid making specific suggestions; I don't know what works best programmatically. You are in the best position to make these decisions. I think you know what functionality I want. How it gets implemented is not important to me, and I am not claiming my suggestion is best or even technically possible.

But you are making specific suggestions which is what I'm saying is a problem. I just want to understand the algo you want.

I think you know what functionality I want. How it gets implemented is not important to me, and I am not claiming my suggestion is best or even technically possible.

I know what you've suggested. It's not clear to me what you want specifically.

I don't understand how my example of "ep+lus" or even "ep+lus+lfs" is so much more complicated for the end user than just "eplus". You still have to understand what ep, lus and lfs mean to use them. Maybe the code is complex, I don't know, but to the end user it's not significantly more complicated. I feel like you assume that I know how mergerfs internals work. I don't. I have no clue whether a certain feature or change will make mergerfs slower or not; I can only make suggestions.

You aren't understanding me. I didn't say that it was more complicated for the user. I said that it's more complicated to code vs bespoke policies.

"ep+lus" is NOT "make-your-own". It is a chaining of policies. The policies are fixed function. "make-your-own" would strongly imply building your own. Meaning writing code. Writing logic to sort and filter paths.
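To make the distinction concrete, here is a hypothetical sketch of what chaining fixed-function policies could look like: each policy is a fixed function that returns a branch or nothing, and the chain simply tries them in order. Nothing here reflects an actual mergerfs feature or syntax; the `chain` helper, the branch data, and the simplified `eplus`/`lus` definitions are assumptions for illustration.

```python
# Hypothetical "ep+lus" style chaining of fixed-function policies.
# Not a mergerfs feature; all names and numbers are illustrative.

MINFREESPACE_GB = 50


def usable(branches):
    """Branches that still meet the minfreespace threshold."""
    return [b for b in branches if b["avail_gb"] >= MINFREESPACE_GB]


def eplus(branches):
    """Existing path, least used space. None if no branch qualifies."""
    candidates = [b for b in usable(branches) if b["has_path"]]
    return min(candidates, key=lambda b: b["used_gb"]) if candidates else None


def lus(branches):
    """Least used space across all usable branches."""
    candidates = usable(branches)
    return min(candidates, key=lambda b: b["used_gb"]) if candidates else None


def chain(*policies):
    """Return a policy that tries each given policy until one succeeds."""
    def chained(branches):
        for policy in policies:
            choice = policy(branches)
            if choice is not None:
                return choice
        return None
    return chained


eplus_then_lus = chain(eplus, lus)

full_path_branches = [
    {"name": "disk1", "has_path": True, "avail_gb": 23, "used_gb": 11000},
    {"name": "disk4", "has_path": False, "avail_gb": 639, "used_gb": 14000},
    {"name": "disk6", "has_path": False, "avail_gb": 6963, "used_gb": 7700},
]
roomy_path_branches = full_path_branches + [
    {"name": "disk0", "has_path": True, "avail_gb": 277, "used_gb": 13000},
]

fallback_choice = eplus_then_lus(full_path_branches)["name"]
preserved_choice = eplus_then_lus(roomy_path_branches)["name"]
print(fallback_choice, preserved_choice)
```

The user only composes existing policies here; no sorting or filtering logic is written by hand, which is the difference from a "make-your-own" DSL.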

Again, I'm not asking you to understand the internals. I'm not asking for suggestions. I'm asking you for your use case.

That's the sort of thing I want to achieve: keep related files in the same dir. It doesn't have to be perfect. If a few files get split up in exchange for not needing manual intervention, that's an acceptable compromise for me. This can also be fixed by running a consolidation script periodically.

Then what's wrong with lfs or ff? Walk back requires manual intervention. If you want to arbitrarily control what stuff lives where, there has to be some. As I said: walk back and what you've suggested are not the same thing. They have different behaviors and imply different things. Your suggestion has no manual steps outside config. Walk back has manual intervention when the walk back depth is maxed out across the drives.

The bottom/deepest directories are the ones that are most important to try to consolidate. Files are also more important to keep consolidated than dirs.

But that's not everything you're concerned with. You've said numerous times that spinup is important to you. Keeping files consolidated on random drives doesn't do that necessarily. It depends on your access patterns. Then again as I've mentioned mergerfs has to query all drives for information (depending on the function call and policy) so drives may need to spin up if their data isn't cached anyway.

What do you mean by "mkdir policies"?

I mean the policy for mkdir. I'm not sure how else to say it. Every function has a policy. mkdir is a function. Therefore mkdir has a policy.

https://github.com/trapexit/mergerfs#terminology

Isn't mkdir part of the create category?

Yes.... it is.

https://github.com/trapexit/mergerfs#functions--policies--categories

I thought you could only apply a policy to the action, create and search categories..?

No. Why would you think that? It explicitly says in multiple locations in the docs that categories are collections of functions. It's a convenience to put them in categories because most people tend to use the same policies across those generally similar acting functions.

From the section on the topic: "The POSIX filesystem API is made up of a number of functions. creat, stat, chown, etc. In mergerfs most of the core functions are grouped into 3 categories: action, create, and search. These functions and categories can be assigned a policy which dictates what file or directory is chosen when performing that behavior. Any policy can be assigned to a function or category though some may not be very useful in practice. For instance: rand (random) may be useful for file creation (create) but could lead to very odd behavior if used for chmod if there were more than one copy of the file."

From options: "func.FUNC=POLICY: Sets the specific FUSE function's policy. See below for the list of value types. Example: func.getattr=newest"
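As a rough illustration of how category and per-function options relate, the sketch below expands an option string into a per-function policy table. It assumes options are applied in the order listed, so a later `func.*` overrides an earlier `category.*` (and vice versa), and it uses an abbreviated category membership list written from memory of the docs. Treat both as assumptions and check the README for the authoritative behavior.

```python
# Sketch: expand category.* and func.* options into per-function policies.
# Category membership is abbreviated and assumed; order-of-application
# semantics are also an assumption for this illustration.

CATEGORIES = {
    "create": ["create", "mkdir", "mknod", "symlink"],
    "search": ["access", "getattr", "getxattr", "listxattr", "open", "readlink"],
}


def resolve_policies(options):
    """Apply options left to right; later settings override earlier ones."""
    policies = {}
    for option in options.split(","):
        key, sep, value = option.partition("=")
        if not sep:
            continue  # flags such as allow_other carry no policy
        if key.startswith("category."):
            for func in CATEGORIES.get(key[len("category."):], []):
                policies[func] = value
        elif key.startswith("func."):
            policies[key[len("func."):]] = value
    return policies


opts = "allow_other,category.create=eplfs,func.mkdir=epall,func.getattr=newest"
policies = resolve_policies(opts)
print(policies)
```

In this model `category.create=eplfs` is shorthand for setting create, mkdir, mknod and symlink at once, and `func.mkdir=epall` then changes only mkdir, which is the kind of "different mkdir policy" setup mentioned earlier in the thread.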

https://github.com/trapexit/mergerfs#tips--notes
https://github.com/trapexit/mergerfs#plex-doesnt-work-with-mergerfs
https://github.com/trapexit/mergerfs#what-policies-should-i-use
https://github.com/trapexit/mergerfs#why-are-all-my-files-ending-up-on-1-drive