trapexit / mergerfs

a featureful union filesystem
http://spawn.link

filename / filepath policy branch filters #862

Open darthShadow opened 3 years ago

darthShadow commented 3 years ago

Is your feature request related to a problem? Please describe.

Basically, I have two remotes merged together: one for smaller files like metadata (posters, artwork, etc.) and one for the actual media. If I could specify a maxfreespace of something like 100M on the smaller mount, it could be used for the metadata files, and since the media files are larger than 100M, they would go to the bigger mount.

Describe the solution you'd like

Add maxfreespace as a per-branch option to facilitate the splitting of metadata and media files.

Describe alternatives you've considered

N/A

Additional context

N/A

trapexit commented 3 years ago

I think I misunderstood you when we spoke prior.

maxfreespace=100M as in the max file size? That's not something that is really possible. You don't know the file size before it's written. minfreespace is about how much space is available on a branch... not the files themselves. I'm not sure there is any practical way to do what I think you're looking for.

trapexit commented 3 years ago

I suppose a filename / filepath filter of some sort could be used. Some way to say "no files matching this name". Pretty niche and wouldn't be perfect.

darthShadow commented 3 years ago

Yeah, I think I explained it incorrectly earlier. Considering there is no way to know the file size (at least that I know of) before it's closed, then this feature won't help in my case.

Filtering would accomplish the required behaviour and that's what I am using right now with rclone (https://rclone.org/filtering/) but you are correct in that it's probably a very niche use-case and probably not worth the time it will take to implement.

Feel free to close the issue if that's the case.

Thanks.

trapexit commented 3 years ago

> before it's closed

You can know the size of a file at any time, but there isn't a "copy" call. It's an "open", "read/write", "close", and possibly a bunch of other stuff in between. A policy is evaluated at the beginning of "open". If you're suggesting checking on close... that's a whole other ball of wax. While the file is being written, it's gotta go somewhere: RAM, a drive, somewhere. To do what you originally suggested would require storing the file temporarily somewhere and then, in effect, rerunning the policy to select a location and moving the file. It's possible, but it's not so simple, because software might be sharing the location or whatnot, which would have to be carefully considered. Not impossible, but awkward at the least.
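To make the timing concrete, here is a minimal sketch in Python, not mergerfs code. The `Branch` class, the `create_policy` function, the mount paths, and the "most free space" choice are all hypothetical names used for illustration. The point is only that the branch is picked from the path and the branch state at open time, before any bytes exist.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    path: str        # hypothetical mount point
    free_bytes: int  # free space known at open time


def create_policy(branches, relpath, min_free_bytes):
    """Pick a branch at open time; the final file size is not an input."""
    # relpath and branch state are all that is available here.
    candidates = [b for b in branches if b.free_bytes >= min_free_bytes]
    if not candidates:
        raise OSError("no branch has enough free space")
    # Illustrative choice: the branch with the most free space.
    return max(candidates, key=lambda b: b.free_bytes)


branches = [
    Branch("/mnt/small", 500 * 2**20),
    Branch("/mnt/big", 4 * 2**40),
]

# "open": the decision is made here, with zero bytes written.
chosen = create_policy(branches, "movies/film.mkv", min_free_bytes=100 * 2**20)

# "write": the size only becomes known as data arrives, after the choice.
bytes_written = 0
for chunk in (b"x" * 4096 for _ in range(3)):
    bytes_written += len(chunk)
```

A size-based rule would need `bytes_written` as an input to `create_policy`, which is exactly what is not available when the policy runs.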

Regarding filtering. I'm not opposed to adding path filtering to policies. I just want to make sure that would work for you and I will need to figure out the best way to do it. Adding new fields to the branches option needs to be carefully considered. I'm thinking of new ideas for defining it in 3.x but the current syntax isn't really designed for this kind of thing.
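As a rough illustration of what a per-branch path filter could look like, here is a sketch in Python. It is not mergerfs syntax or behaviour: the `BRANCHES` table, the mount paths, the exclude lists, and `select_branch` are assumptions made up for this example. It uses Python's `fnmatch.fnmatchcase`, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase

# Hypothetical branch table: each branch lists the patterns it refuses.
BRANCHES = [
    ("/mnt/metadata", ["*.mkv", "*.mp4"]),         # small remote: no media
    ("/mnt/media", ["*.jpg", "*.png", "*.nfo"]),   # large remote: no artwork
]


def select_branch(relpath):
    """Return the first branch whose exclude patterns do not match relpath."""
    for branch, excludes in BRANCHES:
        if not any(fnmatchcase(relpath, pattern) for pattern in excludes):
            return branch
    return None  # every branch filtered this path out
```

With this table, `select_branch("movies/film/poster.jpg")` picks `/mnt/metadata` and `select_branch("movies/film/film.mkv")` picks `/mnt/media`. The filter only needs the path, so it can run at open time like any other policy input.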

darthShadow commented 3 years ago

OK, thanks for the explanation. Temp files are not my ideal solution either.

Yeah, the path filters would work for me but it's not urgent and can be tackled for 3.x as you say since I already have a similar solution with rclone working right now.

trapexit commented 3 years ago

What kind of filtering are you looking for? How elaborate? Would fnmatch patterns work? regex?
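For comparison, here is the same kind of filter written both ways, as a small Python sketch. The path and patterns are made up for illustration. Note that Python's `fnmatch` lets `*` match `/`, which differs from POSIX `fnmatch()` called with `FNM_PATHNAME`.

```python
import re
from fnmatch import fnmatchcase

path = "shows/season1/poster.jpg"

# fnmatch-style: short to write, but limited to *, ?, and [] classes.
glob_hit = fnmatchcase(path, "*/poster.jpg")

# Regex: alternation and grouping are available, at the cost of
# escaping the dot and anchoring the whole path explicitly.
regex_hit = re.fullmatch(r".*/(poster|fanart)\.(jpg|png)", path) is not None
```

Both checks match this path. The regex covers four file names in one pattern, where the glob form would need one pattern per name.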

darthShadow commented 3 years ago

I would prefer glob-style matching as used by rclone, since it seems like a good compromise between usability and advanced matching without requiring full regex patterns.

Docs about rclone filtering: https://rclone.org/filtering/#pattern-syntax

trapexit commented 3 years ago

Hmm. I don't recognize that as a standard fnmatch glob. Will have to see what that is and if I can replicate it without needing to write my own matcher.
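One visible difference from a plain fnmatch glob is the `**` wildcard: in the rclone docs, `*` and `?` stop at `/`, while `**` crosses it. A minimal sketch of just those three rules, translated to a regex in Python, might look like the following. This is not rclone's implementation; `glob_to_regex` and `matches` are hypothetical helpers, the whole path is matched from start to end, and rclone's anchoring rules and its `[]` and `{}` syntax are left out.

```python
import re


def glob_to_regex(pattern):
    """Translate a tiny glob subset: '**' crosses '/', '*' and '?' do not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")    # any run of non-separator characters
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")     # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(pattern, path):
    """True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).match(path) is not None
```

Under these assumptions, `matches("*.jpg", "a/poster.jpg")` is false because `*` cannot cross the separator, while `matches("**.jpg", "a/b/poster.jpg")` is true.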

darthShadow commented 3 years ago

You can try out the filtering as a web-hosted version here: https://filterdemo.rclone.org

trapexit commented 3 years ago

I'm generally familiar with rclone and its filtering, but after reading about it again I don't recognize it as something that is standard. So I'd have to use something else, write my own, or find a compatible library. I'd prefer not to have to write my own.

darthShadow commented 3 years ago

I have asked @ncw to chime in regarding whether it matches any standard matching pattern or whether it's completely custom. If it's custom or we can't find a matching library, then regex will also work for me, it will just increase the complexity of the filters themselves.

trapexit commented 3 years ago

I already asked him. I'll look at rsync's and git's code to see if they are useful / performant enough for the use case.