restic / restic

Fast, secure, efficient backup program
https://restic.net
BSD 2-Clause "Simplified" License

Would like to be able to exclude by size #2569

Closed trevor-vaughan closed 3 years ago

trevor-vaughan commented 4 years ago

Output of restic version

restic 0.9.5 compiled with go1.13beta1 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

I would like to be able to exclude files based on a maximum size.

What are you trying to do?

I would like to perform user home directory backups with some administrative level of control so that items like ISO files, or large tarballs, are excluded.

As a suggestion, something like --exclude-max-size might work and would put it within an easy search of exclude flags in the documentation.

Did restic help you today? Did it make you happy in any way?

So far, it's the most intuitive general purpose backup tool that I've used 👍

rawtaz commented 4 years ago

I wonder how many people have this use case. I also wonder how well it would work. Where do you set the limit? 400 MB? Then you'll still get some Linux distro at 389 MB included. It's probably hard to make this feature useful in practice. Thoughts?

dimejo commented 4 years ago

> I wonder how many people have this use case. I also wonder how well it would work. Where do you set the limit? 400 MB? Then you'll still get some Linux distro at 389 MB included. It's probably hard to make this feature useful in practice. Thoughts?

Let's assume your repository backend has a fixed quota of 10GB. You regularly back up your /home folder to this repository, which uses ~7GB most of the time. One day you temporarily download a 5GB ISO DVD image to try it out, and this ISO is accidentally backed up as well. You have now run out of space in your repository but can't clean it up, because removing anything (i.e. pruning) requires adding new data, which is impossible once you've hit your quota. You are now left with 3 options:

  1. upgrade the storage
  2. delete the repository and start from scratch
  3. move the repository to a location with a bigger quota (e.g. local disk), clean it up and re-upload everything
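Using the figures from this example (10 GB quota, about 7 GB already stored, a 5 GB ISO), the problem is plain arithmetic. A minimal Go sketch, assuming decimal gigabytes and treating the ISO as entirely new data, i.e. ignoring deduplication; `headroom` and `fits` are hypothetical helpers.

```go
package main

import "fmt"

const gb = int64(1_000_000_000) // decimal gigabyte

// headroom returns how many bytes remain under the quota.
func headroom(quota, used int64) int64 {
	return quota - used
}

// fits reports whether newData bytes can be added without exceeding quota.
func fits(quota, used, newData int64) bool {
	return newData <= headroom(quota, used)
}

func main() {
	quota, used, iso := 10*gb, 7*gb, 5*gb
	fmt.Printf("headroom: %d GB\n", headroom(quota, used)/gb)
	fmt.Printf("5 GB ISO fits: %v\n", fits(quota, used, iso))
	fmt.Printf("overflow: %d GB\n", (used+iso-quota)/gb)
}
```

With only 3 GB of headroom, the 5 GB ISO overflows the quota by 2 GB, which is exactly the situation where pruning can no longer write what it needs.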

IMHO this is a useful feature, because restic currently doesn't check any quota (and on some backends never will be able to). There are already some threads on the forum from people who ran out of storage for their repository and were then unable to remove and prune anything.

Something to think about: Should restic by default warn about files being excluded that way?

rawtaz commented 4 years ago

@dimejo What you write is true. But how would you tune the max size setting? If you set it to 5 GB you can be darn sure that the next day you download a 4.5 GB file which is included. If you set it to 4 GB you can be damn sure that you'll fill up your repository with a couple of 3.8 GB files.

My point is that the setting we're discussing is too narrow to be of much practical use, unless you have a pretty stable pattern for the sizes of the files you store and know pretty well what you might want to exclude by size. For temporarily downloaded files that aren't part of such a routine or pattern, it will be hard to find a good setting.

I put temporary files that I know I don't care about or need backed up in a _temp folder in my Downloads folder. Why not do the same, instead of blindly trying to figure out a good value for a setting that will never match real-world use cases (unless you have a pattern or routine for such files)?
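rawtaz's convention is a path rule rather than a size rule: everything under a folder named `_temp` is skipped regardless of size. A Go sketch of that check, where `underTempFolder` is a hypothetical helper; in practice restic's existing exclude patterns are the natural way to express this.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// underTempFolder reports whether any component of path is exactly "_temp",
// so the folder itself and everything below it are matched.
func underTempFolder(path string) bool {
	for _, part := range strings.Split(filepath.ToSlash(filepath.Clean(path)), "/") {
		if part == "_temp" {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{
		"/home/me/Downloads/_temp/distro.iso",
		"/home/me/Downloads/report.pdf",
		"/home/me/my_temp_notes.txt", // only a substring match, not a folder
	} {
		fmt.Printf("%-40s excluded=%v\n", p, underTempFolder(p))
	}
}
```

Matching whole path components rather than substrings avoids excluding unrelated files whose names merely contain "_temp".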

I'm not saying it's a bad feature though, I wouldn't mind if it's implemented. As long as there's reasonable demand for it.

dimejo commented 4 years ago

@rawtaz:

> What you write is true. But how would you tune the max size setting?

Deciding which size makes sense is always up to the user. What works for one person might not work for another.

> My point is that the setting we're discussing is too narrow to be of much practical use, unless you have a pretty stable pattern for the sizes of the files you store and know pretty well what you might want to exclude by size.

If I'm backing up a directory which only contains (normal) pictures I can safely assume a maximum file size of 500MB.

> I put temporary files that I know I don't care about or need backed up in a _temp folder in my Downloads folder. Why not do the same, instead of blindly trying to figure out a good value for a setting that will never match real-world use cases (unless you have a pattern or routine for such files)?

As I have seen on the forum in the past there are a lot of different ways to use restic. Some of them do not make sense to me but they seem to work for someone else.

But please let's not restrict this discussion to my example. I was just trying to provide one possible use case for this feature.

> I'm not saying it's a bad feature though, I wouldn't mind if it's implemented. As long as there's reasonable demand for it.

I know that every feature should be considered carefully because it needs maintenance, which in turn adds to the maintainers' workload. But I doubt that such a feature would require much maintenance once implemented.

trevor-vaughan commented 4 years ago

Well, my use case is that I work with upstream ISO distributions and container base images quite often and those are...chunky. Depending on what I'm doing, these may actually be smattered across various places in my workspace.

While good system hygiene is ideal, I'm not sure how realistic it is, because people are people. Systems in general have apps that just drop garbage everywhere, and it's very difficult for users to know what is and isn't important. But I agree with @dimejo that I really don't want my backup system to get overwhelmed. It would be nice if restic could output a report of what it backed up and what it skipped, along with simple file metadata (size, etc.), so that I could figure out whether I missed something I might want to keep (perhaps this is an option that I've simply missed).
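The requested report is essentially a partition of the walked files into kept and skipped, with sizes attached. A Go sketch that works on an in-memory list so it is easy to test; the `entry` type, the `report` function and the 500 MiB cutoff are assumptions for illustration, not restic output.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical record of one file seen during a backup walk.
type entry struct {
	Path string
	Size int64
}

// report splits files into kept and skipped lists using a maximum size and
// sorts skipped files largest-first, so the biggest omissions are listed on top.
func report(files []entry, maxSize int64) (kept, skipped []entry) {
	for _, f := range files {
		if f.Size > maxSize {
			skipped = append(skipped, f)
		} else {
			kept = append(kept, f)
		}
	}
	sort.Slice(skipped, func(i, j int) bool { return skipped[i].Size > skipped[j].Size })
	return kept, skipped
}

func main() {
	const mib = int64(1 << 20)
	files := []entry{
		{"work/notes.md", 12 * 1024},
		{"isos/distro.iso", 4300 * mib},
		{"images/base.tar", 700 * mib},
		{"photos/cat.jpg", 3 * mib},
	}
	kept, skipped := report(files, 500*mib)
	fmt.Println("backed up:")
	for _, f := range kept {
		fmt.Printf("  %-18s %12d bytes\n", f.Path, f.Size)
	}
	fmt.Println("skipped:")
	for _, f := range skipped {
		fmt.Printf("  %-18s %12d bytes\n", f.Path, f.Size)
	}
}
```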

In my case, I'd like to be able to tell it to skip anything over 500M as well, since 90% of the time I don't care about those files. When I do, they're things like system demo videos, and I can back those up separately because they live in a single spot.

Looking back at the venerable rsync for inspiration:

I totally get that I could do the classic find <stuff> > include.txt but that feels like it's defeating the purpose of such an efficient backup tool!

rawtaz commented 4 years ago

I just realized that --exclude-max-size=500M is a bad name for such a parameter. It could mean a lot of things, e.g.:

It's probably better to name it just --max-file-size. I'm suggesting this instead of rsync's max-size because that too is ambiguous - it could mean that the backup should not be allowed to be more than 500 MB large (whatever sense that would make).
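Whatever the flag ends up being called, it has to turn a value like `500M` into a byte count. A Go sketch of one plausible parser; the accepted suffixes (K, M, G, T in any case) and the choice of 1024-based units are assumptions for illustration, not a description of restic's actual parser.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Assumed units: binary multiples, so "500M" means 500 MiB.
var units = map[byte]int64{'k': 1 << 10, 'm': 1 << 20, 'g': 1 << 30, 't': 1 << 40}

// parseSize turns strings like "500M", "4g" or "1024" into a byte count.
// A bare number is taken as bytes; one optional suffix (any case) scales it.
func parseSize(s string) (int64, error) {
	s = strings.ToLower(strings.TrimSpace(s))
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	mult := int64(1)
	if m, ok := units[s[len(s)-1]]; ok {
		mult = m
		s = s[:len(s)-1]
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size: %w", err)
	}
	// Reject negatives and values that would overflow int64 after scaling.
	if n < 0 || n > math.MaxInt64/mult {
		return 0, fmt.Errorf("size out of range: %d * %d", n, mult)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"500M", "4g", "1024", "12x"} {
		n, err := parseSize(in)
		if err != nil {
			fmt.Printf("%-5s -> error: %v\n", in, err)
			continue
		}
		fmt.Printf("%-5s -> %d bytes\n", in, n)
	}
}
```

Whether `M` should mean 10^6 or 2^20 is itself a naming question of the kind raised here, so any real flag would need to document its unit convention.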

I think it's wise to be explicit and unambiguous when deciding on things like flags that should exist for many years to come without colliding with other options. Naming is very important.

All of the above of course goes for the "min" version too.

giordy commented 3 years ago

I think indeed that this would be a very sane and useful option to include in Restic.

I'm using Restic to backup my home directory and the average file that I want to backup is always below 100MB, mostly documents, pictures and source code files. Files bigger than that are always either videos, ISOs or large archives that I don't want to backup.

Just the other day a 10GB VirtualBox VM I was testing got backed up because I forgot to add the directory in the exclude list and I would have gladly avoided that. Besides using space it also clogs the internet connection for hours, which is something that I definitely would like to avoid.

YoshieraHuang commented 3 years ago

I implemented this feature in my fork. You can try this.

MichaelEischer commented 3 years ago

VCS based includes are already tracked in #1514, so the only remaining option suggested by @trevor-vaughan is

> --min-size=SIZE => Restic does not have this, possibly not a bad idea for the same reason stated in the rsync man page (small junk files)

As that option has been mostly glossed over, I'm not sure what a potential use case would be. To me it looks like it would make it far too easy to accidentally exclude important config files from your backup.

rawtaz commented 3 years ago

I think that at some point one has to accept that backups won't always be 100% perfect, containing only what one will turn out to need down the line. In other words, small junk files are perhaps not that big of an issue; they'll just be there and won't harm anyone.

trevor-vaughan commented 3 years ago

The --min-size option was more for completeness since it was important enough for rsync to pull it in.

The main use case I see for it is when you're backing up something like a video directory and want to make sure that backup files or .part files are skipped.

Nowhere near as critical as the max-size option though.
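A minimum size is the mirror image of the maximum. A Go sketch of both bounds together, applied to the video-directory case; the `sizeWindow` type, the 1 MiB floor and the file names are assumptions for illustration. The last entry also shows the concern raised earlier in the thread: a small configuration file falls below the floor just like the stray .part file.

```go
package main

import "fmt"

// sizeWindow holds hypothetical min/max bounds in bytes.
// A zero Max means there is no upper limit.
type sizeWindow struct {
	Min, Max int64
}

// keep reports whether a file of the given size should be backed up.
func (w sizeWindow) keep(size int64) bool {
	if size < w.Min {
		return false
	}
	return w.Max == 0 || size <= w.Max
}

func main() {
	const kib, mib = int64(1 << 10), int64(1 << 20)
	w := sizeWindow{Min: 1 * mib} // assumed floor for a video directory
	files := []struct {
		name string
		size int64
	}{
		{"holiday.mkv", 1800 * mib},
		{"holiday.mkv.part", 64 * kib}, // download abandoned early
		{"rename-rules.conf", 2 * kib}, // small but possibly important
	}
	for _, f := range files {
		fmt.Printf("%-18s keep=%v\n", f.name, w.keep(f.size))
	}
}
```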

MichaelEischer commented 3 years ago

Closing this issue for now as there have been no additional requests for a --min-size flag. Feel free to comment if you still want that feature.

phlummox commented 1 year ago

Hi, I was just trying to work out the status of this issue. From what I can tell, it looks like this was closed as "wontfix" (here: https://github.com/restic/restic/issues/2569#issuecomment-748473023) on 19 Dec 2020 – but wasn't an implementation of the requested feature already merged on 19 Sept 2020 (here: https://github.com/restic/restic/pull/2914#event-3785086749)? Apologies if I'm missing something.

MichaelEischer commented 1 year ago

Support for --exclude-larger-than has been added. And regarding a minimal file size, we haven't seen a good use case for that.

phlummox commented 1 year ago

Ah, gotcha. My apologies -- I didn't notice that the original request was for a minimal file size as well. Thanks!