memory usage when backing up directories containing many empty files #2446

Open pabs3 opened 5 years ago

pabs3 commented 5 years ago

Output of restic version

restic 0.9.5 compiled with go1.12.9 on linux/amd64

How did you run restic exactly?

See the attached test script, which I ran like this:

$ ./test 1000000

The script creates a directory containing a million empty files and then backs it up using restic. On an ext4 filesystem the directory is 22MB in size.

What backend/server/service did you use to store the repository?

Filesystem

Expected behaviour

Lower memory usage.

Actual behaviour

Higher memory usage; see the attached heap profiles (one with GOGC unset, one with GOGC=1), taken towards the end of the restic backup process. The peak RAM usage with GOGC unset is 2.8 GB.

With real-world workloads (such as large Maildirs containing more files), the RAM usage can exceed the amount of RAM on the system, leading to swap death.

Steps to reproduce the behaviour

See the attached test script.
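
For readers without the attachment, a roughly equivalent reproduction could look like this (the repository location and password handling are placeholders, not taken from the actual script):

# not the actual attached script, just an equivalent reproduction
$ mkdir -p /tmp/manyfiles
$ cd /tmp/manyfiles && seq 1 1000000 | xargs touch
$ export RESTIC_REPOSITORY=/tmp/restic-repo RESTIC_PASSWORD=test
$ restic init
$ restic backup /tmp/manyfiles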

Do you have any idea what may have caused this?

I am guessing that restic prepares directory indexes in RAM instead of preparing them on disk. The issue could also be that directory indexes are stored inefficiently.

Do you have an idea how to solve the issue?

I don't know enough about restic's internals or about Go memory profiling.

Did restic help you or make you happy in any way?

I was happy to find a good replacement for rdiff-backup.

shtratos commented 4 years ago

Thanks for the profiles. I've made some visualisations of the memory used and the number of objects: profile008, profile007

IMO it looks like most of the memory is consumed by the tree object for the directory. Since the directory has 1M files inside, its tree object is huge (236MB) and it also needs to be serialised as JSON (275MB) and encrypted (another 275MB). The rest is mostly Restic bookkeeping and file paths.
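
Those numbers roughly add up: 236 MB + 275 MB + 275 MB ≈ 790 MB that has to be held in memory at the same time just to build, serialise and encrypt that single directory's tree, before counting restic's bookkeeping and the path strings.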

I guess the fix for that would be to process files inside a directory in batches and split the directory's tree object into several pages.

Not sure how common the case of 1M files in one directory is, though.

pabs3 commented 4 years ago

The reason I posted this issue is that I have a Maildir containing 602937 messages, and restic was OOMing every time I tried to back up my system until I set the GOGC=1 environment variable; then it started working again. This was even with lots of swap and a chunk of my RAM compressed with zram. I do need to migrate away from that mail account so the Maildir stops growing and I can split it by year, but that could take a while, so I'm hoping this particular memory usage issue can be fixed before restic reaches my memory limits again.

rawtaz commented 4 years ago

Nice input people. I like the graph :)

Just for those who haven't thought about it: if you have this problem of restic taking too much memory for a large backup set, know that you can run separate backups, e.g. three backups covering one third of a Maildir each. You can back them up to the same repo, and deduplication will still be used. The main difference is that you'll have different paths in your snapshots, but you can give all the backups a single tag and then group on that tag when forgetting and pruning. It should be fine.
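
For illustration, such a split might look roughly like this; the file names, retention values and the use of --files-from to divide the Maildir into thirds are just one way to do it, not a recommendation:

# assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are set in the environment
$ find ~/Maildir -type f | sort | split -n l/3 - maildir-part-
$ restic backup --files-from maildir-part-aa --tag bigmaildir
$ restic backup --files-from maildir-part-ab --tag bigmaildir
$ restic backup --files-from maildir-part-ac --tag bigmaildir
# later, group snapshots by the shared tag when forgetting/pruning
$ restic forget --tag bigmaildir --group-by tags --keep-last 7 --prune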

aawsome commented 4 years ago

@pabs3: You can also try the lowmem approach of #2523 for backing up your maildir. I guess some part of your memory usage is due to the index. I would be happy to see some results of your tests!

Note, though, that the approaches in #2523 do not tackle the problem of many files per directory, where the resulting tree is held in memory both directly and in JSON form.

aawsome commented 4 years ago

About really large directories, I think the problem is the following: a node (see internal/restic/node.go) can have just one subtree. Therefore, if a node is a directory, all of its files need to be in that one subtree, which is nothing more than a list of nodes (internal/restic/tree.go). This means the whole list of nodes has to be collected and processed before the directory can be finished. That prevents parallel processing within a large directory and also leads to large memory consumption, as can easily be seen in the memory profiles above.

I would therefore suggest the following: in addition to the possible subtree element, which references a single tree ID, add a subtrees element which references a list of tree IDs. If a node is a directory and contains more than a given maximum number of entries (e.g. 100), build more than one subtree (which can then be processed independently) and give the list of tree IDs to the node. This requires a change in the node format and thus in the format of tree blobs. However, it can be done in a backward-compatible way (if subtree is present, just use the given tree; otherwise use the trees from subtrees, and when writing, write subtree if there is only one tree in the list).
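
Roughly sketched (heavily abbreviated; the real definitions live in internal/restic/node.go and internal/restic/tree.go, and the name of the new field is only illustrative):

package restic

// ID is restic's 32-byte SHA-256 content hash (defined elsewhere in the real code).
type ID [32]byte
type IDs []ID

// Tree is the content of a tree blob: a JSON-serialised list of nodes.
type Tree struct {
    Nodes []*Node `json:"nodes"`
}

// Node describes a single directory entry (most fields omitted here).
type Node struct {
    Name string `json:"name"`
    Type string `json:"type"`

    // current format: a directory node references exactly one subtree,
    // so all of its entries end up in a single (possibly huge) tree blob
    Subtree *ID `json:"subtree,omitempty"`

    // proposed addition: a directory with many entries references a list
    // of smaller subtrees that can be built and uploaded independently
    Subtrees IDs `json:"subtrees,omitempty"`
}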

I will give it a try and start to implement this optimization.

aawsome commented 4 years ago

I started with the implementation in the branch optimize-large-dir. It's still WIP; so far only the initial backup and some commands like mount and check work. I haven't yet figured out all the changes needed for the remaining commands, nor thought through tests and changes to the documentation.

However, the results look quite promising. I used the test setting provided by @pabs3.

Using the master branch, the initial backup of the one-million-file dir takes 1:24 and uses around 1 GB of RAM, see the profile above. The resulting repo is 276 MB, which is basically one big tree node in one blob/pack.

Using the optimize-large-dir branch, the initial backup takes 0:51 and uses around 60 MB of RAM. The resulting repo is 273 MB, with no blobs/packs that stand out in size.

I'll keep working to complete the optimization and will open a PR soon. Of course, I'm always happy to get feedback on the work.

rawtaz commented 4 years ago

60 MB instead of 1 GB sounds like a huge improvement! :heart_eyes:

aawsome commented 4 years ago

I finished working on all restic commands in the branch optimize-large-dir. It seems that all commands I tested so far use less (some much less) memory and are faster with the test setting of @pabs3.

I'll start a PR, as quite a few changes to the current code are needed (basically everywhere Node.Subtree is used) - I must admit the change is quite a bit bigger than I expected it to be. In the form of a PR, I think the changes can be better discussed and improved by co-contributors.

pabs3 commented 4 years ago

Some folks working on bup are doing content-based tree splitting, so that adding one file to a large maildir only adds a small amount of metadata rather than another copy of the directory.

https://lwn.net/Articles/824632/ https://github.com/jmberg/bup/commit/44006daca4786abe31e32a969a08778133496663

pabs3 commented 4 years ago

I tested the latest release (0.10.0), which had some memory optimisations; the peak memory usage is now around 2.6 GB, so slightly lower, but not as low as @aawsome's PR would give.

vmpr commented 4 years ago

Is there any news on this? I have a directory with 2,113,804 files and 11.1 TB to back up, and lately the oom_reaper kills the process every day :( I have been using GOGC=1 for months and it has worked well so far.
restic 0.9.6 compiled with go1.13.4 on linux/amd64. Sadly I can't help improve the code :(

aawsome commented 4 years ago

Is there any news on this? I have a directory with 2,113,804 files and 11.1 TB to back up, and lately the oom_reaper kills the process every day :( I have been using GOGC=1 for months and it has worked well so far. restic 0.9.6 compiled with go1.13.4 on linux/amd64. Sadly I can't help improve the code :(

You really have that many files directly in one directory??

If it's a tree of subdirs (as I expect), you should try out 0.10.0 - there have been lots of improvements with respect to memory consumption. This issue is mainly about dirs containing lots of entries directly (i.e. not via a sub-dir tree).

vmpr commented 4 years ago

Yeah, it's horrible, isn't it? It's the company's central file server, and people work with thousands and thousands of tiny files and thousands of directories. OK, I will try the new version, because not all of those files are in one directory.

aawsome commented 4 years ago

I don't want to argue that restic doesn't have a problem here - it definitely uses much more memory than needed and creates huge tree blobs in this case. And yes, there are some odd cases where lots of files directly in one directory can occur.

However, for "normal" settings this shouldn't pose too much of a problem. If you look at @pabs3 tests, you can see, that 1M files actually are a problem - but it scales in both directions. So I expect 100.000 entries directly in one dir to only take around 100MB in the memory profile and maybe 260MB real RAM requirements. So that should be already quite practical. 10.000 entries should not pose any problem, and this is already a number where I believe that people would start sorting/grouping/structuring their dir - if this is really a user-managed dir and not something special that leads to this large number.

rawtaz commented 4 years ago

@vmpr As a potential workaround, you can probably split the backup run into two or three, where the first one takes [A-L]*, the second [M-T]*, and the third the rest of the files. Each run should then handle fewer files, which might help with your memory consumption problem.

vmpr commented 4 years ago

Thanks, guys. I've upgraded to 0.10.0 and at the moment it has been running stably for 18h without being killed, and it only consumes 25% of the server's 16 GB of RAM with GOGC=1.

brovoca commented 2 years ago

(screenshot attached) We're trying to use Velero with restic and it's pretty much unusable for us. The pod gets killed as it consumes more than our limit of 6 GiB. How in the world can it use so much memory still?

shtratos commented 2 years ago

Please consider making a heap dump when you see restic at high mem usage and sharing it. Then at least someone can have a shot at debugging this. Right now there's not enough information to work with.

Having a rough understanding of your use case would also help.


brovoca commented 2 years ago

Dear @shtratos,

We're running Restic through Velero. Do you possibly know how we should go about creating a heap dump through a Kubernetes pod?

The total number of directories is somewhere around 97k, with 2,315,795 (2.3 million) files in total. The biggest directory in terms of file count has 2484 files (non-recursive).

Kind regards, Emil

pabs3 commented 2 years ago

That is a very small number of files per directory, so I expect your particular memory issues are off-topic on this issue; please file a new bug about them.


mirekphd commented 2 years ago

In my case: 500k files, 1.4 TB in total. Peak memory usage: 220 GiB... Here is an illustration of four overnight attempts (they failed, but not due to physical RAM exhaustion - we are struggling with a different restic bug).

And the folder structure, file count and total size occupied all look reasonable... it's just a lot of data. The two biggest folders this user has are:

$ find /home/***/data/***/***/ -maxdepth 1 -type f -print | wc -l
17105
$ du -sch /home/***/data/***/***/
814.1G  total

$ find /home/***/_tmp -maxdepth 1 -type f -print | wc -l
5509
$ du -sch /home/***/_tmp
90.3G   total

I'm using the latest version of the MinIO server (minio/minio:latest) as the backend, but not the latest version of restic (0.12.1 instead of 0.13.1). To get the latest restic I will have to change the entire container system used for performing scheduled backups, as only the former, older restic version is available under (stable) Alpine. But given our solutions architect's and Velero's preference for restic I have to give it a try...

Update: upgrading restic to 0.13.1 (compiled from source for Alpine 3.15) did not help: it neither reduced memory usage (still 210 GiB) nor fixed the blocking issue with "client.PutObject: We encountered an internal error, please try again.: cause(open /data/.minio.sys/multipart/<hash>/<another-hash>/fs.json: operation not permitted)".

(memory usage graph attached)

... which is arguably even worse than your experience quoted below... ;)

We're trying to use Velero with restic and it's pretty much unusable for us. The pod gets killed as it consumes more than our limit of 6 GiB. How in the world can it use so much memory still?

mirekphd commented 2 years ago

It is not a user error. It happens routinely in Kubernetes / Openshift clusters, where each container user folder (so-called PVC) is mapped to a single folder on the host (so-called PV). So even if the container user has a reasonably branchy folder structure in her PVC (with many folders and not too many files in each of them), everything will end up on the host in a single huge folder (PV) with hundreds of thousands of files... even if users have several PVCs.

Note also that this will not die down, as Velero has started linking from their docs to this unresolved issue :) : https://velero.io/docs/main/customize-installation/#customize-resource-requests-and-limits

I have a directory with 2,113,804 files and 11.1 TB to back up, and lately the oom_reaper kills the process every day :( I have been using GOGC=1 for months and it has worked well so far. restic 0.9.6 compiled with go1.13.4 on linux/amd64. Sadly I can't help improve the code :(

You really have that many files directly in one directory??

MichaelEischer commented 2 years ago

In my case: 500k files, 1.4 TB in total. Peak memory usage: 220 GiB.

I have never seen memory usage from restic that is even close to that. For reference: for a repo with roughly 35M files, 2.4TB backup data size, 3.6TB repository size, restic 0.12.1 takes 7GB RAM for me. Judging from the stated repository size, I'd expect about 1GB RAM usage, maybe 3GB in the worst case.

How large is the index folder of that repository? How large is the cache used by restic? How large is the repository?
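
If it helps, these numbers can be gathered with something like the following (the repository path is a placeholder; on Linux the default cache location is ~/.cache/restic):

$ du -sh /path/to/repo /path/to/repo/index
$ du -sh ~/.cache/restic
$ restic -r /path/to/repo stats --mode raw-data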

or fix the blocking issue with "client.PutObject: We encountered an internal error, please try again.: cause(open /data/.minio.sys/multipart/<hash>/<another-hash>/fs.json: operation not permitted)").

Based on the information I've seen so far for this error, it is a problem of your MinIO cluster and not of restic.

It is not a user error. It happens routinely in Kubernetes / Openshift clusters, where each container user folder (so-called PVC) is mapped to a single folder on the host (so-called PV)

I'm not sure what you're trying to say. That all of the 500k files above end up in the same folder? Does restic see a single large folder or a folder structure with a reasonable number of subfolders?

MichaelEischer commented 2 years ago

In my case: 500k files, 1.4 TB in total. Peak memory usage: 220 GiB... Here is the illustration of four overnight attempts

@mirekphd After thinking about this a bit more, right now I can only imagine two possible causes of the excessive memory usage you are seeing: either the garbage collection in Go is effectively disabled for some reason or the S3 backend of restic is leaking memory without limit due to some backend problems.

mirekphd commented 2 years ago

@MichaelEischer: you were right about where the client.PutObject error originates: multipart uploads do not work with any version of containerized MinIO in our environment (under OpenShift), not sure why, and we will have to work around this problem by avoiding multipart uploads (e.g. by delaying their use through a larger maximum non-multipart file size and skipping or compressing files larger than that limit). It's a bug, however, that we cannot recover from this error here, while we can in other programs. The other problem is the non-specific error message, which does not even include the file name even though a fatal error is triggered.

Heavy memory usage only occurs when attempting to deal with multipart files. I don't know at which threshold multipart uploads kick in here, but the maximum the S3 spec allows for a single non-multipart upload is 5 GiB per file. Memory probably shoots up (regardless of the backup program) when the client makes multiple attempts to upload such a large file without releasing memory in between those attempts (in this test repo we had 18 files above 5 GiB, up to 15 GiB).

What is problematic here is that memory is not released at all, even after giving up on a particular file (one too big to upload to MinIO without multipart) and moving on to smaller files, as an alternative backup program does; this is what creates the ever-increasing memory usage pattern.

MichaelEischer commented 2 years ago

The MinIO S3 library by default uses a multipart size of 16MB. You'll probably have to patch restic to increase the limit:

diff --git a/internal/backend/s3/s3.go b/internal/backend/s3/s3.go
index 1bdf2d795..c124329ae 100644
--- a/internal/backend/s3/s3.go
+++ b/internal/backend/s3/s3.go
@@ -291,6 +291,7 @@ func (be *Backend) Save(ctx context.Context, h restic.Handle, rd restic.RewindRe
        opts.ContentType = "application/octet-stream"
        // the only option with the high-level api is to let the library handle the checksum computation
        opts.SendContentMd5 = true
+       opts.PartSize = 1024 * 1024 * 1024

        debug.Log("PutObject(%v, %v, %v)", be.cfg.Bucket, objName, rd.Length())
        info, err := be.client.PutObject(ctx, be.cfg.Bucket, objName, ioutil.NopCloser(rd), int64(rd.Length()), opts)

It's a bug however that we cannot recover from this error here, while we can in other programs. The other problem is the non-specific error message which does not even include file name despite fatal error being triggered.

There's nothing restic can do to modify the data it is uploading at that point. The prefix of the file name restic tried to upload is right before client.PutObject in the error message.

Memory probably shoots up (regardless of the backup program) when the client makes multiple attempts to upload such a large file, without releasing memory in between those attempts (in this test repo we had 18 files above 5 GiB, up to 15 GiB).

Are you referring to files in the repository's data/ folder? Or in the data set that gets backed up?

What is problematic here is not releasing memory at all, even after giving up on a particular file (too big to upload to minio without multipart), e.g. when moving over to smaller files, as an alternative backup program does, which creates this ever-increasing memory usage pattern.

Please open a new issue, fill in the template and include the full output printed by restic. I still feel like I'm missing half of the problem description. Failing to upload a file to the repository is fatal, so there shouldn't be any moving on to further files at that point.