vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Restic RAM usage goes through the roof #4648

Open brovoca opened 2 years ago

brovoca commented 2 years ago

What steps did you take and what happened:

Upon running Restic backups of our Jenkins instances, the memory usage goes through the roof, effectively making Restic useless for us. Unfortunately, this severely affects the usability of Velero too.

velero backup create --from-schedule jenkins

Result: (memory usage screenshot, 2022-02-15_14-39)

The biggest Jenkins instance, which is seen in the metrics below, has 97k directories and 2,315,795 (2.3 million) files in total. The biggest directory in terms of file count contains 2484 files (non-recursive). The total volume size is 14G and we're backing it up to an Azure Storage Account. Velero was re-installed, and the Azure Storage Account was purged before the backup seen in the metrics above.

What did you expect to happen:

To have reasonable memory usage for backup operations.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help

bundle-2022-02-15-14-27-27.tar.gz

Anything else you would like to add:

helmrelease.yaml

schedule.yaml:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: jenkins
  namespace: velero
spec:
  schedule: "30 0 * * *"
  template:
    includedNamespaces:
      - jenkins
    includedResources:
      - "*"
    snapshotVolumes: false
    storageLocation: default
    ttl: 720h

Environment:

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

Lyndon-Li commented 2 years ago

This is a case of using Restic to back up a file system with a large number of small files, each of which is about 6 KB on average (14G / 2.3M ≈ 6K). One thing to clarify first is that Velero leverages Restic to back up file system files and does not itself touch anything in the file system; therefore, the problem lies with Restic. So let's focus on Restic only.

For Restic, there are multiple places where it needs memory; two of the most prominent ones are:

  1. Restic needs to traverse both the source file system and the previous snapshot, and all of that data is held in memory. Each file system object (file or directory) is represented by data structures such as Tree and Node. As far as I could find, there are two copies of each Node (one for the source, one for the previous snapshot) and two copies of each Tree (one for the scanner, one for the archiver). As a result, there are at least 4 copies of the objects, i.e. 4 * 2.3M ≈ 9.2M objects, each of which takes hundreds of bytes to kilobytes of memory.
  2. In order to support deduplication, Restic needs to keep indexes of the sliced blocks. As the amount of data grows, these indexes can grow dramatically, and all of them are also kept in memory.

In the current case, since the source data size is not so large, point 1 is probably the primary cause.

In the Restic repo, we can see many issues and PRs discussing huge memory usage:

  1. The issue that covers most aspects of the memory usage discussion: https://github.com/restic/restic/issues/1988.
  2. Another issue discusses the large-directory problem, which is very similar to the current one: https://github.com/restic/restic/issues/2446.

These issues have not been fixed in Restic yet; once they are fixed, we will adopt the new version of Restic.

One thing to note: the issue discussions above mention a quick workaround, which is to change Golang's GC policy. The reason is that Golang doesn't run GC all the time; as a result, even after memory has been released, it is not immediately returned to the OS by the Go runtime. For more information, see https://github.com/restic/restic/issues/1988.
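
For illustration, the workaround is just an environment variable on the restic process; a minimal sketch with a placeholder repository and path (not the Velero-managed invocation) would be:

$ GOGC=1 restic -r /srv/restic-repo backup /var/jenkins_home   # GOGC=1 triggers GC after ~1% heap growth instead of the default 100%

Note that this trades extra CPU time for a smaller resident heap; it only helps when the peak comes from memory the Go runtime has not yet returned, not from data that is genuinely still live.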

brovoca commented 2 years ago

Good day @Lyndon-Li,

I've seen these Restic issues and I understand that this is directly related to Restic. The reason I'm raising it here is in the hope that alternatives to Restic will be supported by Velero. Anyhow, Restic must somehow implement per-directory processing rather than loading everything at once.

We tried setting the env var GOGC=1, but it didn't help at all, as I mentioned in the initial post. I am not sure what we can do about our backups and Velero, as this memory consumption is just too much.

Kind regards, Emil

Lyndon-Li commented 2 years ago

We are investigating Kopia as an alternative to Restic. For more information, please follow https://github.com/vmware-tanzu/velero/issues/4538.

gman0 commented 2 years ago

We have experienced OOMs too (and subsequent issues with Restic pod restarts, e.g. https://github.com/vmware-tanzu/velero/issues/4772). If these "problematic" volumes end up being backed up from nodes that are too small to do the job, the backup will never finish successfully. It almost sounds like, as a workaround, there would have to be dedicated large nodes with dummy pods mounting the volumes, with the copying done from there. This is not always feasible, however...

DavidSanchezAlvarez commented 2 years ago

Could you please give more details about the GOGC env var? I assume the env var must be defined within the restic pod, but how? I don't know how to define that env var within the pod. I'm running restic with Velero in an EKS cluster and struggling with memory consumption as well while trying to back up Jenkins.

brovoca commented 2 years ago

@DavidSanchezAlvarez If I recall correctly, we patched the restic daemonset using a Kustomize patch. We deploy our stuff in a GitOps manner using FluxCD, which makes this easy via its post-renderers for Helm charts.
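
For anyone not using Kustomize, roughly the same effect can be sketched with kubectl directly, assuming the default install where both the daemonset and its container are named restic in the velero namespace:

$ kubectl -n velero set env daemonset/restic GOGC=1
$ kubectl -n velero get daemonset restic -o jsonpath='{.spec.template.spec.containers[0].env}'   # verify the variable landed in the pod template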

mirekphd commented 2 years ago

@brovoca: try to beat this :)

brovoca commented 2 years ago

@mirekphd I saw your comments in the morning. Initially I thought something was wrong with your Grafana dashboard... :astonished:

MichaelEischer commented 2 years ago

For Restic, there are multiple places where it needs memory; two of the most prominent ones are:

1. Restic needs to traverse both the source file system and the previous snapshot, and all of that data is held in memory. Each file system object (file or directory) is represented by data structures such as Tree and Node. As far as I could find, there are two copies of each Node (one for the source, one for the previous snapshot) and two copies of each Tree (one for the scanner, one for the archiver). As a result, there are at least 4 copies of the objects, i.e. 4 * 2.3M ≈ 9.2M objects, each of which takes hundreds of bytes to kilobytes of memory.

restic only keeps Nodes in memory for the folder it is currently processing and for each of its parent folders. Trees are only created for directories and backup root paths. But these shouldn't contribute substantially to the memory usage.

The debug bundle shows that restic is effectively called as restic backup . in the volume root (?). How many files does that top folder contain? AFAIR Jenkins only has a few dozen folders/files in the root of its data directory?

2. In order to support deduplication, Restic needs to keep indexes of the sliced blocks. As the amount of data grows, these indexes can grow dramatically, and all of them are also kept in memory.

The size of the index folder of a repository can serve as a rough approximation of how much memory is required for the index. But for the repository here, that should only be around 300MB?
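
If you have direct access to the repository, a quick sketch of that check (placeholder path; for the Azure container used here, sum the blob sizes under the index/ prefix instead):

$ du -sh /path/to/restic-repo/index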

Restic must somehow implement per-directory processing rather than loading everything at once.

That's how it works already. restic traverses the filesystem depth-first, and once it has completed processing a directory, it only keeps the id of the metadata object representing that directory in memory. I have a backup (not using velero) of 35 million files with 2.4 TB total size, for which restic can run incremental backups using 7 GB of memory. Most of the memory in that backup is required for the in-memory index used for deduplication. So just having a large number of files is not sufficient to cause the memory usage spikes seen here. Which leaves the question of what is missing here?

To know whether the problem here is actually restic/restic#2446: how large is the largest file in the data folder of the repository?
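
One way to check, assuming the repository is reachable as a local or mounted path (placeholder path; for object storage, inspect the object sizes under the data/ prefix instead):

$ find /path/to/restic-repo/data -type f -exec du -h {} + | sort -rh | head -n 5   # five largest pack files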

brovoca commented 2 years ago

The debug bundle shows that restic is effectively called as restic backup . in the volume root (?). How many files does that top folder contain? AFAIR Jenkins only has a few dozen folders/files in the root of its data directory?

I am not sure how Velero integrates with restic, but yes, the root of the JENKINS_HOME volume contains no more than 50 files and directories.

To know whether the problem here is actually https://github.com/restic/restic/issues/2446: how large is the largest file in the data folder of the repository?

The data has changed a little bit since retention policies were put in place. However, these are the sizes of the 5 largest files on that same instance: 332M, 332M, 187M, 178M, 144M

Another instance with a whole lot more files has this top 10: 3.0G 2.3G 2.3G 2.2G 2.2G 2.0G 1.6G 929M 780M 780M

Lyndon-Li commented 2 years ago

"I am not sure of how Velero integrates with restic". --- Velero always backs up the entire volume of the PV. This means, Velero calls Restic Backup CLI and pass the root of the volume to Restic

MichaelEischer commented 2 years ago

The data has changed a little bit since retention policies were put in place. However, these are the sizes of the 5 largest files on that same instance: 332M, 332M, 187M, 178M, 144M

Another instance with a whole lot more files has this top 10: 3.0G 2.3G 2.3G 2.2G 2.2G 2.0G 1.6G 929M 780M 780M

It looks like restic/restic#2446 would definitely reduce the memory usage spikes, although the memory usage graph from the issue description is what I'd expect to see in conjunction with a 2 GB file, so we're missing at least a factor of 6 here. As you mentioned that the largest directory in terms of file count has 2.5k files, a 332 MB file would mean that restic ended up with roughly 130 KB of metadata per file O.o .

The only situation in which I've seen that much metadata was on macOS, which uses very large extended attributes for some folders (https://github.com/restic/restic/issues/3643). Do the files in the volume have extended attributes or ACLs set?

brovoca commented 2 years ago

@MichaelEischer Keep in mind that the files on disk have changed since a more aggressive retention policy was implemented. However, the top 5 largest files should still be the same, as I doubt that the job definitions have changed that much.

I doubt that there are any attributes set, but it seems to be tricky to check.

$ lsattr /var/jenkins_home/jobs/Some_Pipelines/jobs/something-else/workspace@script/.git/logs/refs/remotes/origin/some_hub_id
lsattr: Operation not supported While reading flags on /var/jenkins_home/jobs/Some_Pipelines/jobs/something-else/workspace@script/.git/logs/refs/remotes/origin/some_hub_id

$ ls -alh /var/jenkins_home/jobs/Some_Pipelines/jobs/something-else/workspace@script/.git/logs/refs/remotes/origin/some_hub_id
-rw-r--r-- 1 jenkins jenkins 278 Apr  6  2021 /var/jenkins_home/jobs/Some_Pipelines/jobs/something-else/workspace@script/.git/logs/refs/remotes/origin/some_hub_id
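
lsattr only reports ext*-style inode flags (via an ioctl the underlying filesystem apparently doesn't support), not extended attributes. A sketch of checking xattrs and ACLs directly, assuming the attr and acl packages are available in the Jenkins image:

$ getfattr -d -m - /var/jenkins_home/jobs/Some_Pipelines/jobs/something-else/workspace@script/.git/logs/refs/remotes/origin/some_hub_id   # dump all extended attributes
$ getfacl /var/jenkins_home/jobs/Some_Pipelines/jobs/something-else/workspace@script/.git/logs/refs/remotes/origin/some_hub_id   # show ACL entries
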
MichaelEischer commented 2 years ago

Looks like we'll have to directly look at the metadata produced by restic to find out why the pack files are that large. Unfortunately, that will require a debug build of restic to peek into the pack files: