zugao opened 3 years ago
The prebackuppod is created without resource requests or limits (cf. https://github.com/projectsyn/component-backup-k8up/blob/ef37339b1267ed466b7ac90dd1f7fbec57e38c52/lib/backup-k8up.libjsonnet#L197-L228). So if the object dumper is terminated with exit code 137 (which can be a sign of the process running out of memory, but doesn't have to be; 137 is simply the exit code of a process terminated with SIGKILL), the termination isn't caused by the prebackuppod's requests or limits, but by something else (e.g. the node running out of memory).
Without an example to investigate, it's unlikely that we can identify the cause. At the moment, I suspect that the object dumper tool uses too much memory to perform the by-namespace splitting of the JSON files for the backup.
Context
Some clusters require more memory to run cluster backups. Here's an example of a backup running out of memory (exit code 137):
E0805 10:04:12.944025 1 pod_exec.go:76] wrestic/k8sExec "msg"="streaming data failed" "error"="command terminated with exit code 137" "namespace"="syn-cluster-backup" "pod"="object-dumper-847b96f5bb-bpc6c"
Alternatives
We could somehow force the backup pod to use less memory, but making the resources configurable would be a better solution.
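A minimal sketch of what configurable resources could look like in the component's Jsonnet. The parameter name `prebackuppod_resources` and the default values are illustrative assumptions, not the component's actual API:

```jsonnet
// Hypothetical component parameter; defaults shown here are only examples
// and would need tuning per cluster:
local params = {
  prebackuppod_resources: {
    requests: { cpu: '100m', memory: '256Mi' },
    limits: { memory: '1Gi' },
  },
};

// The object-dumper container in the prebackuppod spec would then pass
// these values through to the standard Kubernetes resources field:
{
  containers: [
    {
      name: 'object-dumper',
      // ...image, command, etc. unchanged...
      resources: params.prebackuppod_resources,
    },
  ],
}
```

With a setup like this, clusters that need more memory for backups could simply override the parameter in their hierarchy instead of patching the component.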