vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Velero Controller : Pod Evicted #7177

Open piedai opened 9 months ago

piedai commented 9 months ago

I'm regularly facing an issue where the Velero controller pod gets evicted. The cause is that the node was low on resource: ephemeral-storage. After looking into it, the controller uses an emptyDir volume mounted on /scratch.

I use the Helm chart to deploy Velero on the cluster. The scratch volume is hardcoded in the deployment template (line 253): https://github.com/vmware-tanzu/helm-charts/blob/main/charts/velero/templates/deployment.yaml
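For reference, the rendered part of the deployment looks roughly like this (a simplified sketch, not the exact chart template):

```yaml
# Simplified sketch of how the chart wires up the scratch volume
# (see the linked deployment.yaml for the authoritative template).
containers:
  - name: velero
    volumeMounts:
      - name: scratch
        mountPath: /scratch
volumes:
  - name: scratch
    emptyDir: {}   # backed by node-local ephemeral storage, so writes here
                   # count against the node's ephemeral-storage pressure
```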

When I describe the pod: The node was low on resource: ephemeral-storage. Threshold quantity: 1072693263, available: 916616Ki. Container velero was using 17248Ki, request is 0, has larger consumption of ephemeral-storage.

Is it possible to mount this volume on a PVC, or do I need to increase the node partition? Apart from this, everything works well!

Velero version: velero/velero:v1.10.0
Helm chart version: 5.0.2
Kubernetes version: 1.27.7
OS release: Ubuntu 20.04

I'm surprised I didn't find any existing issue about this! Thanks for any feedback!

blackpiglet commented 9 months ago

Container velero was using 17248Ki

What's the capacity of the node's ephemeral storage? Using more than 10Mi of storage does not seem like much for backup software.

danfengliu commented 9 months ago

Could you try the solution from the comment, which is adding requests and limits for ephemeral-storage to the Velero deployment, as described in Setting requests and limits for local ephemeral storage? If that works, we might need to find a way to add these settings to the Velero installation.
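For illustration, a minimal values.yaml sketch (assuming the chart forwards the `resources` value unchanged to the velero container):

```yaml
# Hedged sketch: exact keys depend on the chart version, and the
# quantities are placeholders to be tuned for your workload.
resources:
  requests:
    ephemeral-storage: 1Gi
  limits:
    ephemeral-storage: 4Gi
```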

Adding a PVC is apparently not a best practice; it's better to find the root cause, or to follow the best practices in Resource Management for Pods and Containers if you have enough resources at hand.

Are there any other pods that were filling the disk and also got evicted?

reasonerjt commented 9 months ago

Given that storage is relatively cheap, I don't think consuming 1GB is very large.

We may consider writing an FAQ in the docs to help users work around this when such an issue happens. Alternatively, we can adjust the requested resources in the pod to clarify the expected consumption.

piedai commented 9 months ago

Thanks for your feedback. I tried adding a request and limit for ephemeral storage, but it didn't solve the problem. I set the request to 2Gi and the limit to 4Gi.

Now, after about 24 hours, I get the following output: Pod ephemeral local storage usage exceeds the total limit of containers 4Gi.

What data is stored inside the emptyDir?

blackpiglet commented 9 months ago

I haven't tested Velero's storage usage in the scratch directory, so I cannot give a detailed resource usage guideline for different usage scenarios. For reference, this is the environment variable that points the server at the scratch directory:

{
    Name:  "VELERO_SCRATCH_DIR",
    Value: "/scratch",
},

The directory is used as a cache for the integrated Restic process. If you use the Restic uploader to back up volume data, the Restic process may consume considerable disk space; like its CPU and memory usage, this depends on the data in the volumes.
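As a separate thought: if the chart's scratch volume definition were made configurable, a sizeLimit on the emptyDir (a standard Kubernetes field) would at least make the eviction threshold explicit. A hypothetical sketch:

```yaml
# Hypothetical change to the scratch volume definition: the kubelet evicts
# the pod once usage exceeds sizeLimit, instead of waiting for node-level
# ephemeral-storage pressure.
volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 4Gi   # placeholder; size it for the expected Restic cache
```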