openshift / openshift-velero-plugin

General Velero plugin for backup and restore of openshift workloads.
Apache License 2.0

Velero cannot parse valid CPU resource requests.... #7

Closed fbladilo closed 4 years ago

fbladilo commented 5 years ago

Velero seems to have trouble parsing CPU requests. I suspect it expects them to be expressed in millicores rather than in plain cores/vcores, as some applications do: https://github.com/fusor/mig-demo-apps/blob/master/apps/mssql-app/manifest.yaml#L108

OCP seems happy with these requests, but Velero logs:

time="2019-09-06T02:34:00Z" level=error msg="Using default resource values, couldn't parse resource requirements: couldn't parse CPU request \"\": quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'." cmd=/velero logSource="pkg/restore/restic_restore_action.go:112" pluginName=velero pod=mssql-example/mssql-deployment-1-j5cld-stage restore=openshift-migration/mssql-app-mig-1567736994-qr48f

This sample was taken while migrating mssql-app from OCP 3.7 to 4.2 using the latest tags. It can be reproduced by attempting to migrate https://github.com/fusor/mig-demo-apps/blob/master/apps/mssql-app/
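The regular expression quoted in the log line above can be checked directly against candidate values. The following is a minimal sketch using only Go's standard `regexp` package. It is not Velero's code, and the helper name `matchesQuantityPattern` is made up for illustration. Matching this pattern is a necessary condition only, so it does not prove a value is a fully valid quantity.

```go
package main

import (
	"fmt"
	"regexp"
)

// quantityPattern is copied verbatim from the error message in the log above.
var quantityPattern = regexp.MustCompile(`^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$`)

// matchesQuantityPattern is a hypothetical helper: it reports whether s has
// the overall shape the error message says a quantity must have.
func matchesQuantityPattern(s string) bool {
	return quantityPattern.MatchString(s)
}

func main() {
	// A bare core count, a millicore value, and the empty string from the log.
	for _, s := range []string{"2", "2000m", ""} {
		fmt.Printf("%q matches: %v\n", s, matchesQuantityPattern(s))
	}
}
```

Both `"2"` and `"2000m"` match the pattern, while the empty string does not, because the first group requires at least one digit or dot.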

jwmatthews commented 5 years ago

@sseago is this on our plugin or is this a bug in upstream Velero?

sseago commented 5 years ago

It's upstream. The error message comes from https://github.com/heptio/velero/blob/master/pkg/restore/restic_restore_action.go#L112. I haven't dug into this yet, but I'm guessing the parsing needs to be made more flexible there.

sseago commented 5 years ago

Velero parses the pod resources with the appropriate Kubernetes calls into "k8s.io/apimachinery/pkg/api/resource". Looking more carefully at the error message above, the value it's trying to parse isn't a bare "2" but an empty string. This is for a stage pod, not an application pod, so the resource requirements specified in the application YAML aren't used.

Looking at the controller code, we aren't specifying resource limits explicitly for the stage pods. In this particular case, if it's hitting issues with an empty resource limit, then falling back to the defaults is the right thing to do.

I just checked the Velero logs for my currently running Velero pod and I don't see that message anywhere. I'm wondering whether this is specific to 3.7: maybe on OCP 3.7, empty resource limits show up as empty strings rather than being absent. While this causes Velero to log an error, it appears that Velero falls back to the defaults here anyway.
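The fallback behaviour described above can be illustrated with a small sketch. This is not Velero's implementation, which goes through "k8s.io/apimachinery/pkg/api/resource". It is a simplified, standard-library-only parser that understands only plain cores ("2") and the "m" millicore suffix ("500m"). The names `parseCPUMillis` and `cpuMillisOrDefault` and the default of 100 millicores are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// defaultCPUMillis is a hypothetical placeholder, not Velero's actual default.
const defaultCPUMillis int64 = 100

// parseCPUMillis converts a CPU quantity to millicores. Simplified on purpose:
// it handles only plain cores ("2", "0.5") and the "m" suffix ("500m").
func parseCPUMillis(s string) (int64, error) {
	if s == "" {
		return 0, fmt.Errorf("couldn't parse CPU request %q: empty quantity", s)
	}
	if strings.HasSuffix(s, "m") {
		return strconv.ParseInt(strings.TrimSuffix(s, "m"), 10, 64)
	}
	cores, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("couldn't parse CPU request %q: %w", s, err)
	}
	return int64(math.Round(cores * 1000)), nil
}

// cpuMillisOrDefault returns the parsed value, or the default when parsing
// fails. The second result reports whether the default was used.
func cpuMillisOrDefault(s string) (int64, bool) {
	millis, err := parseCPUMillis(s)
	if err != nil {
		return defaultCPUMillis, true
	}
	return millis, false
}

func main() {
	for _, s := range []string{"2", "500m", ""} {
		millis, usedDefault := cpuMillisOrDefault(s)
		fmt.Printf("%q -> %dm (default used: %v)\n", s, millis, usedDefault)
	}
}
```

In this sketch a bare "2" parses to 2000 millicores without complaint, and only the empty string triggers the fallback, which mirrors the diagnosis above: the logged error is about an empty value, not about cores versus millicores.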