openshift / oadp-operator

OADP Operator
Apache License 2.0

Design: oadp operator CRD validation (improved user experience w/ multiple velero installs) #1134

Open weshayutin opened 1 year ago

weshayutin commented 1 year ago

Customers often have more than one installation of OADP on their cluster. For example, they will often have OADP installed for backup and restore, and ACM installed for cluster management. These two products are often installed at different times, and both installations work; however, one of the installations is not at the expected OADP version.

We need to find an appropriate way to warn the user when there is a mismatch between the operator and the OADP CRDs on the cluster.

mateusoliveira43 commented 9 months ago

@weshayutin please add a description to the issue or close it

weshayutin commented 9 months ago

Ah, agreed... OK, updating.

kaovilai commented 9 months ago

Thanks for the update!

mateusoliveira43 commented 1 month ago

There is no way to be sure that the OADP CRDs mismatch using only the information available in the cluster.

OADP could warn users (in logs, events, and/or the DPA status) that a mismatch might be present, based on the following information:

CRDs also have generation metadata (metadata.generation), but after upgrades, checking whether it equals 1 can give false positives as well.

The only way to be sure is to check whether the CRD YAML in the cluster matches the CRD YAML of that OADP release. This cannot always be done, because some cluster environments are disconnected. However, the user can dump the information and compare it to confirm.
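The manual dump-and-compare check could be sketched in Go as follows. This is a minimal sketch, assuming the user saved the in-cluster CRD with `oc get crd <name> -o json` and converted the release manifest to JSON. It compares only spec.group and spec.versions, so fields that always differ between the two copies (metadata.uid, metadata.resourceVersion, status) are ignored. The function name versionsMatch is hypothetical, and a real comparison may also need to tolerate fields the API server fills in with defaults.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// crdDoc captures only the parts of a CustomResourceDefinition we compare.
// Decoding versions into []any yields maps, so JSON key order does not matter.
type crdDoc struct {
	Spec struct {
		Group    string `json:"group"`
		Versions []any  `json:"versions"`
	} `json:"spec"`
}

// versionsMatch (hypothetical helper) reports whether the in-cluster CRD and
// the release CRD declare the same group and the same spec.versions
// (names, served/storage flags, schemas). Version order is significant.
func versionsMatch(clusterJSON, releaseJSON []byte) (bool, error) {
	var inCluster, release crdDoc
	if err := json.Unmarshal(clusterJSON, &inCluster); err != nil {
		return false, fmt.Errorf("decoding in-cluster CRD: %w", err)
	}
	if err := json.Unmarshal(releaseJSON, &release); err != nil {
		return false, fmt.Errorf("decoding release CRD: %w", err)
	}
	same := inCluster.Spec.Group == release.Spec.Group &&
		reflect.DeepEqual(inCluster.Spec.Versions, release.Spec.Versions)
	return same, nil
}
```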

kaovilai commented 1 month ago

Perhaps we can copy the expected CRDs into the image in the Dockerfile as part of the build; the operator could then compare the CRDs in its container image with the in-cluster CRDs, which it can fetch the same way "oc get" does.

kaovilai commented 1 month ago

It should work, IMO, as long as the operator has permission to call get on CRDs.

shubham-pampattiwar commented 1 month ago

I think we can keep this simple. We should check two things:

kaovilai commented 1 month ago

OK, that would avoid a spec diff check, but it may not be entirely correct if a CRD of the right version was installed manually, as it sometimes is via the ACM backup "chart".

shubham-pampattiwar commented 1 month ago

I think there must be some label, annotation, or other identifier on the CRDs that we can use in the ACM case. If not, then at the very least we could ask the ACM folks to add one to their CRD YAMLs.
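If OADP and ACM agree on such an identifier, the check could skip the spec diff and, per the concern about manually installed CRDs, accept a CRD of the right version no matter which product installed it. A minimal sketch follows; the label and annotation keys (oadp.example.io/installed-by, oadp.example.io/crd-version) are placeholders, not identifiers that exist today, and crdVersionWarning is a hypothetical helper.

```go
package main

import "fmt"

// Placeholder keys: the real identifiers would have to be agreed between
// the OADP and ACM teams.
const (
	installedByLabel     = "oadp.example.io/installed-by"
	crdVersionAnnotation = "oadp.example.io/crd-version"
)

// crdVersionWarning returns "" when the CRD is annotated with the expected
// version, otherwise a message suitable for a log line, event, or DPA status.
func crdVersionWarning(name string, labels, annotations map[string]string, wantVersion string) string {
	version, ok := annotations[crdVersionAnnotation]
	if !ok {
		return fmt.Sprintf("CRD %s has no %s annotation; cannot verify it matches OADP %s",
			name, crdVersionAnnotation, wantVersion)
	}
	if version == wantVersion {
		// Right version: fine even if another product (e.g. ACM) installed it.
		return ""
	}
	owner := labels[installedByLabel] // lookups on a nil map return ""
	if owner == "" {
		owner = "unknown"
	}
	return fmt.Sprintf("CRD %s is at version %s (installed by %s), but this operator expects %s",
		name, version, owner, wantVersion)
}
```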

kaovilai commented 1 month ago

works

sseago commented 2 weeks ago

Sounds like there are two things we may want to do here:
1) Compare the in-cluster CRD version to the expected CRD version. This would be an ongoing operator reconcile, so that if someone later installed the wrong CRDs as part of another Velero/OADP install, the current OADP would warn or error out.
2) Provide some form of notification at install time that warns the user that other OADP versions are already in the cluster.

weshayutin commented 6 days ago

@shubham-pampattiwar I think this is a great comment and improvement. We'll discuss more, but THANK YOU!! for the creative solution RE: https://github.com/openshift/oadp-operator/issues/1134#issuecomment-2392637925