operator-framework / operator-controller

Apache License 2.0
29 stars 47 forks source link

:sparkles: Use error type rather than strings #878

Closed tmshort closed 1 month ago

tmshort commented 1 month ago

Description

Reviewer Checklist

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 56.66667% with 13 lines in your changes missing coverage. Please review.

Project coverage is 77.93%. Comparing base (9f0c6a9) to head (928e1ba).

Files Patch % Lines
internal/controllers/common_controller.go 0.00% 8 Missing :warning:
...nternal/controllers/clusterextension_controller.go 77.27% 5 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #878 +/- ## ========================================== + Coverage 76.21% 77.93% +1.72% ========================================== Files 17 17 Lines 1177 1142 -35 ========================================== - Hits 897 890 -7 + Misses 202 175 -27 + Partials 78 77 -1 ``` | [Flag](https://app.codecov.io/gh/operator-framework/operator-controller/pull/878/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=operator-framework) | Coverage Δ | | |---|---|---| | [e2e](https://app.codecov.io/gh/operator-framework/operator-controller/pull/878/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=operator-framework) | `59.80% <46.66%> (+1.60%)` | :arrow_up: | | [unit](https://app.codecov.io/gh/operator-framework/operator-controller/pull/878/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=operator-framework) | `60.14% <53.33%> (+2.12%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=operator-framework#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

tmshort commented 1 month ago

@m1kola (github doesn't support comment threads), I do believe we need to distinguish between Resolution errors, which likely involve errors in the ClusterExtension resource, vs. other errors that can occur on the cluster. In the first case, the user may need to fix it, and can likely do it themselves. In the other case, it may be a temporal error on the cluster, or something else that the user cannot handle.

openshift-merge-robot commented 1 month ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
netlify[bot] commented 1 month ago

Deploy Preview for olmv1 ready!

Name Link
Latest commit 928e1baeb73a5948332d1da0a7ee1c5bbaa62099
Latest deploy log https://app.netlify.com/sites/olmv1/deploys/665f2fbc41879700082961cb
Deploy Preview https://deploy-preview-878--olmv1.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

m1kola commented 1 month ago

@m1kola (github doesn't support comment threads), I do believe we need to distinguish between Resolution errors, which likely involve errors in the ClusterExtension resource, vs. other errors that can occur on the cluster. In the first case, the user may need to fix it, and can likely do it themselves. In the other case, it may be a temporal error on the cluster, or something else that the user cannot handle.

@tmshort handleResolutionErrors only handles resolution errors (only errors returned from r.resolve(ctx, *ext)).

How I see it:

In all the these cases we can fail resolution condition and write error into the condition message. I don't think that setting reason to InstallationStatusUnknown aids UX.

There is a chance that I'm missing something (e.g. maybe you expect programmatic clients to take advantage of a distinct combination of reasons and take some action?). Happy to discuss it further here or jump on a quick call if you like.

joelanford commented 1 month ago

@ankitathomas's PR is super relevant to the discussion of temporal/transient vs permanent. https://github.com/operator-framework/operator-controller/pull/842

I think the Progressing condition that we've discussed should make it much easier for us to tell users "we're going to try again" vs "we're not trying again, there is something you need to fix"

tmshort commented 1 month ago

@m1kola

handleResolutionErrors only handles resolution errors (only errors returned from r.resolve(ctx, *ext)).

Non-resolution (e.g. timeout) errors can occur in r.resolve(), and thus any kind of errors are handled by handleResolutionErrors. Would it make more sense to remove the function (since it's a lot smaller now), and put the functionality into the main function?

This is just trying to clean up error processing. It's not meant to clean up the status conditions (but it has to do something with status conditions). It looks as though #842 changed title, so it's doing something different now, issue #880 is meant to clean up the status conditions. The hope is that we can have smaller changes (which is why I'm trying to keep this small).

tmshort commented 1 month ago

This PR is focused on removing the string checks for error types, and all the aggregated errors that were being used - issues noted in the big Helm PR #846. It was not meant to fix all the problems with status conditions.