openshift / compliance-operator

Operator providing OpenShift cluster compliance checks
Apache License 2.0
110 stars 110 forks source link

Restart profileparser on failures #692

Closed jhrozek closed 3 years ago

jhrozek commented 3 years ago

We used to be too permisive about failures in the profileparser, which worked fine in the beginning, but doesn't work so well since we started annotating the objects with the bundle SHA digest during an update and delete any objects that were not annotated with that digest value - in case an update operation failed because of a transient error, we would have skipped annotating that object and ultimately treat it as removed and delete it. This was happening especially during upgrades where the API server might be unreachable at some point.

This patch exits the profileparser with a non-zero code on failures as well as propagates the errors from the client operations to the error channels so that they are eventually returned to the binary and used for the error code. This ensures that the profileparser restarts and hopefully goes through after the restart.

Jira: OCPBUGSM-32701

jhrozek commented 3 years ago

@mrogers950 @JAORMX I'd like to get a second opinion on testing. Frankly, I tested by modifying the code to randomly return an error during the update unless a CM called "flag" was created and I saw that profilebundle was being restarted fine. But this won't help QE. Do I have some other options to simulate the errors? I guess writing a test with a mock client could verify that the ParseBundle function returns an error which could be OK for us and I'm fine doing it, but it won't help QE.

jhrozek commented 3 years ago

uhm... the only way I can think of testing this is by having a container image with invalid content.

Right, let's see if I can craft one. Putting this on hold. /hold

jhrozek commented 3 years ago

test added, no longer holding

jhrozek commented 3 years ago

/hold cancel

JAORMX commented 3 years ago

/retest

openshift-ci[bot] commented 3 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JAORMX, jhrozek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift/compliance-operator/blob/master/OWNERS)~~ [JAORMX,jhrozek] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment