Closed jhrozek closed 3 years ago
@mrogers950 @JAORMX I'd like to get a second opinion on testing. Frankly, I tested by modifying the code to randomly return an error during the update unless a CM called "flag" was created and I saw that profilebundle was being restarted fine. But this won't help QE. Do I have some other options to simulate the errors? I guess writing a test with a mock client could verify that the ParseBundle function returns an error which could be OK for us and I'm fine doing it, but it won't help QE.
uhm... the only way I can think of testing this is by having a container image with invalid content.
Right, let's see if I can craft one. Putting this on hold. /hold
test added, no longer holding
/hold cancel
/retest
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: JAORMX, jhrozek
The full list of commands accepted by this bot can be found here.
The pull request process is described here
We used to be too permisive about failures in the profileparser, which worked fine in the beginning, but doesn't work so well since we started annotating the objects with the bundle SHA digest during an update and delete any objects that were not annotated with that digest value - in case an update operation failed because of a transient error, we would have skipped annotating that object and ultimately treat it as removed and delete it. This was happening especially during upgrades where the API server might be unreachable at some point.
This patch exits the profileparser with a non-zero code on failures as well as propagates the errors from the client operations to the error channels so that they are eventually returned to the binary and used for the error code. This ensures that the profileparser restarts and hopefully goes through after the restart.
Jira: OCPBUGSM-32701