Closed jcrossley3 closed 4 weeks ago
@jcrossley3 I'll investigate tomorrow. Thanks for sharing the log. I'll see what changed recently for it to fail now
The cluster was converted into a multi-arch cluster. I am not sure what that means internally. Some nodes are now arm64 nodes. There's some information about multi-arch clusters here: https://docs.openshift.com/container-platform/4.16/post_installation_configuration/configuring-multi-arch-compute-machines/multi-architecture-compute-managing.html
For trustification (v1) I had to amend the Helm charts in a way that they only deploy pods on amd64, as we don't build and publish multi-arch images for v1.
Is it easy enough to revert the multi-arch conversion? If the deploy job then works, at least we could confirm the correlation.
IIRC it was impossible.
Yea, not possible:
Migration from a multi-architecture payload to a single-architecture payload is not supported. Once a cluster has transitioned to using a multi-architecture payload, it can no longer accept a single-architecture update payload.
https://github.com/trustification/trustify/actions/workflows/latest-release.yaml is working again. I made the operator to create arm64 images for the bundle and apparently it solved the problem. This is the PR that made the arm64 possible for the bundles in the operator repo https://github.com/trustification/trustify-operator/pull/36
I'd love to know how you figured that out, but I'm afraid I may lack the expertise required to understand it.
I'd love to know how you figured that out, but I'm afraid I may lack the expertise required to understand it.
I think it was just being lucky. As soon as I started reading this issue today, @ctron already mentioned the "multi-arch" change in the OCP cluster (which to me would be almost impossible to track as I know almost nothing about that OCP instance).
I think you're being humble, but as they say in the casino, luck counts!
I don't know for sure if @ctron 's recent configuration changes to our OCP cluster are the cause of the error, but the
deploy
job in thelatest-release
workflow hasn't succeeded ever since. It fails with the following error:I tried updating to the latest operator-sdk release, but it didn't help.
I'm hoping @carlosthe19916 might offer some guidance on how to diagnose the problem.