Closed slagle closed 1 month ago
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: slagle
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Build failed (check pipeline). Post recheck
(without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.
https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/116fbc3c10fd430e8e1c11a1a1339ff4
:x: openstack-k8s-operators-content-provider FAILURE in 8m 43s :warning: podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider :warning: cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider :warning: openstack-operator-tempest-multinode SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
/hold
I'm now actually not sure this is needed, and I got a nil pointer dereference when trying it. It's not obvious how that happened, so I'd hold on this for the time being.
Build failed (check pipeline). Post recheck
(without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.
https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/363f5e745ae24fe786ae80b703e4769b
:heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 1h 58m 58s :heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 31m 11s :x: cifmw-crc-podified-edpm-baremetal FAILURE in 1h 30m 56s :x: openstack-operator-tempest-multinode FAILURE in 1h 42m 21s
@slagle: The following test failed, say /retest
to rerun all failed tests or /retest-required
to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
ci/prow/openstack-operator-build-deploy-kuttl | 1026968006d7283df4209851cdd44453ef48f835 | link | true | /test openstack-operator-build-deploy-kuttl |
Full PR test history. Your PR dashboard.
when we return
requeue
we makeerr==nil
: https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L246-L248the idea was to ignore (and requeue) only the first error, when the certs get created
right, and when the cert is ready, the status condition of the cert cr gets updated, which should result in the owner (one of the openstack-operator controllers) to reconcile
when we return
requeue
we makeerr==nil
: https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L246-L248 the idea was to ignore (and requeue) only the first error, when the certs get createdright, and when the cert is ready, the status condition of the cert cr gets updated, which should result in the owner (one of the openstack-operator controllers) to reconcile
Having said that, I think that what is missing https://github.com/openstack-k8s-operators/openstack-operator/pull/991 . the controller is not watching the certs it is creating and therefore not reconciling if there is an update/change
when we return
requeue
we makeerr==nil
: https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L246-L248 the idea was to ignore (and requeue) only the first error, when the certs get createdright, and when the cert is ready, the status condition of the cert cr gets updated, which should result in the owner (one of the openstack-operator controllers) to reconcile
Having said that, I think that what is missing #991 . the controller is not watching the certs it is creating and therefore not reconciling if there is an update/change
@vakwetu I am not that familiar with the dataplane controllers. with the PR I submitted the owner of the certs, nodeset, will reconcile. But shouldn't the deployment also reconcile as it is doing more follow up steps when calling EnsureTLSCerts https://github.com/openstack-k8s-operators/openstack-operator/blob/main/pkg/dataplane/cert.go#L183-L216 ? or should the dataplanedeployment controller watch the nodesets? or always requeue when there is an error with GetTLSNodeCert at https://github.com/openstack-k8s-operators/openstack-operator/blob/main/pkg/dataplane/cert.go#L180
when we return
requeue
we makeerr==nil
: https://github.com/openstack-k8s-operators/lib-common/blob/main/modules/certmanager/certificate.go#L246-L248 the idea was to ignore (and requeue) only the first error, when the certs get createdright, and when the cert is ready, the status condition of the cert cr gets updated, which should result in the owner (one of the openstack-operator controllers) to reconcile
Ok, that makes sense then. A ctrl.Result with RequeueAfter will only be returned from the lib-common module when err=nil. So this PR shouldn't be needed.
closing per discussion in comments.
The return[1] from EnsureTLSCerts can be a non nil err, and a ctrl.Result with a requeue. That result should be returned to force a requeue, instead of just a ctrl.Result{}.
[1] https://github.com/openstack-k8s-operators/openstack-operator/blob/fa62144840323419fdb74594870f82d77d8b5b78/pkg/dataplane/cert.go#L178-L180
Signed-off-by: James Slagle jslagle@redhat.com