Closed dprince closed 3 months ago
/hold
Build failed (check pipeline). Post `recheck` (without leading slash) to rerun all jobs. Make sure the failure cause has been resolved before you rerun jobs.
https://review.rdoproject.org/zuul/buildset/8e0ea450916246cd80dd8146d3b20e6d
- :heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 1h 26m 44s
- :x: podified-multinode-edpm-deployment-crc FAILURE in 1h 08m 29s
- :x: cifmw-crc-podified-edpm-baremetal FAILURE in 1h 08m 36s
- :x: openstack-operator-tempest-multinode FAILURE in 1h 06m 35s
Build failed (check pipeline).
https://review.rdoproject.org/zuul/buildset/1683d7919bce48a88c1732103ab81f82
- :x: openstack-k8s-operators-content-provider FAILURE in 11m 16s
- :warning: podified-multinode-edpm-deployment-crc SKIPPED (due to failed job openstack-k8s-operators-content-provider)
- :warning: cifmw-crc-podified-edpm-baremetal SKIPPED (due to failed job openstack-k8s-operators-content-provider)
- :warning: openstack-operator-tempest-multinode SKIPPED (due to failed job openstack-k8s-operators-content-provider)
Build failed (check pipeline).
https://review.rdoproject.org/zuul/buildset/21aabfd5a3c940a49daa0d26f131d5c3
- :x: openstack-k8s-operators-content-provider FAILURE in 10m 52s
- :warning: podified-multinode-edpm-deployment-crc SKIPPED (due to failed job openstack-k8s-operators-content-provider)
- :warning: cifmw-crc-podified-edpm-baremetal SKIPPED (due to failed job openstack-k8s-operators-content-provider)
- :warning: openstack-operator-tempest-multinode SKIPPED (due to failed job openstack-k8s-operators-content-provider)
recheck
Build failed (check pipeline).
https://review.rdoproject.org/zuul/buildset/b8c58d7c412e4c1388e0bb7edcdaf75e
- :heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 1h 50m 37s
- :heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 17m 44s
- :heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 15m 37s
- :x: openstack-operator-tempest-multinode RETRY_LIMIT in 24m 06s
recheck
Build failed (check pipeline).
https://review.rdoproject.org/zuul/buildset/662941fc411642439586560bc4a94706
- :heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 2h 08m 08s
- :heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 34s
- :heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 24m 21s
- :x: openstack-operator-tempest-multinode FAILURE in 1h 52m 11s
recheck
Build failed (check pipeline).
https://review.rdoproject.org/zuul/buildset/a0359f9137894b62b3af503929e991b8
- :heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 1h 57m 44s
- :heavy_check_mark: podified-multinode-edpm-deployment-crc SUCCESS in 1h 22m 46s
- :heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 20m 15s
- :x: openstack-operator-tempest-multinode FAILURE in 1h 42m 24s
Build failed (check pipeline).
https://review.rdoproject.org/zuul/buildset/7aa36ac7e4c4487c9334f98a072cbaaf
- :heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 2h 01m 10s
- :x: podified-multinode-edpm-deployment-crc FAILURE in 1h 39m 23s
- :x: cifmw-crc-podified-edpm-baremetal FAILURE in 1h 33m 43s
- :x: openstack-operator-tempest-multinode FAILURE in 1h 43m 48s
Looks good to me. The only thing I'm not sure I understand correctly is how all those changes in the various `{service}.go` files relate to the PR description :)
In order to enforce the update order, we need to make sure services have been deployed (ready-state and observed-generation checks).
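The readiness gate described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `serviceStatus` and `isDeployed` are hypothetical names standing in for the per-service status structs, but the two checks (Ready condition plus `ObservedGeneration == Generation`) are the standard Kubernetes pattern for confirming the controller has reconciled the latest spec.

```go
package main

import "fmt"

// serviceStatus mirrors the fields a typical service status exposes
// (names here are illustrative, not from the PR).
type serviceStatus struct {
	Generation         int64 // metadata.generation of the current spec
	ObservedGeneration int64 // last generation the controller reconciled
	Ready              bool  // Ready condition of the service
}

// isDeployed reports whether the service has converged on the current spec:
// it must be Ready AND the controller must have observed the latest generation.
// Ready alone is not enough, since it may refer to a stale spec.
func isDeployed(s serviceStatus) bool {
	return s.Ready && s.ObservedGeneration == s.Generation
}

func main() {
	stale := serviceStatus{Generation: 3, ObservedGeneration: 2, Ready: true}
	fresh := serviceStatus{Generation: 3, ObservedGeneration: 3, Ready: true}
	fmt.Println(isDeployed(stale), isDeployed(fresh)) // false true
}
```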
recheck
I haven't tried to run it, but looking at the code, the order of updates looks correct. I couldn't figure out from just reading the code how we guarantee that only `ovn-controller` (and not e.g. `nova-compute`) is updated on DP before CP `reconcileNormal` is triggered. Will it be managed using a separate Deployment that would run a single `ovn` DP service? (I assume updating `nova-compute` before CP nova services is a problem.) Perhaps someone could ELI5 to me. (Thank you.)
Also trying to follow this. So we set the condition here: https://github.com/openstack-k8s-operators/openstack-operator/pull/792/files#diff-32500fc60d27debdcd1f64468b83c0a318d79fa55591b2e17a5cb935d9fde650R248
But that just prevents the controller from continuing the update until the condition has been satisfied. It seems the actual update of OVN on the Dataplane would need to be a manual process, as described here:
So this would happen first and then pause until the condition gets satisfied. To satisfy the condition, the deployed image would need to match the image the update expects, as determined by: https://github.com/openstack-k8s-operators/openstack-operator/pull/792/files#diff-308714db370a47145837acae5ff60352d9f352513007441b8c694f2c45c1031dR38-R53
Then the update can continue.
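The gating pattern being discussed can be sketched like this. All names below (`versionStatus`, `ovnUpdated`, the image strings) are hypothetical placeholders for the PR's actual condition logic; the point is only the shape of the gate: reconciliation returns early, and is retried, until the deployed image matches the target image.

```go
package main

import "fmt"

// versionStatus is a stand-in for the relevant bits of status the
// controller compares (illustrative names, not the PR's types).
type versionStatus struct {
	DeployedOVNImage string // image currently running on the dataplane
	TargetOVNImage   string // image this minor update expects
}

// ovnUpdated is the condition check: the controlplane update may proceed
// only once the dataplane reports the image the update expects.
func ovnUpdated(v versionStatus) bool {
	return v.DeployedOVNImage == v.TargetOVNImage
}

func main() {
	v := versionStatus{DeployedOVNImage: "ovn:1.0", TargetOVNImage: "ovn:1.1"}
	if !ovnUpdated(v) {
		// In a real controller this would set the condition to False
		// and requeue; the update stays paused until the manual OVN
		// dataplane deployment has rolled out the target image.
		fmt.Println("waiting: OVN dataplane update not complete")
		return
	}
	fmt.Println("condition satisfied, continuing update")
}
```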
At least that's my 10-minute read on what's happening here. The answer is that the user will create a deployment limited to the OVN service, like:
```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: edpm-deployment-ipam-update
spec:
  nodeSets:
    - openstack-edpm-ipam
    - <nodeSet_name>
    - ...
    - <nodeSet_name>
  servicesOverride:
    - ovn
```
Dataplane updates are manual for GA. There are some Jiras filed (https://issues.redhat.com/browse/OSPRH-6421) that might help us fully streamline the minor update workflow.
I do think we could also validate, based on conditions set on the OpenStackVersion resource, that we are in the correct state when Dataplane resources get executed. For example, if we only need to execute an OVN playbook, we could have a crude validation for that. The administrator could always override this with Ansible, but I think a simple check like this could help further guard the workflow in the future.
Thank you @bshephar @dprince, this (rolling out a Deployment just for the `ovn` service) makes sense. It's a bit more legwork for a user, but the main point is we have a plan.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dprince, stuggi
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Enforce update order for OVN for Ctlplane/EDPM Jira: OSPRH-6732