Closed: ciecierski closed this pull request 5 months ago.
/lgtm
/lgtm
/approve
What is Tide waiting for here in terms of approval? Does it need to be approved by @SeanMooney specifically as the comment above suggests? :)
Ah, I guess Mikolaj or I would have to be on the approvers list.
No, it just requires approval from anyone in the OWNERS file https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/OWNERS or OWNERS_ALIASES https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/OWNERS_ALIASES
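For context, Prow's approve plugin reads those files; an OWNERS file is YAML along these lines (the usernames here are placeholders, the real lists are at the URLs above):

```yaml
# Illustrative OWNERS layout; usernames are placeholders
approvers:
  - alice
reviewers:
  - bob
```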
/approve
If there are concerns about how this change works for adoption, I've linked a PR to the dataplane-adoption repo that tests the change in the pre-adoption phase: https://github.com/openstack-k8s-operators/data-plane-adoption/pull/459/files
So that PR doesn't depend on this PR, so it's not actually testing it, and it's also failing CI, which is not very encouraging.
I'll remove my requested change to allow others to approve, but I'm still unsure that the way reboots are being done is correct.
First, is there a generic adoption task planned to clean up the tripleo config? By default we likely should be avoiding reboots for EDPM nodes with nova, but I'm not sure it's safe to force the reboot for FIPS mode by default.
I think we're stopping and removing the services, but we're not presently cleaning up the config? Anyway, I think we need some way to check for "is this a node that's already been used in the deployment to host some workload or networker etc." and then block the reboot. I don't have a better idea than what Mikolaj posted. For reference, here is the generic cleanup PR:
https://github.com/openstack-k8s-operators/edpm-ansible/pull/649
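A minimal sketch of what such a check could look like, assuming (hypothetically) that /var/lib/config-data marks an old tripleo node and /var/lib/openstack marks an already-deployed EDPM node; the paths the role actually checks may differ:

```yaml
- name: Check for openstack configs that indicate an already-deployed node
  ansible.builtin.stat:
    path: "{{ item }}"
  loop:
    - /var/lib/config-data   # assumed tripleo marker
    - /var/lib/openstack     # assumed EDPM marker
  register: openstack_configs

- name: Block the automatic reboot if any such config exists
  ansible.builtin.set_fact:
    edpm_reboot_blocked: "{{ openstack_configs.results
                             | map(attribute='stat.exists')
                             | select | list | length > 0 }}"
```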
So this will mean we will never reboot.
Isn't that the goal though? I thought the os-reboot role is meant to apply node config (that needs a reboot to take effect) before the first use of that node. I don't think we wanted to use the os-reboot role to reboot nodes that are already being utilized (hosting workloads / networking / storage)?
The os-reboot service will be used for kernel updates or other changes that require a host reboot as part of day-two maintenance.
It feels wrong to require force to be used in that case; auto should work, assuming a prior task has marked the node as needing a reboot. A sketch of what that could look like is below.
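For illustration, auto mode could honour a marker file dropped by an earlier task; the marker path and task wording here are assumptions, not the role's actual implementation:

```yaml
- name: Check whether a prior task flagged this node for reboot
  ansible.builtin.stat:
    path: /var/run/reboot-required   # assumed marker file
  register: reboot_marker

- name: Reboot when the strategy allows it
  ansible.builtin.reboot:
    reboot_timeout: 3600
  when: >-
    edpm_reboot_strategy == 'force' or
    (edpm_reboot_strategy == 'auto' and reboot_marker.stat.exists)
```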
The intended workflow is that you would do a separate deployment including the os-reboot service, with --limit, to reboot subsets of hosts that you have drained of workloads in batches.
In that workflow we should not need to update the ansible_vars on the nodeset to force.
If we can't support that use case, we can't do minor updates properly.
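For illustration, such a targeted deployment might look roughly like this; the nodeset name, host names, and the reboot-os service name are assumptions:

```yaml
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: reboot-compute-batch-1
spec:
  nodeSets:
    - openstack-edpm                 # existing nodeset (name assumed)
  servicesOverride:
    - reboot-os                      # run only the reboot service
  ansibleLimit: edpm-compute-0,edpm-compute-1   # the drained batch
```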
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ciecierski, fao89, jistr, SeanMooney
The full list of commands accepted by this bot can be found here.
The pull request process is described here
I will file a Jira bug for the FIPS issue.
> So that PR doesn't depend on this PR, so it's not actually testing it, and it's also failing CI, which is not very encouraging.
Yes, the PR wasn't testing it. I made a patch, but the Depends-On was missing. I fixed it, and the new results are at https://github.com/openstack-k8s-operators/data-plane-adoption/pull/459#issuecomment-2133562264. A non-ceph job failed during adoption; it passed the pre-adoption validation with reboot.
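For reference, Zuul picks up cross-repo test dependencies from a Depends-On footer in the commit message, along these lines (the subject line and PR-number placeholder are illustrative):

```
Test pre-adoption reboot with the new strategy

Depends-On: https://github.com/openstack-k8s-operators/edpm-ansible/pull/<this-pr>
```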
Filed OSPRH-7301.
Add edpm_reboot_strategy to the reboot role to define whether a reboot should be started. There are three modes: auto, never, and force. The default is auto.
Also add a more generic check for the existence of openstack configs on already-deployed nodes, covering both old tripleo and new EDPM nodes.
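A sketch of the role defaults this implies; the variable name matches the description above, while the file path and mode descriptions are assumptions:

```yaml
# roles/edpm_reboot/defaults/main.yml (path assumed)
# Controls whether the role reboots the node:
#   auto  - reboot only when a prior check says it is needed and safe
#   never - never reboot
#   force - always reboot
edpm_reboot_strategy: auto
```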