redhat-openstack / openshift-on-openstack

A place to write templates, docs etc. for deploying OpenShift on OpenStack.
Apache License 2.0

Doc: recovering an infra node on RHOSP10 #369

Open ioggstream opened 7 years ago

ioggstream commented 7 years ago

Note: the following procedure only fixes the Heat part of node recovery. There is another issue with fragments/bastion-node-cleanup.sh, which is run against the new node; I will fix that in a separate patch / issue.

I wish

to document a recovery procedure for RHOSP10 in README.md together with

https://github.com/redhat-openstack/openshift-on-openstack/#removing-or-replacing-specific-nodes

When

A node is removed or has failed at the application level, e.g. simulate with:

openstack server delete shift-infra-1.example.com

The following partially worked for me

Find the nested infra stack as $INFRA_STACK_ID

openstack stack resource list --nested-depth 3 shift | grep infra

$ openstack stack resource list shift-openshift_infra_nodes-iiwbo4jwsizr
+---------------+--------------------------------------+--------------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                        | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------------+-----------------+----------------------+
| 1             | 7b42c7ba-0c12-4860-bd25-be5b9404cca3 | file:///home/shift/ooo-35/infra.yaml | CHECK_FAILED    | 2017-07-07T09:02:00Z |
| 0             | b4e0a752-3bce-46f5-96c5-7cc9dde21ec8 | file:///home/shift/ooo-35/infra.yaml | UPDATE_COMPLETE | 2017-07-07T09:02:00Z |
+---------------+--------------------------------------+--------------------------------------+-----------------+----------------------+
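To pick the failed member out of a listing like this without reading it by eye, a small filter along these lines might help. It is only a sketch: `failed_resources` is a hypothetical helper name, and it assumes the default table output with `resource_name` in the first column and `resource_status` in the fourth, as in the listing here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): reads the ASCII table printed by
# `openstack stack resource list <stack>` on stdin and prints the
# resource_name of every row whose resource_status ends in _FAILED.
failed_resources() {
    awk -F'|' '
        # Data rows contain at least six pipe-separated fields; border lines
        # contain no pipes at all, and the header row is skipped by name.
        NF >= 6 && $2 !~ /resource_name/ {
            name = $2
            status = $5
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim padding
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status ~ /_FAILED$/) print name
        }'
}
```

Usage would be something like `openstack stack resource list shift-openshift_infra_nodes-iiwbo4jwsizr | failed_resources`, which for the listing here prints the resource name of the row in CHECK_FAILED.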

Mark the node as unhealthy

openstack stack resource mark unhealthy \
    shift-openshift_infra_nodes-iiwbo4jwsizr \
    1 \
    "node has a broken disk"

Update the stack

openstack stack update shift --existing
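The mark-unhealthy and update steps could be wrapped in one function so they always run in that order. This is a minimal sketch, not something shipped in the repo: `recover_node` and the `OPENSTACK` variable are hypothetical names, and the variable exists only so the client can be swapped for `echo` as a dry run.

```shell
#!/bin/sh
# Client to invoke; set OPENSTACK=echo to print the commands instead of
# running them (assumption: the real client is on PATH as `openstack`).
OPENSTACK=${OPENSTACK:-openstack}

# Hypothetical wrapper: recover_node PARENT_STACK NESTED_STACK RESOURCE [REASON]
recover_node() {
    if [ "$#" -lt 3 ]; then
        echo "usage: recover_node PARENT_STACK NESTED_STACK RESOURCE [REASON]" >&2
        return 64
    fi
    parent_stack=$1
    nested_stack=$2
    resource=$3
    reason=${4:-"marked unhealthy for recovery"}

    # Step 1: flag the resource inside the nested infra stack.
    # Step 2: update the parent stack with its existing parameters, so Heat
    #         replaces the flagged resource. Runs only if step 1 succeeded.
    $OPENSTACK stack resource mark unhealthy "$nested_stack" "$resource" "$reason" &&
        $OPENSTACK stack update "$parent_stack" --existing
}
```

For example, `OPENSTACK=echo; recover_node shift shift-openshift_infra_nodes-iiwbo4jwsizr 1 "node has a broken disk"` prints the two commands without touching the cloud.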

Check status

openstack stack resource list shift-openshift_infra_nodes-iiwbo4jwsizr
+---------------+--------------------------------------+--------------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                        | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------------+-----------------+----------------------+
| 1             | 1ea7050f-08fe-4892-9908-36dfa4cffad2 | file:///home/shift/ooo-35/infra.yaml | CREATE_COMPLETE | 2017-07-07T12:51:21Z |
| 0             | b4e0a752-3bce-46f5-96c5-7cc9dde21ec8 | file:///home/shift/ooo-35/infra.yaml | UPDATE_COMPLETE | 2017-07-07T12:51:20Z |
+---------------+--------------------------------------+--------------------------------------+-----------------+----------------------+
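The status check can also be scripted. The sketch below assumes the client's standard `-f value -c <column>` output options and treats any status not ending in `_COMPLETE` as unhealthy; `all_complete` and the `OPENSTACK` variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Client to invoke (assumption: the real client is on PATH as `openstack`).
OPENSTACK=${OPENSTACK:-openstack}

# Hypothetical check: all_complete STACK
# Returns 0 if every resource in STACK reports a *_COMPLETE status,
# 1 if at least one does not, 2 if the listing failed or was empty.
all_complete() {
    statuses=$($OPENSTACK stack resource list -f value -c resource_status "$1") || return 2
    [ -n "$statuses" ] || return 2
    # grep -v selects lines that do NOT end in _COMPLETE; with -q it exits 0
    # as soon as one such line is found.
    if printf '%s\n' "$statuses" | grep -qv '_COMPLETE$'; then
        return 1
    fi
    return 0
}
```

After the stack update finishes, `all_complete shift-openshift_infra_nodes-iiwbo4jwsizr && echo recovered` would confirm that the replaced node and its sibling are both in a completed state.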