osism / issues

This repository is used for bug reports that are cross-project or not bound to a specific repository (or to an unknown repository).
https://www.osism.tech

nova_compute does not start after re-deploying the compute node with a new nova datadir configuration #830

Open Nils98Ar opened 8 months ago

Nils98Ar commented 8 months ago

Probably the same as in https://bugs.launchpad.net/kolla-ansible/+bug/2051011?

Replacing the nova Docker volume with a bind mount of the local SSD via nova_instance_datadir_volume seems to cause the problem: the compute_id file that was present in the Docker volume is not created again at the new location, so nova_compute refuses to start.

Any idea or maybe a workaround?

2024-01-26 13:15:28.634 7 ERROR oslo_service.service [None req-64a700a1-9477-4241-86dc-16c90baf3a5a - - - - - -] Error starting thread.: nova.exception.InvalidConfiguration: No local node identity found, but this is not our first startup on this host. Refusing to start after potentially having lost that state!
2024-01-26 13:15:28.634 7 ERROR oslo_service.service Traceback (most recent call last):
2024-01-26 13:15:28.634 7 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/oslo_service/service.py", line 806, in run_service
2024-01-26 13:15:28.634 7 ERROR oslo_service.service     service.start()
2024-01-26 13:15:28.634 7 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/service.py", line 162, in start
2024-01-26 13:15:28.634 7 ERROR oslo_service.service     self.manager.init_host(self.service_ref)
2024-01-26 13:15:28.634 7 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/manager.py", line 1578, in init_host
2024-01-26 13:15:28.634 7 ERROR oslo_service.service     self._ensure_existing_node_identity(service_ref)
2024-01-26 13:15:28.634 7 ERROR oslo_service.service   File "/var/lib/kolla/venv/lib/python3.10/site-packages/nova/compute/manager.py", line 1521, in _ensure_existing_node_identity
2024-01-26 13:15:28.634 7 ERROR oslo_service.service     raise exception.InvalidConfiguration(
2024-01-26 13:15:28.634 7 ERROR oslo_service.service nova.exception.InvalidConfiguration: No local node identity found, but this is not our first startup on this host. Refusing to start after potentially having lost that state!

Nils98Ar commented 8 months ago

Maybe creating the file manually inside the nova_compute container could work?

berendt commented 8 months ago

Looks like this issue, yes.

Nils98Ar commented 8 months ago

Creating the file manually on the compute node, using the output of openstack hypervisor show <compute01 FQDN> -c id -f value as <ID>, seems to work, but I am not sure whether it is a good idea:

dragon@compute01:~$ docker exec -itu root nova_compute bash -c "echo -n '<ID>' > /var/lib/nova/compute_id; chown nova:nova /var/lib/nova/compute_id;chmod 644 /var/lib/nova/compute_id"
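
A slightly more defensive variant of the command above could look like the following sketch. The function name write_compute_id and its two arguments are hypothetical, and the UUID check rests on the assumption that the hypervisor ID returned by the CLI is a canonical UUID, which may not hold with older API microversions. It also refuses to overwrite an existing compute_id file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nova or kolla-ansible): write a
# compute_id file into the given data directory.
write_compute_id() {
    id="$1"        # value taken from "openstack hypervisor show ... -c id -f value"
    datadir="$2"   # directory that nova_compute sees as /var/lib/nova

    # Assumption: the ID is a canonical 36-character UUID.
    if ! printf '%s\n' "$id" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo "refusing to write: '$id' does not look like a UUID" >&2
        return 1
    fi

    # Do not clobber an identity file that is already present.
    if [ -e "$datadir/compute_id" ]; then
        echo "refusing to write: $datadir/compute_id already exists" >&2
        return 1
    fi

    # Same content and mode as in the original one-liner (no trailing newline).
    printf '%s' "$id" > "$datadir/compute_id"
    chmod 644 "$datadir/compute_id"
}
```

The chown nova:nova step from the original command is left out so that the sketch runs without root; it would still have to be applied afterwards.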

Nils98Ar commented 8 months ago

This also works if the old Docker volume is still there:

dragon@compute01:~$ sudo mv /var/lib/docker/volumes/nova_compute/_data/compute_id <new datadir>/
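
As a rough sketch, the same step can be wrapped in a small check that copies the file instead of moving it, so the old volume keeps a backup. The function name copy_compute_id is hypothetical; the two directories are passed in as arguments rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: copy compute_id from the old data directory to the
# new one, leaving the original in place as a backup.
copy_compute_id() {
    old_dir="$1"   # e.g. the _data directory of the old Docker volume
    new_dir="$2"   # the new bind-mounted data directory

    # Nothing to do if the new location already has an identity file.
    if [ -e "$new_dir/compute_id" ]; then
        echo "$new_dir/compute_id already exists, leaving it untouched" >&2
        return 0
    fi

    # Fail clearly if the old volume no longer contains the file.
    if [ ! -f "$old_dir/compute_id" ]; then
        echo "no compute_id found in $old_dir" >&2
        return 1
    fi

    # -p preserves mode, ownership (when run as root) and timestamps.
    cp -p "$old_dir/compute_id" "$new_dir/compute_id"
}
```

Using cp -p rather than mv is a design choice of this sketch, not something taken from the thread: it keeps the old volume intact until nova_compute has started successfully.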