Closed jamepark4 closed 1 year ago
Which environment do you use for this? How does your OpenstackControlPlane CR look like?
I tried to reproduce it but failed. What I did:
Then I created a flavor with 10 vCPUs that does not fit on any of the computes, as each has only 2 vCPUs. Then I tried to boot a VM with that flavor. The boot failed with NoValidHost as expected. I don't see any DB errors in the conductor logs, and I see the instance stored in the cell0 DB properly.
From your error message, `nova_cell0_cell0` seems wrong.
What is the output of the command in your env?
oc rsh nova-cell0-conductor-0 nova-manage cell_v2 list_cells
In mine:
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
+-------+--------------------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------+----------+
| Name | UUID | Transport URL | Database Connection | Disabled |
+-------+--------------------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------+----------+
| cell0 | 00000000-0000-0000-0000-000000000000 | rabbit: | mysql+pymysql://nova_cell0:****@openstack/nova_cell0 | False |
| cell1 | fe34f679-292c-4460-9de7-6d06d9a57fca | rabbit://default_user_wVHi3_Bu6QYOIVso2pB:****@rabbitmq-cell1.openstack.svc:5672 | mysql+pymysql://nova_cell1:****@openstack-cell1/nova_cell1 | False |
+-------+--------------------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------+----------+
I was using the defaults with the ci_framework and going directly into tempest with default concurrency. The defaults for the computes are well below what is acceptable for running tempest. I've redeployed with two computes that meet the tempest recommendations, and while I'm still hitting some failures to schedule, I am no longer seeing the database error in this environment. I'll let you know if I can recreate the failure with this current environment, or I'll try the approach you are using when deploying. Below is the cell_v2 list_cells output for this environment.
[stack@sriov01 nova_logs]$ oc rsh nova-cell0-conductor-0 nova-manage cell_v2 list_cells
Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
+-------+--------------------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------+----------+
| Name | UUID | Transport URL | Database Connection | Disabled |
+-------+--------------------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------+----------+
| cell0 | 00000000-0000-0000-0000-000000000000 | rabbit: | mysql+pymysql://nova_cell0:****@openstack/nova_cell0 | False |
| cell1 | de13b4c2-ddc9-45af-8e58-f7f232172b17 | rabbit://default_user_tbCFe7oorPMk_iouX5j:****@rabbitmq-cell1.openstack.svc:5672 | mysql+pymysql://nova_cell1:****@openstack-cell1/nova_cell1 | False |
+-------+--------------------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------+----------+
[stack@sriov01 nova_logs]$
The cell mapping you copied above is correct; there is no `nova_cell0_cell0` mentioned there.
If you still have this issue, or if you see it again, it would be nice to look around in the env. I assume it is a wrong database connection config in the nova-cell-conductor statefulset, as we ruled out the cell mapping above. But it is really strange that the job that created the mapping had a good config while the conductor doesn't have a good config in the same deployment.
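When comparing the conductor's configured connection against the cell mapping, the two things to check are the username and the database name embedded in the URL. A minimal sketch (not project code; the example URL mirrors the mapping output above) of pulling those fields out of a SQLAlchemy-style connection URL:

```python
# Sketch: extract the user and database name from a mysql+pymysql:// URL,
# e.g. from the conductor config, to compare against the cell mapping output.
from urllib.parse import urlparse

def parse_db_url(url: str) -> dict:
    """Return the user, host, and database name from a DB connection URL."""
    parsed = urlparse(url)
    return {
        "user": parsed.username,
        "host": parsed.hostname,
        "database": parsed.path.lstrip("/"),
    }

# Example URL shaped like the cell0 mapping shown above (password made up).
info = parse_db_url("mysql+pymysql://nova_cell0:secret@openstack/nova_cell0")
print(info["user"], info["database"])  # nova_cell0 nova_cell0
```

If the database name parsed out of the conductor's config were `nova_cell0_cell0` instead, that would point directly at the access-denied error.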
Feel free to reopen it if you see it again.
In the event that a guest fails to schedule to a host and the instance is stored in cell0, the operation fails due to the default database username:
2023-07-10 21:01:56.629 1 ERROR nova.context pymysql.err.OperationalError: (1044, "Access denied for user 'nova_cell0'@'%' to database 'nova_cell0_cell0'")
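The doubled suffix in `nova_cell0_cell0` looks like a cell suffix being appended to a base name that already carries it. A hypothetical illustration of that failure mode (not the operator's actual code, just the string mechanics):

```python
# Hypothetical: if a per-cell suffix is appended to a base database name
# that already includes the cell suffix, the broken name from the error
# message is the result.
def cell_db_name(base: str, cell: str) -> str:
    """Naively build a per-cell database name by appending the cell name."""
    return f"{base}_{cell}"

print(cell_db_name("nova_cell0", "cell0"))  # nova_cell0_cell0 (broken)
print(cell_db_name("nova", "cell0"))        # nova_cell0 (intended)
```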
It currently appears that the default deployment sets cellDatabaseUser:
Full logs attached below with server creation uuid being: req-68ec3082-2ed4-4519-954f-887a6dfe7b74
guest_fails_to_schedule.log