Closed GregBlow closed 4 months ago
Always ~2/3 nodes fail.
The process appears to fail before the Ceph volumes are generated.
Apparent placement issue:
2024-07-05 15:55:11.631 18 WARNING nova.scheduler.utils [None req-f2f0b68b-bd3b-46f3-927a-7fe62b85cb6d ea36706f7f188e8ed8d1ee96d8b6c26027dc6e102b651dab57498532dde7d642 9168c636eaec419f807c46f1454e87a9 - - default default] Failed to compute_task_build_instances: No valid host was found.
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/server.py", line 244, in inner
    return func(*args, **kwargs)
  File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/manager.py", line 210, in select_destinations
    raise exception.NoValidHost(reason="")
nova.exception.NoValidHost: No valid host was found.
: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
2024-07-05 15:55:11.633 18 WARNING nova.scheduler.utils [None req-f2f0b68b-bd3b-46f3-927a-7fe62b85cb6d ea36706f7f188e8ed8d1ee96d8b6c26027dc6e102b651dab57498532dde7d642 9168c636eaec419f807c46f1454e87a9 - - default default] [instance: 8a588032-8cb0-4a8a-b3c6-a54013e13931] Setting instance to ERROR state.: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
2024-07-05 15:55:11.816 19 WARNING nova.scheduler.utils [None req-0f801e14-1074-4934-8de5-94dc59390dc3 ea36706f7f188e8ed8d1ee96d8b6c26027dc6e102b651dab57498532dde7d642 9168c636eaec419f807c46f1454e87a9 - - default default] Failed to compute_task_build_instances: No valid host was found.
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/server.py", line 244, in inner
    return func(*args, **kwargs)
  File "/var/lib/kolla/venv/lib64/python3.9/site-packages/nova/scheduler/manager.py", line 210, in select_destinations
    raise exception.NoValidHost(reason="")
nova.exception.NoValidHost: No valid host was found.
: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
2024-07-05 15:55:11.817 19 WARNING nova.scheduler.utils [None req-0f801e14-1074-4934-8de5-94dc59390dc3 ea36706f7f188e8ed8d1ee96d8b6c26027dc6e102b651dab57498532dde7d642 9168c636eaec419f807c46f1454e87a9 - - default default] [instance: 5d9e4278-622c-49ed-9c92-593626703faf] Setting instance to ERROR state.: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
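To see which instances were affected across repeat attempts, the "Setting instance to ERROR state" lines above can be pulled out of the scheduler log with a small script. This is only a rough sketch: it assumes the log line layout is exactly as pasted above, and the function name `failed_instances` and the log path in the usage comment are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the "Setting instance to ERROR state" lines as pasted above
# (assumption: timestamp first, then "[None req-<uuid> ...]", then "[instance: <uuid>]").
ERROR_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*?"
    r"\[None (?P<request_id>req-[0-9a-f-]+) .*?"
    r"\[instance: (?P<instance>[0-9a-f-]+)\] Setting instance to ERROR state"
)


def failed_instances(lines):
    """Map each request ID to the (timestamp, instance UUID) pairs set to ERROR."""
    failures = defaultdict(list)
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            failures[match["request_id"]].append(
                (match["timestamp"], match["instance"])
            )
    return dict(failures)


# Usage (hypothetical log path):
#     with open("nova-scheduler.log") as log:
#         for request_id, hits in failed_instances(log).items():
#             print(request_id, hits)
```

Comparing the output between deployment attempts would show which instances failed scheduling each time. It does not explain why no valid host was found.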
All attempts to deploy a 14-node qserv cluster fail, with nodes failing to become ready, e.g.:
(On repeat attempts, different nodes are problematic; no single node fails every time.)