Closed: TsutomuNakamura closed this issue 1 year ago.
When implementing this feature, issue #104 should also be solved.
To understand the structure of Ceph, create multiple Ceph nodes and their devices.
When I created a device from the dashboard, an error like the one below was output.
2023-04-30 13:54:49.785 1500 WARNING cinder.scheduler.host_manager [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] volume service is down. (host: dev-storage08@lvm)
2023-04-30 13:54:49.785 1500 INFO cinder.scheduler.base_filter [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] Filtering removed all hosts for the request with volume ID '980ccd41-896b-46df-8832-54c12f42bdac'. Filter results: AvailabilityZoneFilter: (start: 0, end: 0), CapacityFilter: (start: 0, end: 0), CapabilitiesFilter: (start: 0, end: 0)
2023-04-30 13:54:49.785 1500 WARNING cinder.scheduler.filter_scheduler [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] No weighed backend found for volume with properties: {'id': '1602e1dd-db89-4cd3-ade6-ce56a74ac772', 'name': '__DEFAULT__', 'description': 'Default Volume Type', 'is_public': True, 'projects': [], 'extra_specs': {}, 'qos_specs_id': None, 'created_at': '2023-04-29T09:22:53.000000', 'updated_at': '2023-04-29T09:22:53.000000', 'deleted_at': None, 'deleted': False}
2023-04-30 13:54:49.785 1500 INFO cinder.message.api [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] Creating message record for request_id = req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f
2023-04-30 13:54:49.787 1500 ERROR cinder.scheduler.flows.create_volume [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] Failed to run task cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: No valid backend was found. No weighed backends available: cinder.exception.NoValidBackend: No valid backend was found. No weighed backends available
Before that error appeared, an error like the one below had occurred on the dashboard.
Error: Unable to retrieve limits information. [Details](http://dev-controller01/horizon/project/#message_details)
Expecting value: line 1 column 1 (char 0)
This occurs when the setting below is changed in /etc/cinder/cinder.conf
on the controller (cinder) node.
# From
enabled_backends = lvm
# To
enabled_backends = ceph
# chown root:cinder /etc/cinder/cinder.conf
I also commented out rbd_cluster_name
and rbd_ceph_conf
in /etc/cinder/cinder.conf.
When these options are commented out, Cinder uses a cluster named ceph
by default.
#rbd_cluster_name = jp-east
#rbd_ceph_conf = /etc/ceph/jp-east.conf
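The fallback can be sketched in shell (`RBD_CLUSTER_NAME` below is a stand-in variable for illustration, not a real Cinder option name): when no cluster name is configured, the driver ends up reading /etc/ceph/ceph.conf.

```shell
# Sketch of the default lookup: an unset cluster name falls back to "ceph",
# and the configuration file is resolved as /etc/ceph/<cluster>.conf.
cluster="${RBD_CLUSTER_NAME:-ceph}"
conf="/etc/ceph/${cluster}.conf"
echo "$conf"   # → /etc/ceph/ceph.conf
```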
Other errors occurred when opening the page for creating new volumes on Horizon.
Error: Unable to retrieve shared images. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?visibility=shared&status=active&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?visibility=shared&status=active&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb5de0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: Unable to retrieve community images. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?visibility=community&status=active&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?visibility=community&status=active&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb4c40>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: Unable to retrieve images for the current project. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?status=active&owner=8ef2a54e82a846bf8b72a394c156a845&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?status=active&owner=8ef2a54e82a846bf8b72a394c156a845&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb43a0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Error: Unable to retrieve public images. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?status=active&visibility=public&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?status=active&visibility=public&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb5cf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
# chown root:glance /etc/glance/glance-api.conf
Another error occurred when creating a new volume.
schedule allocate volume:Could not find any available weighted backend.
root@dev-controller01:~# openstack volume service list
+------------------+-------------------+------+---------+-------+----------------------------+
| Binary | Host | Zone | Status | State | Updated At |
+------------------+-------------------+------+---------+-------+----------------------------+
| cinder-scheduler | dev-controller01 | nova | enabled | up | 2023-05-04T04:10:00.000000 |
| cinder-volume | dev-storage03@lvm | nova | enabled | down | 2023-05-04T03:56:57.000000 |
| cinder-volume | dev-storage02@lvm | nova | enabled | down | 2023-05-04T03:56:56.000000 |
| cinder-volume | dev-storage01@lvm | nova | enabled | down | 2023-05-04T03:56:57.000000 |
| cinder-volume | dev-storage04@lvm | nova | enabled | down | 2023-05-04T03:56:59.000000 |
| cinder-volume | dev-storage06@lvm | nova | enabled | down | 2023-05-04T03:57:00.000000 |
| cinder-volume | dev-storage07@lvm | nova | enabled | down | 2023-05-04T03:57:00.000000 |
| cinder-volume | dev-storage05@lvm | nova | enabled | down | 2023-05-04T03:56:59.000000 |
| cinder-volume | dev-storage08@lvm | nova | enabled | down | 2023-05-04T03:57:00.000000 |
+------------------+-------------------+------+---------+-------+----------------------------+
* Updated at 2023/05/04 15:10
It was solved by configuring `/etc/cinder/cinder.conf` on each storage (cinder) node as below.
-volume_group = cinder-volumes
+#volume_group = cinder-volumes
...
-enabled_backends = lvm
+#enabled_backends = lvm
+enabled_backends = ceph
+glance_api_version = 2
...
+[ceph]
+volume_driver = cinder.volume.drivers.rbd.RBDDriver
+
+# Name of the pool that volumes are stored in (the "rbd_pool" option)
+rbd_pool = volumes
+
+# Ceph user name and the libvirt secret UUID
+rbd_user = cinder
+rbd_secret_uuid = 3753f63d-338b-4f3d-b54e-a9117e7d9990
+
+rbd_flatten_volume_from_snapshot = false
+rbd_max_clone_depth = 5
+rbd_store_chunk_size = 4
+rados_connect_timeout = -1
+
+
+# Use the Ceph driver for backups
+backup_driver = cinder.backup.drivers.ceph
+# Path to the Ceph configuration file used for backups.
+# Point it at another file to use a differently named cluster.
+backup_ceph_conf = /etc/ceph/ceph.conf
+# Pool and user for backups
+backup_ceph_pool = backups
+backup_ceph_user = cinder-backup
+# Additional backup settings
+backup_ceph_chunk_size = 134217728
+backup_ceph_stripe_unit = 0
+backup_ceph_stripe_count = 0
+restore_discard_excess_bytes = true
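Before restarting cinder-volume on the storage nodes, the new section can be sanity-checked for the keys the rbd driver needs. This is a hedged sketch: it validates a temporary copy rather than the live file, and the key list is only a minimal subset.

```shell
# Write a candidate [ceph] section to a temp file and confirm the
# required keys are present before editing the real cinder.conf.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_secret_uuid = 3753f63d-338b-4f3d-b54e-a9117e7d9990
EOF
for key in volume_driver rbd_pool rbd_user rbd_secret_uuid; do
    grep -q "^${key}[[:space:]]*=" "$conf" || { echo "missing: ${key}"; exit 1; }
done
echo "all required keys present"
```

After the real file is edited and the service restarted, `openstack volume service list` should show a `cinder-volume` entry for the `@ceph` backend instead of `@lvm`.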
An error occurred when creating a new image.
command
root@dev-controller01:~# openstack image create --disk-format qcow2 --container-format bare --public --file ./jammy-server-cloudimg-amd64.img "Ubuntu"
HttpException: 500: Server Error for url: http://dev-controller01:9292/v2/images/7f3281ef-d866-4b84-adeb-2d5881a1fe6d/file, Internal Server Error
Logs in /var/log/glance/glance-api.log
2023-05-04 06:07:27.778 1563 INFO eventlet.wsgi.server [-] 172.22.0.1 - - [04/May/2023 06:07:27] "GET / HTTP/1.1" 300 1271 0.000463
2023-05-04 06:07:27.876 1563 INFO eventlet.wsgi.server [req-3ae1bce7-1c12-455f-b6e0-78a9d248d290 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] 172.22.0.1 - - [04/May/2023 06:07:27] "POST /v2/images HTTP/1.1" 201 1107 0.095773
2023-05-04 06:07:27.880 1563 INFO glance.api.v2.image_data [req-32b7bd60-7e6d-45fd-9307-0363e0b12fe1 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] Unable to create trust: no such option collect_timing in group [keystone_authtoken] Use the existing user token.
2023-05-04 06:07:27.889 1563 ERROR glance.api.v2.image_data [req-32b7bd60-7e6d-45fd-9307-0363e0b12fe1 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] Failed to upload image data due to internal error: rados.ObjectNotFound: [errno 2] RADOS object not found (error calling conf_read_file)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi [req-32b7bd60-7e6d-45fd-9307-0363e0b12fe1 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] Caught error: [errno 2] RADOS object not found (error calling conf_read_file): rados.ObjectNotFound: [errno 2] RADOS object not found (error calling conf_read_file)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi Traceback (most recent call last):
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/common/wsgi.py", line 1331, in __call__
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi action_result = self.dispatch(self.controller, action,
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/common/wsgi.py", line 1370, in dispatch
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi return method(*args, **kwargs)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/common/utils.py", line 414, in wrapped
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi return func(self, req, *args, **kwargs)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/api/v2/image_data.py", line 300, in upload
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi with excutils.save_and_reraise_exception():
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi self.force_reraise()
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi raise self.value
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/api/v2/image_data.py", line 163, in upload
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi image.set_data(data, size, backend=backend)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/notifier.py", line 492, in set_data
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi with excutils.save_and_reraise_exception():
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi self.force_reraise()
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi raise self.value
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/notifier.py", line 443, in set_data
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi self.repo.set_data(data, size, backend=backend,
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/quota/__init__.py", line 322, in set_data
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi self.image.set_data(data, size=size, backend=backend,
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/location.py", line 585, in set_data
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi self._upload_to_store(data, verifier, backend, size)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance/location.py", line 491, in _upload_to_store
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi loc_meta) = self.store_api.add_to_backend_with_multihash(
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance_store/backend.py", line 490, in add_to_backend_with_multihash
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi return store_add_to_backend_with_multihash(
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance_store/backend.py", line 467, in store_add_to_backend_with_multihash
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi (location, size, checksum, multihash, metadata) = store.add(
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance_store/driver.py", line 279, in add_adapter
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi metadata_dict) = store_add_fun(*args, **kwargs)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance_store/capabilities.py", line 176, in op_checker
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi return store_op_fun(store, *args, **kwargs)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance_store/_drivers/rbd.py", line 552, in add
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi with self.get_connection(conffile=self.conf_file,
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi return next(self.gen)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "/usr/lib/python3/dist-packages/glance_store/_drivers/rbd.py", line 288, in get_connection
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi client = rados.Rados(conffile=conffile, rados_id=rados_id)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "rados.pyx", line 388, in rados.Rados.__init__
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "rados.pyx", line 449, in rados.Rados.__setup
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi File "rados.pyx", line 530, in rados.Rados.conf_read_file
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi rados.ObjectNotFound: [errno 2] RADOS object not found (error calling conf_read_file)
2023-05-04 06:07:27.896 1563 ERROR glance.common.wsgi
2023-05-04 06:07:28.170 1563 INFO eventlet.wsgi.server [req-32b7bd60-7e6d-45fd-9307-0363e0b12fe1 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] 172.22.0.1 - - [04/May/2023 06:07:28] "PUT /v2/images/7f3281ef-d866-4b84-adeb-2d5881a1fe6d/file HTTP/1.1" 500 341 0.293327
2023-05-04 06:07:28.176 1563 WARNING glance.api.v2.images [req-cda03259-947d-443d-8a65-eca0cdda95f7 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] After upload to backend, deletion of staged image data has failed because it cannot be found at /tmp/staging//7f3281ef-d866-4b84-adeb-2d5881a1fe6d
2023-05-04 06:07:28.199 1563 INFO eventlet.wsgi.server [req-cda03259-947d-443d-8a65-eca0cdda95f7 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] 172.22.0.1 - - [04/May/2023 06:07:28] "DELETE /v2/images/7f3281ef-d866-4b84-adeb-2d5881a1fe6d HTTP/1.1" 204 213 0.027312
2023-05-04 06:19:55.661 1564 INFO eventlet.wsgi.server [req-9b367d83-1241-48e5-b758-6f8c14f75094 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] 172.22.0.1 - - [04/May/2023 06:19:55] "GET /v2/schemas/image HTTP/1.1" 200 6292 0.001358
2023-05-04 06:19:55.700 1564 INFO eventlet.wsgi.server [req-8d7f88c0-80fa-44e9-a245-154c01fd2898 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - default default] 172.22.0.1 - - [04/May/2023 06:19:55] "GET /v2/images?limit=1000&sort_key=created_at&sort_dir=desc HTTP/1.1" 200 313 0.004119
* Updated at 2023/05/04 16:05
It was solved by creating /etc/ceph/ceph.conf
on the controller (Glance) node.
/etc/ceph/ceph.conf
[global]
# specify cluster network for monitoring
cluster network = 172.22.0.0/16
# specify public network
public network = 172.22.0.0/16
fsid = 3753f63d-338b-4f3d-b54e-a9117e7d9990
mon host = 172.22.1.101
mon initial members = dev-storage01
osd pool default crush rule = -1
[mon.dev-storage01]
host = dev-storage01
mon addr = 172.22.1.101
mon allow pool delete = true
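As a quick check of the new file, the fsid and mon host entries can be verified before pointing Glance at it. This is only a sketch that validates a temporary copy; the real check is whether `ceph -s` can reach the cluster, which this does not replace.

```shell
# Verify the fsid is a well-formed UUID and a monitor address is set,
# using a temp copy that stands in for /etc/ceph/ceph.conf.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[global]
fsid = 3753f63d-338b-4f3d-b54e-a9117e7d9990
mon host = 172.22.1.101
EOF
grep -Eq '^fsid = [0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$' "$conf" \
    || { echo "fsid is malformed"; exit 1; }
grep -q '^mon host = ' "$conf" || { echo "no mon host defined"; exit 1; }
echo "ceph.conf basics look sane"
```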
The following error was shown on Horizon when creating an instance.
Message Build of instance 66306274-cad5-471e-9b78-9d67bb200857 aborted: [errno 95] error connecting to the cluster
Code 500
Details
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2748, in _build_resources
    yield resources
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2508, in _build_and_run_instance
    self.driver.spawn(context, instance, image_meta,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4306, in spawn
    created_instance_dir, created_disks = self._create_image(
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4701, in _create_image
    created_disks = self._create_and_inject_local_root(
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4796, in _create_and_inject_local_root
    created_disks = not backend.exists()
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/imagebackend.py", line 909, in exists
    return self.driver.exists(self.rbd_name)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 320, in exists
    with RBDVolumeProxy(self, name,
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 73, in __init__
    client, ioctx = driver._connect_to_rados(pool)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 162, in _connect_to_rados
    client.connect(timeout=self.rbd_connect_timeout)
  File "rados.pyx", line 680, in rados.Rados.connect
rados.OSError: [errno 95] error connecting to the cluster

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2765, in _build_resources
    self._shutdown_instance(context, instance,
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 3016, in _shutdown_instance
    with excutils.save_and_reraise_exception():
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 3007, in _shutdown_instance
    self.driver.destroy(context, instance, network_info,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1519, in destroy
    self.cleanup(context, instance, network_info, block_device_info,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1589, in cleanup
    return self._cleanup(
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1662, in _cleanup
    self._cleanup_rbd(instance)
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1739, in _cleanup_rbd
    rbd_utils.RBDDriver().cleanup_volumes(filter_fn)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 414, in cleanup_volumes
    with RADOSClient(self, self.pool) as client:
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 109, in __init__
    self.cluster, self.ioctx = driver._connect_to_rados(pool)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 162, in _connect_to_rados
    client.connect(timeout=self.rbd_connect_timeout)
  File "rados.pyx", line 680, in rados.Rados.connect
rados.OSError: [errno 95] error connecting to the cluster

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2331, in _do_build_and_run_instance
    self._build_and_run_instance(context, instance, image,
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2536, in _build_and_run_instance
    with excutils.save_and_reraise_exception():
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2491, in _build_and_run_instance
    with self._build_resources(context, instance,
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2773, in _build_resources
    raise exception.BuildAbortException(
nova.exception.BuildAbortException: Build of instance 66306274-cad5-471e-9b78-9d67bb200857 aborted: [errno 95] error connecting to the cluster
Error logs like the ones below were output to /var/log/ceph/qemu-guest-xxxx.log
on each compute node.
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 AuthRegistry(0x7f6f98064228) no keyring found at /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 AuthRegistry(0x7f6f9dc7cfb0) no keyring found at /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
2023-05-04T11:39:04.081+0000 7f6f977fe640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
The solution was to copy /etc/ceph/ceph.client.cinder.keyring
from dev-controller01
or dev-storageXX
to each compute node.
/etc/ceph/ceph.client.cinder.keyring
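For reference, the keyring copied to the compute nodes has this shape (the key value below is a placeholder, not a real key; the actual key comes from `ceph auth get-or-create client.cinder` on the Ceph side):

```ini
[client.cinder]
    key = <base64-encoded cephx key>
```

The file also needs to be readable by the process that talks to Ceph (on Ubuntu, typically the libvirt-qemu user), otherwise the same "no keyring found" errors persist.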
After applying ceph.client.cinder.keyring,
another error occurred.
root@dev-compute01:/var/log/ceph# cat qemu-guest-5712.log
2023-05-04T11:52:31.833+0000 7faa298b1640 -1 asok(0x7faa24000ba0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.cinder.5712.140368725164832.asok': (13) Permission denied
The solution is as follows.
mkdir -p /var/run/ceph/guests/
chown libvirt-qemu:libvirt /var/run/ceph/guests
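Since /var/run is usually a tmpfs, the directory created above disappears at reboot; a systemd-tmpfiles entry can recreate it automatically at boot. This is a sketch assuming the same owner and group as the commands above; the filename is arbitrary.

```
# /etc/tmpfiles.d/ceph-guests.conf (hypothetical filename)
d /var/run/ceph/guests 0770 libvirt-qemu libvirt -
```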
Are there any changes to how the OVN settings should be declared?
# grep 6641 . -r --color
./neutron/plugins/ml2/ml2_conf.ini:ovn_nb_connection = tcp:0.0.0.0:6641
./neutron/ovn.ini:#ovn_nb_connection = tcp:127.0.0.1:6641
Should the Cinder and Ceph components run on the same host?