tsuna-server / build-server-ansible


Implement Cinder backends Ceph #102

Closed TsutomuNakamura closed 1 year ago

TsutomuNakamura commented 1 year ago

When implementing this feature, issue #104 should also be resolved.

TsutomuNakamura commented 1 year ago

To understand the structure of Ceph, create multiple Ceph nodes and their devices.

TsutomuNakamura commented 1 year ago

When I created a device from the dashboard, errors like those below were output.

2023-04-30 13:54:49.785 1500 WARNING cinder.scheduler.host_manager [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] volume service is down. (host: dev-storage08@lvm)
2023-04-30 13:54:49.785 1500 INFO cinder.scheduler.base_filter [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] Filtering removed all hosts for the request with volume ID '980ccd41-896b-46df-8832-54c12f42bdac'. Filter results: AvailabilityZoneFilter: (start: 0, end: 0), CapacityFilter: (start: 0, end: 0), CapabilitiesFilter: (start: 0, end: 0)
2023-04-30 13:54:49.785 1500 WARNING cinder.scheduler.filter_scheduler [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] No weighed backend found for volume with properties: {'id': '1602e1dd-db89-4cd3-ade6-ce56a74ac772', 'name': '__DEFAULT__', 'description': 'Default Volume Type', 'is_public': True, 'projects': [], 'extra_specs': {}, 'qos_specs_id': None, 'created_at': '2023-04-29T09:22:53.000000', 'updated_at': '2023-04-29T09:22:53.000000', 'deleted_at': None, 'deleted': False}
2023-04-30 13:54:49.785 1500 INFO cinder.message.api [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] Creating message record for request_id = req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f
2023-04-30 13:54:49.787 1500 ERROR cinder.scheduler.flows.create_volume [req-792adbd9-7909-4dd0-a1fd-d9fcd801e62f 0ff2a751819a46dcab034fe08448cbe8 8ef2a54e82a846bf8b72a394c156a845 - - -] Failed to run task cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: No valid backend was found. No weighed backends available: cinder.exception.NoValidBackend: No valid backend was found. No weighed backends available
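The "Filter results" part of the scheduler warning shows how many backends each filter saw (`start`) and kept (`end`). A small hedged Python helper to read it (`parse_filter_results` is my own name, not part of Cinder):

```python
import re

# The tail of the cinder-scheduler "Filtering removed all hosts" log line above
LINE = ("Filter results: AvailabilityZoneFilter: (start: 0, end: 0), "
        "CapacityFilter: (start: 0, end: 0), "
        "CapabilitiesFilter: (start: 0, end: 0)")

def parse_filter_results(line):
    """Map each scheduler filter to its (hosts_in, hosts_out) counts."""
    pattern = r"(\w+Filter): \(start: (\d+), end: (\d+)\)"
    return {name: (int(start), int(end))
            for name, start, end in re.findall(pattern, line)}

# Every filter starts with 0 hosts here, so the backend was already excluded
# before any filter ran (see the "volume service is down" warning above).
print(parse_filter_results(LINE))
```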
TsutomuNakamura commented 1 year ago

Before that error appeared, an error like the one below had happened on the dashboard.

Error: Unable to retrieve limits information. [Details](http://dev-controller01/horizon/project/#message_details)
Expecting value: line 1 column 1 (char 0)

It occurs when the setting below is changed in /etc/cinder/cinder.conf on the controller (cinder) node.

# From
enabled_backends = lvm
# To
enabled_backends = ceph

I also commented out rbd_cluster_name and rbd_ceph_conf in /etc/cinder/cinder.conf. When these options are commented out, Cinder uses a cluster named ceph by default.
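Put together, a minimal sketch of the resulting [DEFAULT] section (only enabled_backends is changed; the commented options are deliberately left without values so the defaults apply):

```ini
[DEFAULT]
enabled_backends = ceph

# Left commented out: Cinder then falls back to the cluster named "ceph"
# and the default Ceph configuration file.
#rbd_cluster_name =
#rbd_ceph_conf =
```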

TsutomuNakamura commented 1 year ago

Other errors occurred when opening the page to create new volumes on Horizon.

Error: Unable to retrieve shared images. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?visibility=shared&status=active&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?visibility=shared&status=active&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb5de0>: Failed to establish a new connection: [Errno 111] Connection refused'))

Error: Unable to retrieve community images. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?visibility=community&status=active&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?visibility=community&status=active&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb4c40>: Failed to establish a new connection: [Errno 111] Connection refused'))

Error: Unable to retrieve images for the current project. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?status=active&owner=8ef2a54e82a846bf8b72a394c156a845&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?status=active&owner=8ef2a54e82a846bf8b72a394c156a845&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb43a0>: Failed to establish a new connection: [Errno 111] Connection refused'))

Error: Unable to retrieve public images. [Details](http://dev-controller01/horizon/project/volumes/#message_details)
Error finding address for http://dev-controller01:9292/v2/images?status=active&visibility=public&limit=1000&sort_key=created_at&sort_dir=desc: HTTPConnectionPool(host='dev-controller01', port=9292): Max retries exceeded with url: /v2/images?status=active&visibility=public&limit=1000&sort_key=created_at&sort_dir=desc (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa12bb5cf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
These errors appear to have been resolved by fixing the ownership of the Glance configuration file:

# chown root:glance /etc/glance/glance-api.conf
TsutomuNakamura commented 1 year ago

Another error occurred when creating a new volume.

schedule allocate volume:Could not find any available weighted backend.
* Updated at 2023/05/04 15:10
It was solved by configuring `/etc/cinder/cinder.conf` on each storage (cinder) node as below.

-volume_group = cinder-volumes
+#volume_group = cinder-volumes

...

-enabled_backends = lvm
+#enabled_backends = lvm
+enabled_backends = ceph
+glance_api_version = 2

...

+[ceph]
+volume_driver = cinder.volume.drivers.rbd.RBDDriver
+
+# The default pool name is "rbd"; specify rbd_pool to store data in another pool.
+rbd_pool = volumes
+
+# Specify the user name and secret
+rbd_user = cinder
+rbd_secret_uuid = 3753f63d-338b-4f3d-b54e-a9117e7d9990
+
+rbd_flatten_volume_from_snapshot = false
+rbd_max_clone_depth = 5
+rbd_store_chunk_size = 4
+rados_connect_timeout = -1
+
+# Specify the ceph driver for backup_driver
+backup_driver = cinder.backup.drivers.ceph
+# Location of the Ceph configuration file used for backups. Pointing this at
+# another file lets you use another cluster (name), for example.
+backup_ceph_conf = /etc/ceph/ceph.conf
+# Pool used for Ceph backups
+backup_ceph_pool = backups
+backup_ceph_user = cinder-backup
+# Additional backup settings
+backup_ceph_chunk_size = 134217728
+backup_ceph_stripe_unit = 0
+backup_ceph_stripe_count = 0
+restore_discard_excess_bytes = true
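For rbd_secret_uuid to work, the same UUID has to be registered as a libvirt secret on each compute node. A hedged sketch of the usual secret definition (the secret name is illustrative; only the UUID, which matches the value above, must agree):

```xml
<secret ephemeral='no' private='no'>
  <uuid>3753f63d-338b-4f3d-b54e-a9117e7d9990</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
```

It would then be registered with `virsh secret-define --file secret.xml` and given the client.cinder key via `virsh secret-set-value`.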

TsutomuNakamura commented 1 year ago

An error occurred when creating a new image.

The /etc/ceph/ceph.conf in use:

# specify the UUID generated above
fsid = 3753f63d-338b-4f3d-b54e-a9117e7d9990
# specify the IP address of the Monitor Daemon
mon host = 172.22.1.101
# specify the hostname of the Monitor Daemon
mon initial members = dev-storage01
osd pool default crush rule = -1

# mon.(node name)
[mon.dev-storage01]
# specify the hostname of the Monitor Daemon
host = dev-storage01
# specify the IP address of the Monitor Daemon
mon addr = 172.22.1.101
# allow pools to be deleted
mon allow pool delete = true

TsutomuNakamura commented 1 year ago

An error occurred when creating an instance

The error can be seen on Horizon when creating an instance.

Message Build of instance 66306274-cad5-471e-9b78-9d67bb200857 aborted: [errno 95] error connecting to the cluster
Code 500
Details
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2748, in _build_resources
    yield resources
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2508, in _build_and_run_instance
    self.driver.spawn(context, instance, image_meta,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4306, in spawn
    created_instance_dir, created_disks = self._create_image(
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4701, in _create_image
    created_disks = self._create_and_inject_local_root(
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4796, in _create_and_inject_local_root
    created_disks = not backend.exists()
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/imagebackend.py", line 909, in exists
    return self.driver.exists(self.rbd_name)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 320, in exists
    with RBDVolumeProxy(self, name,
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 73, in __init__
    client, ioctx = driver._connect_to_rados(pool)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 162, in _connect_to_rados
    client.connect(timeout=self.rbd_connect_timeout)
  File "rados.pyx", line 680, in rados.Rados.connect
rados.OSError: [errno 95] error connecting to the cluster

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2765, in _build_resources
    self._shutdown_instance(context, instance,
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 3016, in _shutdown_instance
    with excutils.save_and_reraise_exception():
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 3007, in _shutdown_instance
    self.driver.destroy(context, instance, network_info,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1519, in destroy
    self.cleanup(context, instance, network_info, block_device_info,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1589, in cleanup
    return self._cleanup(
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1662, in _cleanup
    self._cleanup_rbd(instance)
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 1739, in _cleanup_rbd
    rbd_utils.RBDDriver().cleanup_volumes(filter_fn)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 414, in cleanup_volumes
    with RADOSClient(self, self.pool) as client:
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 109, in __init__
    self.cluster, self.ioctx = driver._connect_to_rados(pool)
  File "/usr/lib/python3/dist-packages/nova/storage/rbd_utils.py", line 162, in _connect_to_rados
    client.connect(timeout=self.rbd_connect_timeout)
  File "rados.pyx", line 680, in rados.Rados.connect
rados.OSError: [errno 95] error connecting to the cluster

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2331, in _do_build_and_run_instance
    self._build_and_run_instance(context, instance, image,
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2536, in _build_and_run_instance
    with excutils.save_and_reraise_exception():
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2491, in _build_and_run_instance
    with self._build_resources(context, instance,
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2773, in _build_resources
    raise exception.BuildAbortException(
nova.exception.BuildAbortException: Build of instance 66306274-cad5-471e-9b78-9d67bb200857 aborted: [errno 95] error connecting to the cluster

Other error logs

The log /var/log/ceph/qemu-guest-xxxx.log on each compute node contained entries like the following.

2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 AuthRegistry(0x7f6f98064228) no keyring found at /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 AuthRegistry(0x7f6f9dc7cfb0) no keyring found at /etc/ceph/ceph.client.cinder.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
2023-05-04T11:39:04.081+0000 7f6f977fe640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2023-05-04T11:39:04.081+0000 7f6f9dc7e640 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication

The solution is to copy /etc/ceph/ceph.client.cinder.keyring from dev-controller01 or dev-storageXX to each compute node.
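The keyring search paths listed in the log can be checked directly on a compute node; a hedged Python sketch (`missing_keyrings` is my own helper, not part of Ceph):

```python
import os

# Keyring search paths taken from the error message above
KEYRING_PATHS = [
    "/etc/ceph/ceph.client.cinder.keyring",
    "/etc/ceph/ceph.keyring",
    "/etc/ceph/keyring",
    "/etc/ceph/keyring.bin",
]

def missing_keyrings(paths=KEYRING_PATHS):
    """Return the search paths that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]

# If this prints all four paths on a compute node, no keyring is available,
# cephx is disabled, and the connection fails as in the log above.
print(missing_keyrings())
```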

Other logs (2)

After applying ceph.client.cinder.keyring, another error occurred.

The solution is as follows.

mkdir -p /var/run/ceph/guests/
chown libvirt-qemu:libvirt /var/run/ceph/guests
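The /var/run/ceph/guests directory comes into play because of the admin-socket and log paths typically configured for QEMU guests in the Ceph/OpenStack integration; a hedged /etc/ceph/ceph.conf fragment for the compute nodes, assuming that standard setup (paths are illustrative):

```ini
[client]
admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctx.asok
log file = /var/log/ceph/qemu-guest-$pid.log
```

This also explains the qemu-guest-xxxx.log file name seen in the earlier logs.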
TsutomuNakamura commented 1 year ago

Are there any changes needed regarding how OVN settings are declared?

# grep 6641 . -r --color
./neutron/plugins/ml2/ml2_conf.ini:ovn_nb_connection = tcp:0.0.0.0:6641
./neutron/ovn.ini:#ovn_nb_connection = tcp:127.0.0.1:6641
TsutomuNakamura commented 1 year ago

Should the Cinder and Ceph features be on the same host?