stackhpc / ansible-collection-cephadm

Adding OSDs directly after bootstrap fails because cluster is not yet up #89

Closed by sanderploegsma 7 months ago

sanderploegsma commented 1 year ago

I'm running into an issue with the stackhpc.cephadm.cephadm role: adding the OSDs directly after bootstrapping the cluster fails on the mon hosts that were not used for bootstrapping. Re-running my playbook succeeds, which suggests a timing problem, so this might be fixed by adding a task that waits until all mons are up before the OSDs are added.
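
To make the "wait until all mons are up" idea concrete, here is a rough Python sketch of the polling logic such a task would need. It is not the role's implementation. The helper names `missing_mons`, `wait_for_mons`, and `fetch_quorum_status` are made up for illustration, and the retry count and delay are arbitrary. It assumes the status command runs on a host where `cephadm shell` already works (for example the bootstrap host), since the error in the log below suggests the client on the failing host could not read a Ceph config file.

```python
import json
import subprocess
import time
from typing import Callable, Iterable, Set


def missing_mons(quorum_status_json: str, expected: Iterable[str]) -> Set[str]:
    """Return the expected mon names that are not yet in quorum.

    Parses the JSON printed by `ceph quorum_status --format json`,
    which lists the mons currently in quorum under "quorum_names".
    """
    status = json.loads(quorum_status_json)
    return set(expected) - set(status.get("quorum_names", []))


def fetch_quorum_status() -> str:
    """Hypothetical fetcher: ask the cluster for its quorum status.

    Assumes it runs on a host where `cephadm shell` can reach the cluster.
    """
    result = subprocess.run(
        ["cephadm", "shell", "--", "ceph", "quorum_status", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def wait_for_mons(
    fetch_status: Callable[[], str],
    expected: Iterable[str],
    retries: int = 30,          # arbitrary illustrative default
    delay: float = 10.0,        # seconds between attempts, also arbitrary
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every expected mon is in quorum; False if retries run out."""
    expected = list(expected)
    for attempt in range(retries):
        try:
            if not missing_mons(fetch_status(), expected):
                return True
        except (OSError, ValueError, subprocess.CalledProcessError):
            # The command failed or printed something that is not JSON yet;
            # treat it as "not ready" and try again.
            pass
        if attempt < retries - 1:
            sleep(delay)
    return False
```

Calling `wait_for_mons(fetch_quorum_status, ["mon1", "mon2", "mon3"])` (with your own mon host names) before the OSD step would block until all three report in quorum, or give up after the retries are exhausted. The same check could be expressed as an Ansible task with `until`, `retries`, and `delay`.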

Here's the stderr of the "Add OSDs individually" task on a failing host for reference:

Unable to find image 'quay.io/ceph/ceph:v17' locally
v17: Pulling from ceph/ceph
6c5de04c936d: Pulling fs layer
f1ee40d9db4a: Pulling fs layer
17facd475902: Pulling fs layer
0d557d32f54e: Pulling fs layer
a12aac7905a4: Pulling fs layer
a12aac7905a4: Waiting
0d557d32f54e: Waiting
f1ee40d9db4a: Verifying Checksum
f1ee40d9db4a: Download complete
17facd475902: Verifying Checksum
17facd475902: Download complete
0d557d32f54e: Verifying Checksum
0d557d32f54e: Download complete
6c5de04c936d: Verifying Checksum
6c5de04c936d: Download complete
6c5de04c936d: Pull complete
f1ee40d9db4a: Pull complete
17facd475902: Pull complete
0d557d32f54e: Pull complete
a12aac7905a4: Verifying Checksum
a12aac7905a4: Download complete
a12aac7905a4: Pull complete
Digest: sha256:2b73ccc9816e0a1ee1dfbe21ba9a8cc085210f1220f597b5050ebfcac4bdd346
Status: Downloaded newer image for quay.io/ceph/ceph:v17
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)

cityofships commented 1 year ago

Thanks for reporting this @sanderploegsma. Can you share your inventory and variables in order to reproduce?