lxc / incus-deploy

Deployment playbooks, configurations and scripts for Incus
Apache License 2.0

Error enabling msgr2 messenger in Ceph during Ansible playbook execution #11

Open reinaldosaraiva opened 1 month ago

reinaldosaraiva commented 1 month ago

Description: When running the Ansible playbook deploy.yaml from the incus-deploy project, an error occurs while attempting to enable the msgr2 messenger in Ceph. The ceph mon enable-msgr2 command fails after a 300-second timeout, indicating that it could not connect to the RADOS cluster.

Error Message:

fatal: [server01]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.070563", "end": "2024-07-11 13:43:18.284315", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.213752", "stderr": "2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
fatal: [server03]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.109144", "end": "2024-07-11 13:43:18.320621", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.211477", "stderr": "2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
fatal: [server02]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.093801", "end": "2024-07-11 13:43:18.316757", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.222956", "stderr": "2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}

Steps to Reproduce:

1. Execute the Ansible playbook deploy.yaml in the directory ~/incus-deploy/ansible.
2. Observe the error during the task to enable the msgr2 messenger in Ceph.

Expected Behavior:

The ceph mon enable-msgr2 command should execute without errors, enabling the msgr2 messenger in the Ceph cluster.

Actual Behavior:

The ceph mon enable-msgr2 command fails with a timeout, indicating it could not connect to the RADOS cluster.

Additional Details:

- The error occurs on multiple servers (server01, server02, server03).
- Specific error message: RADOS timed out (error connecting to the cluster).
- The playbook was executed as root.

Environment:

- Ansible version: 2.17.1
- Ubuntu: 22.04


Execute:

root@haruunkal:~/incus-deploy/terraform# cd ../ansible/
root@haruunkal:~/incus-deploy/ansible# ansible-playbook deploy.yaml

PLAY [Ceph - Generate cluster keys and maps] ****

TASK [Gathering Facts] **
[WARNING]: Platform linux on host server03 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server03]
[WARNING]: Platform linux on host server04 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server04]
[WARNING]: Platform linux on host server02 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server02]
[WARNING]: Platform linux on host server05 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server05]
[WARNING]: Platform linux on host server01 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server01]

TASK [Generate mon keyring] ***** changed: [server03 -> 127.0.0.1] ok: [server04 -> 127.0.0.1] ok: [server01 -> 127.0.0.1] ok: [server05 -> 127.0.0.1] ok: [server02 -> 127.0.0.1]

TASK [Generate client.admin keyring] **** changed: [server03 -> 127.0.0.1] ok: [server04 -> 127.0.0.1] ok: [server01 -> 127.0.0.1] ok: [server05 -> 127.0.0.1] ok: [server02 -> 127.0.0.1]

TASK [Generate bootstrap-osd keyring] *** changed: [server03 -> 127.0.0.1] ok: [server04 -> 127.0.0.1] ok: [server01 -> 127.0.0.1] ok: [server05 -> 127.0.0.1] ok: [server02 -> 127.0.0.1]

TASK [Generate mon map] ***** changed: [server03 -> 127.0.0.1] ok: [server04 -> 127.0.0.1] ok: [server01 -> 127.0.0.1] ok: [server05 -> 127.0.0.1] ok: [server02 -> 127.0.0.1]

RUNNING HANDLER [Add key to client.admin keyring] *** changed: [server03 -> 127.0.0.1]

RUNNING HANDLER [Add key to bootstrap-osd keyring] ** changed: [server03 -> 127.0.0.1]

RUNNING HANDLER [Add nodes to mon map] ** changed: [server03 -> 127.0.0.1] => (item={'name': 'server01', 'ip': 'fd42:60dc:dec6:a73b:216:3eff:fe2d:4c57'}) changed: [server03 -> 127.0.0.1] => (item={'name': 'server02', 'ip': 'fd42:60dc:dec6:a73b:216:3eff:fe05:31f6'}) changed: [server03 -> 127.0.0.1] => (item={'name': 'server03', 'ip': 'fd42:60dc:dec6:a73b:216:3eff:fe01:1c21'})

PLAY [Ceph - Add package repository] ****

TASK [Gathering Facts] ** ok: [server04] ok: [server05] ok: [server03] ok: [server01] ok: [server02]

TASK [Create apt keyring path] ** ok: [server03] ok: [server01] ok: [server05] ok: [server04] ok: [server02]

TASK [Add ceph GPG key] ***** changed: [server04] changed: [server03] changed: [server05] changed: [server01] changed: [server02]

TASK [Get DPKG architecture] **** ok: [server04] ok: [server03] ok: [server05] ok: [server01] ok: [server02]

TASK [Add ceph package sources] ***** changed: [server03] changed: [server05] changed: [server04] changed: [server02] changed: [server01]

RUNNING HANDLER [Update apt] **** changed: [server01] changed: [server04] changed: [server05] changed: [server03] changed: [server02]

PLAY [Ceph - Install packages] **

TASK [Gathering Facts] ** ok: [server01] ok: [server04] ok: [server05] ok: [server03] ok: [server02]

TASK [Install ceph-common] ** changed: [server02] changed: [server03] changed: [server05] changed: [server04] changed: [server01]

TASK [Install ceph-mon] ***** skipping: [server04] skipping: [server05] changed: [server03] changed: [server01] changed: [server02]

TASK [Install ceph-mgr] ***** skipping: [server04] skipping: [server05] changed: [server03] changed: [server02] changed: [server01]

TASK [Install ceph-mds] ***** skipping: [server04] skipping: [server05] changed: [server01] changed: [server02] changed: [server03]

TASK [Install ceph-osd] ***** changed: [server01] changed: [server04] changed: [server03] changed: [server02] changed: [server05]

TASK [Install ceph-rbd-mirror] ** skipping: [server01] skipping: [server02] skipping: [server04] skipping: [server05] skipping: [server03]

TASK [Install radosgw] ** skipping: [server01] skipping: [server02] skipping: [server03] changed: [server04] changed: [server05]

PLAY [Ceph - Set up config and keyrings] ****

TASK [Transfer the cluster configuration] *** changed: [server01] changed: [server04] changed: [server03] changed: [server05] changed: [server02]

TASK [Create main storage directory] **** ok: [server04] ok: [server01] ok: [server03] ok: [server05] ok: [server02]

TASK [Create monitor bootstrap path] **** skipping: [server05] skipping: [server04] changed: [server01] changed: [server03] changed: [server02]

TASK [Create OSD bootstrap path] **** changed: [server05] changed: [server04] changed: [server01] changed: [server03] changed: [server02]

TASK [Transfer main admin keyring] ** changed: [server05] changed: [server03] changed: [server01] changed: [server02] changed: [server04]

TASK [Transfer additional client keyrings] ** skipping: [server05] skipping: [server03] skipping: [server04] skipping: [server01] skipping: [server02]

TASK [Transfer bootstrap mon keyring] *** skipping: [server05] skipping: [server04] changed: [server03] changed: [server02] changed: [server01]

TASK [Transfer bootstrap mon map] *** skipping: [server05] skipping: [server04] changed: [server03] changed: [server02] changed: [server01]

TASK [Transfer bootstrap OSD keyring] *** changed: [server05] changed: [server04] changed: [server01] changed: [server03] changed: [server02]

RUNNING HANDLER [Restart Ceph] ** changed: [server05] changed: [server03] changed: [server02] changed: [server04] changed: [server01]

PLAY [Ceph - Deploy mon] ****

TASK [Gathering Facts] ** ok: [server01] ok: [server02] ok: [server05] ok: [server04] ok: [server03]

TASK [Bootstrap Ceph mon] *** skipping: [server04] skipping: [server05] changed: [server02] changed: [server03] changed: [server01]

TASK [Enable and start Ceph mon] **** skipping: [server04] skipping: [server05] changed: [server02] changed: [server03] changed: [server01]

RUNNING HANDLER [Enable msgr2] ** fatal: [server01]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.070563", "end": "2024-07-11 13:43:18.284315", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.213752", "stderr": "2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []} fatal: [server03]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.109144", "end": "2024-07-11 13:43:18.320621", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.211477", "stderr": "2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []} fatal: [server02]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.093801", "end": "2024-07-11 13:43:18.316757", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.222956", "stderr": "2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}

PLAY RECAP **
server01 : ok=29 changed=18 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0
server02 : ok=29 changed=18 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0
server03 : ok=32 changed=25 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0
server04 : ok=22 changed=11 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
server05 : ok=22 changed=11 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0

stgraber commented 1 month ago

That would happen if the Ceph cluster isn't functional.

This most commonly happens if you have fully redone your deployment without also wiping the data from the ansible/data directory.

In this scenario you end up with a freshly deployed cluster that's still expecting the servers from the previous deployment and so is unable to achieve a quorum. That causes the Ceph API to fail to come online and results in the configuration failure you're getting.
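A quick way to see whether state from an earlier run is still around is to list what remains under the data directory before redeploying. The following is only a sketch: `list_leftover_state` is a hypothetical helper, not something shipped with incus-deploy, and it assumes the generated keyrings and mon map live under ansible/data as described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of incus-deploy): print every regular file
# left under a data directory, so stale state from an earlier run is visible.
list_leftover_state() {
    dir=$1
    # A missing directory means there is nothing left over.
    [ -d "$dir" ] || return 0
    find "$dir" -type f | sort
}

# Example: inspect the directory named in the thread before redeploying.
leftover=$(list_leftover_state ansible/data)
if [ -n "$leftover" ]; then
    echo "State from a previous deployment is still present:"
    echo "$leftover"
else
    echo "No leftover state found."
fi
```

The helper only reports files; deciding whether to delete them is left to the operator, since the same directory also holds the state of a deployment that is still in use.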

reinaldosaraiva commented 1 month ago

Ceph monitor initialization issue: monmap min_mon_release older than installed version

ERROR:

Jul 11 16:51:06 distrobuilder-5cca1f2a-f8a9-4b77-a1df-8173d38747bc systemd[1]: Created slice Slice /system/ceph-mon.
Jul 11 16:51:06 distrobuilder-5cca1f2a-f8a9-4b77-a1df-8173d38747bc systemd[1]: Reached target System Time Synchronized.
Jul 11 16:51:06 distrobuilder-5cca1f2a-f8a9-4b77-a1df-8173d38747bc systemd[1]: Started Ceph cluster monitor daemon.
Jul 11 16:51:06 distrobuilder-5cca1f2a-f8a9-4b77-a1df-8173d38747bc ceph-mon[6467]: 2024-07-11T16:51:06.738+0000 7f0cf2c8cc40 -1 mon.server01@-1(probing) e0 current monmap has recorded min_mon_release 15 (octopus) is more than two releases older than installed 18 (reef); you can only upgrade 2 releases at a time
Jul 11 16:51:06 distrobuilder-5cca1f2a-f8a9-4b77-a1df-8173d38747bc ceph-mon[6467]: you should first upgrade to 16 (pacific) or 17 (quincy)
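The rule the monitor is enforcing in that log can be written as simple arithmetic: the installed release may be at most two major releases ahead of the min_mon_release recorded in the mon map. A minimal sketch of that check follows; `mon_release_gap_ok` is a hypothetical name used only for illustration, and the numbers are the ones from the log above.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a monitor running release $2 is at most
# two major releases ahead of the min_mon_release ($1) recorded in the mon map.
mon_release_gap_ok() {
    recorded=$1
    installed=$2
    gap=$((installed - recorded))
    [ "$gap" -le 2 ]
}

# Values from the log: recorded 15 (octopus), installed 18 (reef).
if mon_release_gap_ok 15 18; then
    echo "monitor can start against this mon map"
else
    echo "gap too large: go through 16 (pacific) or 17 (quincy) first"
fi
```

With 15 and 18 the gap is 3, so the check fails, which matches the refusal printed by ceph-mon.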

stgraber commented 1 month ago

Can you show the output of monmaptool --print ansible/data/ceph/cluster.FSID.mon.map?

Normally the logic in the playbook is to set the min-mon-release in the mon map to the same release as ceph_release (reef by default).
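For reference, the release names and major numbers that appear in this thread map to each other as sketched below. `ceph_release_number` is a hypothetical helper and is not how the playbook itself does the lookup; it only covers the four releases mentioned here.

```shell
#!/bin/sh
# Hypothetical lookup from a Ceph release name to its major version number.
# Only the releases mentioned in this thread are covered.
ceph_release_number() {
    case "$1" in
        octopus) echo 15 ;;
        pacific) echo 16 ;;
        quincy)  echo 17 ;;
        reef)    echo 18 ;;
        *) echo "unknown release: $1" >&2; return 1 ;;
    esac
}

# With ceph_release left at its default (reef), the mon map would be
# expected to record min_mon_release 18.
ceph_release_number reef
```

Comparing this number with the min_mon_release line in the generated mon map is a quick sanity check that the intended value actually made it into the file.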

reinaldosaraiva commented 1 month ago

> That would happen if the Ceph cluster isn't functional.
>
> This most commonly happens if you have fully redone your deployment without also wiping the data from the ansible/data directory.
>
> In this scenario you end up with a freshly deployed cluster that's still expecting the servers from the previous deployment and so is unable to achieve a quorum. That causes the Ceph API to fail to come online and results in the configuration failure you're getting.

I have already cleaned the data/ceph/ folder and others. I also used both Quincy and Reef versions. I am lost in this deployment.

stgraber commented 1 month ago

Also the output of git rev-parse HEAD would be useful

reinaldosaraiva commented 1 month ago

git rev-parse HEAD

root@haruunkal:~/incus-deploy# git rev-parse HEAD
f207054ed42fbcfb9916c4452e8abc60bd14bcbb

stgraber commented 1 month ago

Okay, so it shouldn't be because of lack of support for calling monmaptool with the needed set-min-mon-release, but then it's pretty confusing as to why it would have set a release of 15 when it should have been passed 18.

The output of monmaptool --print ansible/data/ceph/cluster.FSID.mon.map may help figure it out.

reinaldosaraiva commented 1 month ago

Thank you very much for your support. It seems that there was an issue with my lab workstation that was resolved only when I disabled the IPv6 network. After that, the entire process ran perfectly.

reinaldosaraiva commented 1 month ago

> Okay, so it shouldn't be because of lack of support for calling monmaptool with the needed set-min-mon-release, but then it's pretty confusing as to why it would have set a release of 15 when it should have been passed 18.
>
> The output of monmaptool --print ansible/data/ceph/cluster.FSID.mon.map may help figure it out.

root@haruunkal:~/incus-deploy# monmaptool --print ansible/data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map
monmaptool: monmap file ansible/data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map
epoch 0
fsid e2850e1f-7aab-472e-b6b1-824e19a75071
last_changed 2024-07-11T15:15:56.636758-0300
created 2024-07-11T15:15:56.636758-0300
min_mon_release 15 (octopus)
election_strategy: 1
0: v1:10.177.121.10:6789/0 mon.server03
1: v1:10.177.121.13:6789/0 mon.server01
2: v1:10.177.121.242:6789/0 mon.server02
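The value that matters in that output is the min_mon_release field. As a rough sketch, it can be pulled out of the `monmaptool --print` text with a small filter; `monmap_min_release` is a hypothetical helper, and it assumes the tool prints one field per line, as it normally does on a terminal.

```shell
#!/bin/sh
# Hypothetical filter: read `monmaptool --print` output on stdin and print
# the numeric min_mon_release it reports.
monmap_min_release() {
    awk '$1 == "min_mon_release" { print $2; exit }'
}

# Sample input taken from the output posted in this thread.
monmap_min_release <<'EOF'
epoch 0
fsid e2850e1f-7aab-472e-b6b1-824e19a75071
min_mon_release 15 (octopus)
election_strategy: 1
EOF

# Against a real file (path pattern as used in the thread):
#   monmaptool --print ansible/data/ceph/cluster.FSID.mon.map | monmap_min_release
```

For the map shown here the filter yields 15, where 18 would be expected for reef.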

reinaldosaraiva commented 1 month ago

Haha. Another error:

TASK [Install the Incus package] ***
task path: /root/incus-deploy/ansible/books/incus.yaml:60

ESTABLISH Incus CONNECTION FOR USER: root EXEC /bin/sh -c 'echo ~root && sleep 0' ESTABLISH Incus CONNECTION FOR USER: root EXEC /bin/sh -c 'echo ~root && sleep 0' ESTABLISH Incus CONNECTION FOR USER: root EXEC /bin/sh -c 'echo ~root && sleep 0' ESTABLISH Incus CONNECTION FOR USER: root EXEC /bin/sh -c 'echo ~root && sleep 0' ESTABLISH Incus CONNECTION FOR USER: root EXEC /bin/sh -c 'echo ~root && sleep 0' EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1720722234.6681595-434073-171543719892459 `" && echo ansible-tmp-1720722234.6681595-434073-171543719892459="` echo /root/.ansible/tmp/ansible-tmp-1720722234.6681595-434073-171543719892459 `" ) && sleep 0' EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1720722234.6905096-434074-237109845491203 `" && echo ansible-tmp-1720722234.6905096-434074-237109845491203="` echo /root/.ansible/tmp/ansible-tmp-1720722234.6905096-434074-237109845491203 `" ) && sleep 0' EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1720722234.7017157-434080-221203579380132 `" && echo ansible-tmp-1720722234.7017157-434080-221203579380132="` echo /root/.ansible/tmp/ansible-tmp-1720722234.7017157-434080-221203579380132 `" ) && sleep 0' EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1720722234.704671-434088-195068848106763 `" && echo ansible-tmp-1720722234.704671-434088-195068848106763="` echo /root/.ansible/tmp/ansible-tmp-1720722234.704671-434088-195068848106763 `" ) && sleep 0' EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624 `" && echo ansible-tmp-1720722234.7366605-434103-88856948673624="` echo /root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624 `" 
) && sleep 0' Using module file /usr/local/lib/python3.10/dist-packages/ansible/modules/apt.py PUT /root/.ansible/tmp/ansible-local-401506wyvvy13h/tmpfw9gbsu5 TO /root/.ansible/tmp/ansible-tmp-1720722234.6681595-434073-171543719892459/AnsiballZ_apt.py Using module file /usr/local/lib/python3.10/dist-packages/ansible/modules/apt.py PUT /root/.ansible/tmp/ansible-local-401506wyvvy13h/tmpo15mkolf TO /root/.ansible/tmp/ansible-tmp-1720722234.6905096-434074-237109845491203/AnsiballZ_apt.py EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1720722234.6681595-434073-171543719892459/ /root/.ansible/tmp/ansible-tmp-1720722234.6681595-434073-171543719892459/AnsiballZ_apt.py && sleep 0' Using module file /usr/local/lib/python3.10/dist-packages/ansible/modules/apt.py PUT /root/.ansible/tmp/ansible-local-401506wyvvy13h/tmp0yqjokw7 TO /root/.ansible/tmp/ansible-tmp-1720722234.704671-434088-195068848106763/AnsiballZ_apt.py Using module file /usr/local/lib/python3.10/dist-packages/ansible/modules/apt.py PUT /root/.ansible/tmp/ansible-local-401506wyvvy13h/tmpc2_lbb63 TO /root/.ansible/tmp/ansible-tmp-1720722234.7017157-434080-221203579380132/AnsiballZ_apt.py EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1720722234.6905096-434074-237109845491203/ /root/.ansible/tmp/ansible-tmp-1720722234.6905096-434074-237109845491203/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1720722234.704671-434088-195068848106763/ /root/.ansible/tmp/ansible-tmp-1720722234.704671-434088-195068848106763/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1720722234.7017157-434080-221203579380132/ /root/.ansible/tmp/ansible-tmp-1720722234.7017157-434080-221203579380132/AnsiballZ_apt.py && sleep 0' Using module file /usr/local/lib/python3.10/dist-packages/ansible/modules/apt.py PUT /root/.ansible/tmp/ansible-local-401506wyvvy13h/tmpe4qdstqu TO 
/root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624/AnsiballZ_apt.py EXEC /bin/sh -c '/usr/bin/python3.10 /root/.ansible/tmp/ansible-tmp-1720722234.6681595-434073-171543719892459/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624/ /root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c '/usr/bin/python3.10 /root/.ansible/tmp/ansible-tmp-1720722234.6905096-434074-237109845491203/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c '/usr/bin/python3.10 /root/.ansible/tmp/ansible-tmp-1720722234.704671-434088-195068848106763/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c '/usr/bin/python3.10 /root/.ansible/tmp/ansible-tmp-1720722234.7017157-434080-221203579380132/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c '/usr/bin/python3.10 /root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624/AnsiballZ_apt.py && sleep 0' EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624/ > /dev/null 2>&1 && sleep 0' EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1720722234.6681595-434073-171543719892459/ > /dev/null 2>&1 && sleep 0' EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1720722234.6905096-434074-237109845491203/ > /dev/null 2>&1 && sleep 0' EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1720722234.7017157-434080-221203579380132/ > /dev/null 2>&1 && sleep 0' EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1720722234.704671-434088-195068848106763/ > /dev/null 2>&1 && sleep 0' [WARNING]: Error deleting remote temporary files (rc: 255, stderr: Error: dial unix /run/incus/dev-incus-deploy_server02/qemu.monitor: connect: connection refused }) EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1720722234.7366605-434103-88856948673624/ > /dev/null 2>&1 && sleep 0' fatal: [server03]: FAILED! 
=> { "changed": false, "module_stderr": "Error: websocket: close 1006 (abnormal closure): unexpected EOF\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1 }
fatal: [server05]: FAILED! => { "changed": false, "module_stderr": "Error: websocket: close 1006 (abnormal closure): unexpected EOF\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1 }
fatal: [server04]: FAILED! => { "changed": false, "module_stderr": "Error: websocket: close 1006 (abnormal closure): unexpected EOF\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1 }
fatal: [server01]: FAILED! => { "changed": false, "module_stderr": "Error: websocket: close 1006 (abnormal closure): unexpected EOF\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1 }
fatal: [server02]: FAILED! => { "changed": false, "module_stderr": "Error: websocket: close 1006 (abnormal closure): unexpected EOF\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1 }

stgraber commented 1 month ago

Yeah, so the min_mon_release 15 (octopus) is obviously going to be a problem but I don't get why it would be set to that when we specifically call monmaptool with the argument to set it to 18...

Maybe that older version of monmaptool doesn't know how to handle that properly?

You could add the Ceph repository to your own machine and then update to a newer version of monmaptool. That would certainly fix the issue; it just shouldn't be necessary...