red-hat-storage / cockpit-ceph-installer

Cockpit plugin to simplify Ceph installations
GNU Lesser General Public License v2.1

Validation of Ceph cluster fails due to Unexpected playbook failure. Check ansible-runner-service directory #85

Open aasraoui opened 3 years ago

aasraoui commented 3 years ago

Ceph Installer - Cockpit-ceph-installer.pdf

pcuzner commented 3 years ago

could you drop a screenshot into the issue instead of a pdf please? (PDFs don't render inline, and they can be crafted to do nasty stuff)

Until then some basic checks

aasraoui commented 3 years ago

Below is a capture of the stdout log:

```
Identity added: /usr/share/ansible-runner-service/artifacts/6e68835e-51b8-11eb-8c06-080027191e45/ssh_key_data (/usr/share/ansible-runner-service/artifacts/6e68835e-51b8-11eb-8c06-080027191e45/ssh_key_data)
[WARNING]: log file at /root/ansible/ansible.log is not writeable and we cannot create it, aborting

PLAY [Validate hosts against desired cluster state] ****

TASK [CEPH_CHECK_ROLE] *
Friday 08 January 2021  13:50:18 +0000 (0:00:00.274)       0:00:00.274 ****
ok: [Metrics]
ok: [Rgw]
ok: [Mds]
ok: [Osd]
[WARNING]: Unhandled error in Python interpreter discovery for host Mon:
Failed to connect to the host via ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
fatal: [Mon]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"Mon\". Make sure this host can be reached over ssh: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).\r\n", "unreachable": true}

PLAY RECAP *****
Mds     : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
Metrics : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
Mon     : ok=0  changed=0  unreachable=1  failed=0  skipped=0  rescued=0  ignored=0
Osd     : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0
Rgw     : ok=1  changed=0  unreachable=0  failed=0  skipped=0  rescued=0  ignored=0

Friday 08 January 2021  13:56:20 +0000 (0:06:01.955)       0:06:02.230 ****

CEPH_CHECK_ROLE ------------------------------------------------------- 361.95s
```

(screenshot attached)

aasraoui commented 3 years ago

can ssh to the Mon node, not sure why it is not reachable!!!

```
[root@Cockpit-ceph-installer ceph-ansible]# ssh Mon
Last login: Sun Jan 10 20:27:27 2021 from 10.0.0.113
[root@Mon ~]#
```

pcuzner commented 3 years ago

what's strange is that you added the host first. The act of adding a host confirms that the ssh key the installer uses is in the authorized_keys file on the target. So at some point, 'mon' was accessible using the installer's public key. However, right now it doesn't appear to be. Checking with a root login to mon is misleading, since the installer uses its own key - unless you provided your keys to the installer.

Next steps:

- compare the authorized_keys file on mon to the one on an osd or rgw host
- try connecting manually using the private key in /usr/share/ansible-runner-service/env/ssh_key (i.e. use `-i /usr/share/ansible-runner-service/env/ssh_key`)
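A minimal sketch of that comparison, using stand-in files so it runs anywhere. On the real hosts the interesting commands are `ssh-keygen -y -f /usr/share/ansible-runner-service/env/ssh_key` (prints the public key matching the installer's private key) and a grep for that key in root's `~/.ssh/authorized_keys` on Mon; the key strings below are fakes for illustration:

```shell
# Stand-in for the key check described above; the fake key strings
# simulate the installer's public key and Mon's authorized_keys file.
tmp=$(mktemp -d)
echo "ssh-ed25519 AAAAfakeinstallerkey ansible-runner-service" > "$tmp/installer.pub"
echo "ssh-ed25519 AAAAsomeotherkey user@laptop" > "$tmp/authorized_keys"   # Mon's file

# grep -F: match the key literally, not as a regex
if grep -qF "$(cat "$tmp/installer.pub")" "$tmp/authorized_keys"; then
    result="present"
else
    result="missing"
fi
echo "installer key on target: $result"
```

With these stand-in files the key is reported `missing` - which mirrors the situation described in this issue, where Mon's authorized_keys no longer contains the installer's key.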

aasraoui commented 3 years ago

the authorized_keys on the Osd node is different from the Mon node. A manual connection to Mon with the private key works:

```
[root@Cockpit-ceph-installer .ssh]# ssh root@Mon -i /usr/share/ansible-runner-service/env/ssh_key
root@mon's password:
Last login: Mon Jan 11 04:35:12 2021 from 10.0.0.113
[root@Mon ~]#
```
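The repair applied in the next comment can be sketched as follows. Hedged: `ssh-copy-id` is one common way to push the key (password auth to Mon evidently still works, per the prompt above), and stand-in files are used here so the snippet runs without the real hosts:

```shell
# Simulated repair: append the installer's public key to the target's
# authorized_keys. On the real hosts this would be roughly:
#   ssh-keygen -y -f /usr/share/ansible-runner-service/env/ssh_key > /tmp/installer.pub
#   ssh-copy-id -f -i /tmp/installer.pub root@Mon
tmp=$(mktemp -d)
echo "ssh-ed25519 AAAAfakeinstallerkey ansible-runner-service" > "$tmp/installer.pub"
touch "$tmp/authorized_keys"              # Mon's file, missing the installer key
cat "$tmp/installer.pub" >> "$tmp/authorized_keys"

# Verify exactly one matching entry is now present
fixed=$(grep -cF "$(cat "$tmp/installer.pub")" "$tmp/authorized_keys")
echo "installer key entries after fix: $fixed"
```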

aasraoui commented 3 years ago

I have updated the mon node with the same authorized key as the other nodes; now validation is failing because the cluster has no OSDs!!!

(screenshot attached)

pcuzner commented 3 years ago

And the problem is?

The installer expects you to have nodes with disks for OSDs, so the osd role can be applied to them. Looking at your screenshot, you've ticked the osd role too. So from my perspective this is working as expected.

For a storage cluster you need storage, right?

Also, just for awareness: when you see errors and warnings, clicking the triangle icon expands the row to show you the error text.

If you're just kicking the tyres, you could use just 2 machines - one for ceph and the other for monitoring. Just make sure you have free disks on the node you want to deploy ceph to, and use the container mode deployment (not rpm).