PravinRanjan10 opened this issue 4 years ago
@thatsdone As per discussion, please take a look.
Some notes from our private discussion for records.
ceph -s
says 'HEALTH_OK', but the last line of its output actually means there are no OSDs up and running. So even without the syntax-level error you originally got, you should have hit another problem (a timeout while waiting for the Ceph cluster to stabilize).

@thatsdone Sure, let me reproduce this again and share the full log with you. Thanks for the quick look.
@PravinRanjan10
Three points.
(1) While trying to reproduce this issue, I noticed that Ceph Mimic checks the number of active OSDs against 'osd_pool_default_size', and this can make the ceph -s
result HEALTH_WARN. I think you are using enough disks (or a consistent 'osd_pool_default_size' value), but please pay attention to this point.
(2) When you try to reproduce the issue, please ensure that you have the following configuration parameters in 'ansible/group_vars/osdsdock.yml'.
(3) From the viewpoint of the new feature you are working on (CephFS support), I think it's better to use an existing Ceph cluster instead of building one via opensds-installer, because Ceph setup itself is essentially outside the scope of SODA/opensds. What do you think?
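To make point (1) above concrete, here is a small shell sketch (a hypothetical helper, not part of the installer) that parses the `osd:` line of `ceph -s` output the same way the sanity check does, and compares the OSD count against a required pool size:

```shell
#!/bin/sh
# check_osds STATUS REQUIRED
# Parses an "osd: N osds: U up, I in" line from `ceph -s` output and
# succeeds only if the cluster has at least REQUIRED OSDs.
check_osds() {
    status="$1"
    required="$2"
    # The second field of the "osd:" line is the OSD count.
    osds=$(printf '%s\n' "$status" | awk '/osd:/{print $2}')
    [ "${osds:-0}" -ge "$required" ]
}

# Example with the output reported in this issue: 0 OSDs, checked
# against the default osd_pool_default_size of 3.
sample='    osd: 0 osds: 0 up, 0 in'
if check_osds "$sample" 3; then
    echo "enough OSDs"
else
    echo "fewer OSDs than osd_pool_default_size; expect trouble"
fi
```

With 'osd_pool_default_size' lowered to 1, the same check passes with a single OSD.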
@thatsdone I agree with your 3rd point, but at least for testing we need a Ceph cluster set up via Ansible, right?
I will also check other two points while recreating issue.
@PravinRanjan10
Hi, I'm still not sure of the root cause of the original symptom (an Ansible template error). But at least I successfully installed Daito (v0.11.0) as a single node with Ceph (1 OSD) by specifying 'group_vars/ceph/all.yml' as below.
This is essentially because Ceph added a minimum OSD count check after Mimic, so we need to specify 'osd_pool_default_size' in /etc/ceph/ceph.conf. The default is 3, which means we need at least 3 OSDs (and OSD volumes) unless we override the parameter.
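For reference, with an 'osd_pool_default_size' override in 'ceph_conf_overrides', ceph-ansible should render roughly the following into /etc/ceph/ceph.conf (a sketch; the exact section layout depends on the ceph-ansible version):

```ini
; Sketch of the resulting /etc/ceph/ceph.conf fragment.
; With osd_pool_default_size = 1, a single OSD is enough for pools
; to become active+clean instead of requiring 3 replicas.
[global]
osd_pool_default_size = 1
```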
I would suggest setting 'osd_pool_default_size' to 1 via 'ceph_conf_overrides', because from the perspective of opensds-installer the Ceph cluster is just a mock (normally used for testing other SODA features).
If this makes sense, I can submit a PR.
ubuntu@ubuntu201:~/opensds-installer$ git diff ansible/group_vars/ceph/all.yml
diff --git a/ansible/group_vars/ceph/all.yml b/ansible/group_vars/ceph/all.yml
index c0cc427..49a362e 100644
--- a/ansible/group_vars/ceph/all.yml
+++ b/ansible/group_vars/ceph/all.yml
@@ -25,11 +25,12 @@ dummy:
ceph_origin: repository
ceph_repository: community
-ceph_stable_release: luminous
+ceph_stable_release: mimic
public_network: "{{ ansible_default_ipv4.address }}/24"
cluster_network: "{{ public_network }}"
monitor_interface: "{{ ansible_default_ipv4.interface }}"
devices:
+ - '/dev/vdb'
#- '/dev/sda'
#- '/dev/sdb'
osd_scenario: collocated
@@ -467,7 +468,9 @@ osd_scenario: collocated
# bar: 5678
#
#ceph_conf_overrides: {}
-
+ceph_conf_overrides:
+ global:
+ osd_pool_default_size: 1
#############
# OS TUNING #
ubuntu@ubuntu201:~/opensds-installer$
Thanks in advance, Masanori
@kumarashit Hi, if my comment above makes sense, I will create a PR (as I mentioned in my comment). What do you think?
@thatsdone I think you can go ahead and raise a PR if it resolves the issue; then we can also take your code and re-test. But note that we had previously tried setting the value under 'ceph_conf_overrides' to both 1 and 3, and it didn't help.
But again, I would recommend you raise your PR.
Describe the bug
Tried to install Ceph using Ansible. It fails at the end, during the sanity check; please take a look at the error below. The interesting point is that health seems to be OK (HEALTH_OK), but other operations are failing.
TASK [osdsdock : ceph cluster sanity check] ***
task path: /root/opensds-installer/ansible/roles/osdsdock/scenarios/ceph_stabilize.yml:17
fatal: [localhost]: FAILED! => {
    "msg": "Unexpected templating type error occurred on (
        INTERVAL={{ ceph_check_interval|quote }}
        MAX_CHECK={{ ceph_check_count|quote }}
        declare -a ceph_stat_array=()
        i=0
        while true
        do
            ceph_stat_array=(`sudo ceph -s | awk '/health:/{print $2;}/osd:/{print $2, $4, $6;}'`)
            # check health status. 3 conditions below
            # 1) HEALTH_OK means healty mon cluster.
            # 2) check joined osd num. At least 1 osd.
            # 3) check joined osds are all up
            if [ "${ceph_stat_array[0]}" == "HEALTH_OK" ] && [ "${ceph_stat_array[1]}" -ge 1 ] && [ "${ceph_stat_array[1]}" -eq "${ceph_stat_array[2]}" ]; then
                exit 0
            fi
            i=`expr ${i} \+ 1`
            if [ "${i}" -ge "${MAX_CHECK}" ]; then
                exit 1
            fi
            sleep ${INTERVAL}
        done
    ): 'int' object is not iterable"
}
to retry, use: --limit @/root/opensds-installer/ansible/site.retry

PLAY RECAP ****
localhost : ok=52 changed=28 unreachable=0 failed=1

root@root1-Latitude-E7450:~/opensds-installer/ansible# dpkg -l | grep ceph
ii  ceph                  13.2.8-1bionic  amd64  distributed storage and file system
ii  ceph-base             13.2.8-1bionic  amd64  common ceph daemon libraries and management tools
ii  ceph-common           13.2.8-1bionic  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse             13.2.8-1bionic  amd64  FUSE-based client for the Ceph distributed file system
ii  ceph-mds              13.2.8-1bionic  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr              13.2.8-1bionic  amd64  manager for the ceph distributed storage system
ii  ceph-mon              13.2.8-1bionic  amd64  monitor server for the ceph storage system
ii  ceph-osd              13.2.8-1bionic  amd64  OSD server for the ceph storage system
ii  libcephfs2            13.2.8-1bionic  amd64  Ceph distributed file system client library
ii  python-ceph-argparse  13.2.8-1bionic  amd64  Python 2 utility libraries for Ceph CLI
ii  python-cephfs         13.2.8-1bionic  amd64  Python 2 libraries for the Ceph libcephfs library

root@root1-Latitude-E7450:~/opensds-installer/ansible# ceph fs ls
No filesystems enabled

root@root1-Latitude-E7450:~/opensds-installer/ansible# ceph -s
  cluster:
    id:     0becd271-442d-4f38-a9dc-d140a9f4ad46
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum root1-Latitude-E7450
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in