ewerk-pnahratow closed this issue 6 years ago.
The GlusterFS playbooks are not guaranteed to be idempotent, and thus running them more than once per installation is not supported. To add new devices and nodes to the GlusterFS cluster you need to do so through the heketi-cli client. An example command:
```
oc rsh <heketi_pod> heketi-cli -s http://localhost:8080 --user admin --secret <admin_key> topology info
```

You can find the admin_key by running `oc describe po <heketi_pod>` and checking the env variables. See the help for the `device add` subcommand of `heketi-cli` for more information on the exact syntax.
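As a concrete sketch of that key lookup (the `oc describe` output layout and the sample key below are assumptions, not taken from this issue; on a live cluster you would pipe the real output through the same filter):

```shell
# Hypothetical sketch: pull HEKETI_ADMIN_KEY out of `oc describe po` output.
# Requires GNU grep (-P); \K drops the matched prefix from the output.
extract_admin_key() {
  grep -oP 'HEKETI_ADMIN_KEY:\s+\K\S+'
}

# Live cluster usage (pod name is a placeholder):
#   oc describe po <heketi_pod> | extract_admin_key

# Canned sample of the relevant Environment lines (assumed format):
sample='Environment:
  HEKETI_USER_KEY:   userkey
  HEKETI_ADMIN_KEY:  s3cret'
printf '%s\n' "$sample" | extract_admin_key
```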
Reading the variables:

```
heketipod="$(oc get pod -n glusterfs | grep heketi-storage | awk '{print $1}')"
heketikey="$(oc get deploymentconfigs/heketi-storage -n glusterfs --template='{{.spec.template.spec.containers}}' | grep -oP '(?<=\[name:HEKETI_ADMIN_KEY value:)\S+(?=\])')"
oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" topology info
```
When I then manually run the `device add` command, it fails:

```
[root@os-master-1 ~]# oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" device add --name=/dev/sdd --node=0ecc0dcbf279ce5bceeaff1e026a3dd0
Error: Unable to execute command on glusterfs-storage-p8whz: WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  Device /dev/sdd not found (or ignored by filtering).
command terminated with exit code 255
```
The same problem occurs when running the playbook.
Does `lsblk` from inside the GlusterFS containers show the device? If not, did you try running `pvscan --cache` from inside the GlusterFS pods?
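Concretely, that check could be scripted like this (the pod-name prefix follows the `glusterfs-storage-*` naming seen elsewhere in this thread; adjust to your deployment):

```shell
# Sketch: run lsblk and pvscan --cache inside every GlusterFS pod.
# The name filter is factored out so it can be checked offline.
list_gluster_pods() {
  awk '/^glusterfs-storage-/ { print $1 }'
}

# Live cluster usage:
#   for pod in $(oc get pod -n glusterfs --no-headers | list_gluster_pods); do
#     echo "== $pod =="
#     oc rsh -n glusterfs "$pod" lsblk
#     oc rsh -n glusterfs "$pod" pvscan --cache
#   done

# Demo against a canned `oc get pod` line (sample values are assumptions):
printf 'glusterfs-storage-p8whz   1/1   Running   0   2d\n' | list_gluster_pods
```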
Hi there,

I fiddled around a little more and I'm pretty sure the multipath setup is causing the problem.

First of all, I can see the devices even from inside the pods, so that part is fine. The main problem is that even on the storage host `pvcreate /dev/sdd` fails with the same error. When I instead use the multipath device, I can create the PV:

```
pvcreate /dev/mapper/1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c
```

I can also delete the multipath device using `multipath -f`, and after that creating the PV with `/dev/sdd` works again too.

I'm not very familiar with the whole multipathing business, but as far as I understand it gets configured when `roles/openshift_node/tasks/storage_plugins/iscsi.yml` runs, which is why the problem doesn't occur on the first install.
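The two workarounds above can be sketched like this (the root-only commands are shown commented; the parsing helper is checked against the `lsblk` output from this issue):

```shell
# Sketch: find the multipath map shadowing a disk, then either use the
# mapper device directly or flush the map so the plain /dev/sd* name works.
mapper_for() {
  # Reads `lsblk -nr -o NAME,TYPE <dev>` output, prints the mapper path.
  awk '$2 == "mpath" { print "/dev/mapper/" $1 }'
}

# On the storage host (root required):
#   dev=/dev/sdd
#   mp="$(lsblk -nr -o NAME,TYPE "$dev" | mapper_for)"
#   pvcreate "$mp"                      # option 1: use the multipath device
#   multipath -f "${mp#/dev/mapper/}"   # option 2: flush the map...
#   pvcreate "$dev"                     #           ...then /dev/sdd works again

# Checked against the lsblk output shown in this issue:
printf '%s\n' 'sdd disk' \
  '1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c mpath' | mapper_for
```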
I added the multipath devices using heketi and that seemed to work:

```
Node Id: a6399d6114ba9b2ff377a305b4e76a25
State: online
Cluster Id: 6455ee6a8726324e54cdb1dddd3b6ddc
Zone: 1
Management Hostnames: os-storage-1.lab.com
Storage Hostnames: 10.3.1.148
Devices:
    Id:9e07c4579599eac9bb5c97f637be0759   Name:/dev/sdc   State:online   Size (GiB):499   Used (GiB):103   Free (GiB):396
        Bricks:
            Id:995b4c692f354d1329b866d499119090   Size (GiB):1    Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_995b4c692f354d1329b866d499119090/brick
            Id:9a3125acdf8c9c77141c4812c54a48e8   Size (GiB):2    Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_9a3125acdf8c9c77141c4812c54a48e8/brick
            Id:b9d6823a45b04c1cbc9072f5f3af56d0   Size (GiB):50   Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_b9d6823a45b04c1cbc9072f5f3af56d0/brick
            Id:d5e843ba70b23396dd4498aec95e0b09   Size (GiB):50   Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_d5e843ba70b23396dd4498aec95e0b09/brick
    Id:bbca5e50263de14c4f58ea553101a4f7   Name:/dev/mapper/1VENDOR_NFS_4_0_3885_2d256b78_f72f_49af_ba67_9667a406a204   State:online   Size (GiB):99   Used (GiB):0   Free (GiB):99
        Bricks:
```
Does this look good?
Just to recap: The currently suggested method for extending CNS with additional disks is by using the heketi-cli?
Thanks for your help and for taking the time
Yes, heketi-cli is the current recommended method.
Hmm... this seems somewhat cumbersome. Can you provide the exact commands you used to get it to work? Also can you say more on why this doesn't impact initial deployment, is it because device mapper isn't enabled initially thus the pre-existing devices aren't remapped?
I used heketi's `device add`, changing the `--name` from `/dev/sdd` to the generated devicemapper name:

```
heketipod="$(oc get pod -n glusterfs | grep heketi-storage | awk '{print $1}')"
heketikey="$(oc get deploymentconfigs/heketi-storage -n glusterfs --template='{{.spec.template.spec.containers}}' | grep -oP '(?<=\[name:HEKETI_ADMIN_KEY value:)\S+(?=\])')"
oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" device add --name=/dev/mapper/1VENDOR_NFS_4_0_3885_2d256b78_f72f_49af_ba67_9667a406a204 --node=0ecc0dcbf279ce5bceeaff1e026a3dd0
```
I did that three times in total since I have 3 nodes.
> Also can you say more on why this doesn't impact initial deployment, is it because device mapper isn't enabled initially thus the pre-existing devices aren't remapped?

Yes, that's exactly what I am talking about. My freshly installed hosts have only the `/dev/sd*` devices.
Oh, okay, so the `multipath -f` thing was only to get the `/dev/sd*` naming working again. Got it.
Hmm... this presents a problem. I'm not sure how to best get around this. Was this a disk that was already in the machine prior to OpenShift installation, or was it a disk that was added to the node after installation?
We actually had this same issue on a fresh OCP 3.7 deployment, where automation outside of openshift-ansible prepared each node hosting glusterfs by staging prereqs (including installing multipathd and loading dm_multipath) prior to the first execution of `cns-deploy topology load`. We worked around the issue with `multipath -F` and `systemctl stop multipathd`, then reran the original cns-deploy. After successful execution we restarted multipathd.
@jarrpa yes, those were just my debugging steps in order to understand the issue. The disks were introduced after the initial installation. I'm using VMs and just did a shutdown, add disk, boot for each of the VMs.
With @liveaverage's method of completely flushing the multipath device maps, it should be possible to extend glusterfs using the playbook (I haven't tried it, though). For me this is a valid way.
If the playbook at some point wants to handle CNS storage scaling "officially", this issue would have to be resolved in a clean way.
All right, thanks! Can the issue be closed, then?
Yes. Thank you
PR https://github.com/openshift/openshift-ansible/pull/7367 will solve the issue. The 3.7 merge is still pending (https://github.com/openshift/openshift-ansible/pull/8152)
Description

Version

Steps To Reproduce

1. Run `playbooks/byo/config.yml`
2. Add devices to the glusterfs nodes' `glusterfs_devices`
3. Run `playbooks/byo/config.yml` again

Expected Results
More storage available.
Observed Results
The playbook run stopped at `TASK [openshift_storage_glusterfs : Load heketi topology]` with the following error.
Additional Information
CentOS Linux release 7.4.1708 (Core)
```
[glusterfs]
os-storage-1.lab.com glusterfs_ip='10.3.1.122' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
os-storage-2.lab.com glusterfs_ip='10.3.1.123' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
os-storage-3.lab.com glusterfs_ip='10.3.1.124' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
```
```
[root@os-storage-3 ~]# lsblk /dev/sdd
NAME                                                        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sdd                                                           8:48   0  100G  0 disk
└─1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c 253:3    0  100G  0 mpath
```