openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

Cannot add disk to existing CNS (glusterfs) #8060

Closed ewerk-pnahratow closed 6 years ago

ewerk-pnahratow commented 6 years ago

Description

After a successful installation of OpenShift Origin v3.7 I tried to add a disk to each of the glusterfs nodes, expecting to increase the amount of available storage by adding the device names to the inventory file and rerunning the playbook. The playbook aborted while loading the heketi topology file.

Version
[openshift@os-bastion-1 openshift-ansible]$ ansible --version
ansible 2.5.0
  config file = /opt/openshift/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/openshift/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

[openshift@os-bastion-1 openshift-ansible]$ git describe
openshift-ansible-3.7.44-1-30-g5b7a769
Steps To Reproduce
  1. Install OpenShift with playbooks/byo/config.yml and glusterfs nodes
  2. Add a disk to each storage node
  3. Add the device names to glusterfs_devices in the inventory
  4. Run playbooks/byo/config.yml again
Expected Results

More storage available.

Observed Results

The playbook run stopped at

TASK [openshift_storage_glusterfs : Load heketi topology]

with the following error

{
  "changed": true,
  "cmd": [
    "oc",
    "--config=/tmp/openshift-glusterfs-ansible-lUnfw5/admin.kubeconfig",
    "rsh",
    "--namespace=glusterfs",
    "heketi-storage-1-kktjv",
    "heketi-cli",
    "-s",
    "http://localhost:8080",
    "--user",
    "admin",
    "--secret",
    "redacted",
    "topology",
    "load",
    "--json=/tmp/openshift-glusterfs-ansible-lUnfw5/topology.json",
    "2>&1"
  ],
  "delta": "0:00:03.610762",
  "end": "2018-04-20 12:05:38.335115",
  "failed_when_result": true,
  "rc": 0,
  "start": "2018-04-20 12:05:34.724353",
  "stderr": "",
  "stderr_lines": [],  
  "stdout_lines": [
    "    Found node os-storage-1.lab.com on cluster 6455ee6a8726324e54cdb1dddd3b6ddc",
    "    Found device /dev/sdc",
    "    Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-storage-s7hk9:   WARNING: Not using lvmetad because config setting use_lvmetad=0.",
    "  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).",
    "  Device /dev/sdd not found (or ignored by filtering).",
    "    Found node os-storage-2.lab.com on cluster 6455ee6a8726324e54cdb1dddd3b6ddc",
    "    Found device /dev/sdc",
    "    Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-storage-p8whz:   WARNING: Not using lvmetad because config setting use_lvmetad=0.",
    "  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).",
    "  Device /dev/sdd not found (or ignored by filtering).",
    "    Found node os-storage-3.lab.com on cluster 6455ee6a8726324e54cdb1dddd3b6ddc",
    "    Found device /dev/sdc",
    "    Adding device /dev/sdd ... Unable to add device: Unable to execute command on glusterfs-storage-nmbm5:   WARNING: Not using lvmetad because config setting use_lvmetad=0.",
    "  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).",
    "  Device /dev/sdd not found (or ignored by filtering)."
  ]
}
Additional Information


[glusterfs]
os-storage-1.lab.com glusterfs_ip='10.3.1.122' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
os-storage-2.lab.com glusterfs_ip='10.3.1.123' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'
os-storage-3.lab.com glusterfs_ip='10.3.1.124' glusterfs_devices='[ "/dev/sdc", "/dev/sdd" ]'

* `lsblk` shows some mpath device below `/dev/sdd`

[root@os-storage-3 ~]# lsblk /dev/sdd
NAME                                                        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sdd                                                           8:48   0  100G  0 disk
└─1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c 253:3    0  100G  0 mpath

jarrpa commented 6 years ago

The GlusterFS playbooks are not guaranteed to be idempotent, and thus running them more than once per installation is not supported. To add new devices and nodes to the GlusterFS cluster you need to do so through the heketi-cli client. An example command:

oc rsh <heketi_pod> heketi-cli -s http://localhost:8080 --user admin --secret <admin_key> topology info

You can find the admin_key by running oc describe po <heketi_pod> and checking the environment variables. See the help output of heketi-cli's device add subcommand for more information on the exact syntax.
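
For illustration, a device add call might then look roughly like this (placeholders only; the node ID comes from the topology info output):

oc rsh <heketi_pod> heketi-cli -s http://localhost:8080 --user admin --secret <admin_key> device add --name=/dev/sdd --node=<node_id>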

ewerk-pnahratow commented 6 years ago

I read the variables like this:

heketipod="$(oc get pod -n glusterfs | grep heketi-storage | awk '{print $1}')"
heketikey="$(oc get deploymentconfigs/heketi-storage -n glusterfs --template='{{.spec.template.spec.containers}}' | grep -oP '(?<=\[name:HEKETI_ADMIN_KEY value:)\S+(?=\])')"
oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" topology info

But when I then run the device add manually:

[root@os-master-1 ~]# oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" device add --name=/dev/sdd --node=0ecc0dcbf279ce5bceeaff1e026a3dd0
Error: Unable to execute command on glusterfs-storage-p8whz:   WARNING: Not using lvmetad because config setting use_lvmetad=0.
  WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
  Device /dev/sdd not found (or ignored by filtering).
command terminated with exit code 255

the same problem occurs.

jarrpa commented 6 years ago

Does lsblk from inside the GlusterFS containers show the device? If not, did you try running pvscan --cache from inside the GlusterFS pods?
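
For example, something along these lines (pod name taken from the error output above, and assuming lsblk is available in the pod image; purely illustrative):

oc rsh -n glusterfs glusterfs-storage-s7hk9 lsblk /dev/sdd
oc rsh -n glusterfs glusterfs-storage-s7hk9 pvscan --cache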

ewerk-pnahratow commented 6 years ago

Hi there,

I fiddled around a little more and I'm pretty sure the multipath thing is causing the problem.

First of all, I can see the devices even from inside the pods, so that part is fine. The main problem is that even on the storage host itself, pvcreate /dev/sdd fails with the same error.

When I instead use the multipath device I can create the pv using

pvcreate /dev/mapper/1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c

I can also delete the multipath device using multipath -f, and after that creating the PV with /dev/sdd works again too.
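
For reference, the sequence described above looked roughly like this on the storage host (map name taken from the lsblk output earlier; illustrative, not a procedure):

pvcreate /dev/sdd                                                                # fails: device not found (or ignored by filtering)
pvcreate /dev/mapper/1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c   # works when addressing the mpath device directly
multipath -f 1VENDOR_NFS_4_0_3887_e5c9f803_e2f2_4d10_87ee_f3e46f91fd6c           # alternatively, flush the map...
pvcreate /dev/sdd                                                                # ...after which /dev/sdd works again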

I'm not very familiar with the whole multipathing business, but as far as I understand it gets configured when roles/openshift_node/tasks/storage_plugins/iscsi.yml runs, which is why the problem doesn't occur on the first install.

I added the multipath devices using heketi and that seemed to work

Node Id: a6399d6114ba9b2ff377a305b4e76a25
State: online
Cluster Id: 6455ee6a8726324e54cdb1dddd3b6ddc
Zone: 1
Management Hostnames: os-storage-1.lab.com
Storage Hostnames: 10.3.1.148
Devices:
    Id:9e07c4579599eac9bb5c97f637be0759   Name:/dev/sdc            State:online    Size (GiB):499     Used (GiB):103     Free (GiB):396     
        Bricks:
            Id:995b4c692f354d1329b866d499119090   Size (GiB):1       Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_995b4c692f354d1329b866d499119090/brick
            Id:9a3125acdf8c9c77141c4812c54a48e8   Size (GiB):2       Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_9a3125acdf8c9c77141c4812c54a48e8/brick
            Id:b9d6823a45b04c1cbc9072f5f3af56d0   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_b9d6823a45b04c1cbc9072f5f3af56d0/brick
            Id:d5e843ba70b23396dd4498aec95e0b09   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_9e07c4579599eac9bb5c97f637be0759/brick_d5e843ba70b23396dd4498aec95e0b09/brick
    Id:bbca5e50263de14c4f58ea553101a4f7   Name:/dev/mapper/1VENDOR_NFS_4_0_3885_2d256b78_f72f_49af_ba67_9667a406a204   State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      
        Bricks:

Does this look good?

Just to recap: The currently suggested method for extending CNS with additional disks is by using the heketi-cli?

Thanks for your help and for taking the time

jarrpa commented 6 years ago

Yes, heketi-cli is the current recommended method.

Hmm... this seems somewhat cumbersome. Can you provide the exact commands you used to get it to work? Also, can you say more about why this doesn't impact the initial deployment? Is it because device mapper isn't enabled initially, so the pre-existing devices aren't remapped?

ewerk-pnahratow commented 6 years ago

I used heketi's device add, changing the --name from /dev/sdd to the generated device-mapper name:

heketipod="$(oc get pod -n glusterfs | grep heketi-storage | awk '{print $1}')"
heketikey="$(oc get deploymentconfigs/heketi-storage -n glusterfs --template='{{.spec.template.spec.containers}}' | grep -oP '(?<=\[name:HEKETI_ADMIN_KEY value:)\S+(?=\])')"
oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" device add --name=/dev/mapper/1VENDOR_NFS_4_0_3885_2d256b78_f72f_49af_ba67_9667a406a204 --node=0ecc0dcbf279ce5bceeaff1e026a3dd0

I did that three times in total since I have 3 nodes.
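
For reference, the node IDs passed to --node can be pulled from the topology output (the "Node Id:" lines), e.g. roughly:

oc rsh -n=glusterfs "$heketipod" heketi-cli -s http://localhost:8080 --user admin --secret "$heketikey" topology info | grep "Node Id:"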

Also can you say more on why this doesn't impact initial deployment, is it because device mapper isn't enabled initially thus the pre-existing devices aren't remapped?

Yes, that's exactly what I am talking about. My freshly installed hosts only have the /dev/sd* devices.

jarrpa commented 6 years ago

Oh, okay, so the multipath -f thing was only to get the /dev/sd* naming working again. Got it.

Hmm... this presents a problem. I'm not sure how to best get around this. Was this a disk that was already in the machine prior to OpenShift installation, or was it a disk that was added to the node after installation?

liveaverage commented 6 years ago

We actually had this same issue on a fresh OCP 3.7 deployment, where automation outside of openshift-ansible prepared each node hosting glusterfs by staging prereqs (including installing multipathd and loading dm_multipath) prior to the first execution of cns-deploy topology load. We worked around the issue with multipath -F and systemctl stop multipathd, then reran the original cns-deploy. After a successful run we restarted multipathd.
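
A rough sketch of that workaround, assuming it is run on each node hosting glusterfs before rerunning the deploy:

multipath -F                  # flush all unused multipath device maps
systemctl stop multipathd     # keep new maps from appearing during the deploy
# ... rerun cns-deploy / the topology load here ...
systemctl start multipathd    # re-enable multipathing once the deploy has succeeded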

ewerk-pnahratow commented 6 years ago

@jarrpa yes, those were just my debugging steps in order to understand the issue. The disks were introduced after the initial installation. I'm using VMs and just did a shutdown, add disk, boot for each of the VMs.

With @liveaverage's method of completely flushing the multipath device maps it should be possible to extend glusterfs using the playbook (I haven't tried it, though). For me this is a valid way.

If the playbook is at some point meant to handle CNS storage scaling "officially", this issue would have to be resolved in a clean way.

jarrpa commented 6 years ago

All right, thanks! Can the issue be closed, then?

ewerk-pnahratow commented 6 years ago

Yes. Thank you

ewerk-pnahratow commented 6 years ago

PR https://github.com/openshift/openshift-ansible/pull/7367 will solve the issue. The 3.7 merge is still pending (https://github.com/openshift/openshift-ansible/pull/8152).