openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

gluster-block command not present in default glusterfs image (3.9) #8455

Closed — lentzi90 closed this issue 6 years ago

lentzi90 commented 6 years ago

Description

The default (non-enterprise) Docker image (gluster/gluster-centos) for glusterfs-storage does not include the gluster-block command needed for block storage. This results in failed persistent volume provisioning and a CrashLoopBackOff for the heketi pod when following this example.

Version
ansible 2.5.3
openshift-ansible-3.9.29-1-28-g6fd4487ea
Steps To Reproduce
  1. Follow this guide to create an inventory file.
  2. Set openshift_storage_glusterfs_storageclass_default=true instead of the ...block_storageclass_default=true to allow ansible service broker to start with a PVC.
  3. Run prerequisites.yml and deploy_cluster.yml (example invocations after this list).
  4. Change the default storage class to glusterfs-registry-block:
    oc patch storageclass glusterfs-storage -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
    oc patch storageclass glusterfs-registry-block -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
  5. Deploy metrics and logging according to the guide.
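
For reference, steps 3 and 4 might look like this in practice; the inventory path is a placeholder, and the playbook paths are those shipped with openshift-ansible 3.9:

ansible-playbook -i inventory playbooks/prerequisites.yml
ansible-playbook -i inventory playbooks/deploy_cluster.yml
# after the patches in step 4, confirm the default annotation moved:
oc get storageclass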
Expected Results

Metrics and logging should deploy successfully with dynamically provisioned storage.

Observed Results

Provisioning of block volumes fails due to the missing gluster-block command (from oc describe pvc <name-of-pvc>):

gluster.org/glusterblock glusterblock-registry-provisioner-dc-1-x6g2c 0b5f4f2b-5cee-11e8-a2d8-0a580a800004  Failed to provision volume with StorageClass "glusterfs-registry-block": failed to create volume: [heketi] failed to create volume: Unable to execute command on glusterfs-storage-gp9th: /bin/bash: gluster-block: command not found
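
A quick way to confirm the binary is absent is to exec into one of the GlusterFS pods (pod name taken from the error above; adjust to your own pod and namespace):

oc exec glusterfs-storage-gp9th -- which gluster-block
# with the stock gluster/gluster-centos image this exits non-zero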

Additionally, the heketi-storage pod fails (from oc logs <heketi-storage-pod>):

[heketi] ERROR 2018/05/21 12:42:16 /src/github.com/heketi/heketi/apps/glusterfs/app.go:150: Heketi terminated while performing one or more operations. Server will not start as long as pending operations are present in the db.
panic: Heketi terminated while performing one or more operations. Server will not start as long as pending operations are present in the db.

goroutine 1 [running]:
github.com/heketi/heketi/apps/glusterfs.NewApp(0x244f000, 0xc4204d2208, 0x0)
    /build/golang/src/github.com/heketi/heketi/apps/glusterfs/app.go:154 +0xc0e
main.main()
    /build/golang/src/github.com/heketi/heketi/main.go:273 +0x46d
Additional Information

I have successfully included the gluster-block command in the container by adding the following to the Dockerfile here:

yum --setopt=tsflags=nodocs -y install gluster-block
systemctl enable gluster-blockd.service
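
In Dockerfile form this would be a single RUN instruction; a minimal sketch, assuming the systemd-based CentOS base image (so systemctl enable only creates the unit symlinks at build time):

RUN yum --setopt=tsflags=nodocs -y install gluster-block && \
    yum clean all && \
    systemctl enable gluster-blockd.service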

Using this custom image results in a working deployment (image available here). Note #8398 if using separate images for openshift_storage_glusterfs_registry_image and openshift_storage_glusterfs_image.
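
To use such a custom image, the inventory can point the image variables at it (the repository below is a placeholder):

openshift_storage_glusterfs_image=docker.io/<your-user>/gluster-centos
openshift_storage_glusterfs_registry_image=docker.io/<your-user>/gluster-centos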

Should the gluster-block binary be included in the default container image or should there be a separate image for this?

Inventory file:

master ansible_host=192.168.121.252
app1 ansible_host=192.168.121.135
app2 ansible_host=192.168.121.59
app3 ansible_host=192.168.121.85
gfs1 ansible_host=192.168.121.76
gfs2 ansible_host=192.168.121.249
gfs3 ansible_host=192.168.121.129

[OSEv3:children]
masters
nodes
etcd
glusterfs
glusterfs_registry

[OSEv3:vars]
openshift_deployment_type=origin
osm_cluster_network_cidr=10.128.0.0/14
openshift_portal_net=172.30.0.0/16
osm_host_subnet_length=9
# localhost likely doesn't meet the minimum requirements
openshift_disable_check=disk_availability,memory_availability

openshift_release=v3.9
ansible_port=22
ansible_user='vagrant'
ansible_become=yes
openshift_storage_glusterfs_block_host_vol_size=10

openshift_master_dynamic_provisioning_enabled=true

openshift_registry_selector="role=infra"
openshift_hosted_registry_storage_kind=glusterfs

openshift_storage_glusterfs_block_deploy=false
openshift_storage_glusterfs_storageclass=true
openshift_storage_glusterfs_storageclass_default=true

openshift_storage_glusterfs_registry_block_deploy=true
openshift_storage_glusterfs_registry_block_storageclass=true
openshift_storage_glusterfs_registry_block_storageclass_default=false

openshift_storageclass_default=false

# Add AFTER deployment for logging and metrics. Apply by running the
# logging/metrics playbooks.
# -----------------------------------------------------------------
# openshift_metrics_hawkular_nodeselector={"role":"infra"}
# openshift_metrics_cassandra_nodeselector={"role":"infra"}
# openshift_metrics_heapster_nodeselector={"role":"infra"}
# openshift_metrics_storage_kind=dynamic
#
# openshift_logging_es_nodeselector={"role":"infra"}
# openshift_logging_kibana_nodeselector={"role":"infra"}
# openshift_logging_curator_nodeselector={"role":"infra"}
# openshift_logging_es_pvc_size=1Gi
# openshift_logging_storage_kind=dynamic
# -----------------------------------------------------------------
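#
# A hypothetical invocation once the variables above are uncommented
# (playbook paths as shipped with openshift-ansible 3.9; adjust the
# inventory path):
#   ansible-playbook -i inventory playbooks/openshift-metrics/config.yml
#   ansible-playbook -i inventory playbooks/openshift-logging/config.yml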

[masters]
master

[etcd]
master

[nodes]
master openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
app1 openshift_schedulable=true openshift_node_labels="{'role': 'app'}"
app2 openshift_schedulable=true openshift_node_labels="{'role': 'app'}"
app3 openshift_schedulable=true openshift_node_labels="{'role': 'app'}"
gfs1 openshift_schedulable=true openshift_node_labels="{'role': 'infra'}"
gfs2 openshift_schedulable=true openshift_node_labels="{'role': 'infra'}"
gfs3 openshift_schedulable=true openshift_node_labels="{'role': 'infra'}"

[glusterfs]
app1 glusterfs_devices='[ "/dev/vdb" ]'
app2 glusterfs_devices='[ "/dev/vdb" ]'
app3 glusterfs_devices='[ "/dev/vdb" ]'

[glusterfs_registry]
gfs1 glusterfs_devices='[ "/dev/vdb" ]'
gfs2 glusterfs_devices='[ "/dev/vdb" ]'
gfs3 glusterfs_devices='[ "/dev/vdb" ]'
DanyC97 commented 6 years ago

@lentzi90 can you run docker images so we can see the image ID for gluster/gluster-centos? If this is a problem it's an upstream one, but let's see ...

bitwurx commented 6 years ago

I have encountered this exact same issue. I am using the latest gluster-centos.

docker.io/gluster/gluster-centos                      latest              cdc02f14c0ae        2 months ago        372 MB

I can confirm that neither the gluster-block CLI utility nor the gluster-blockd systemd unit is present.

sh-4.2# gluster-block
sh: gluster-block: command not found
sh-4.2# systemctl list-units | grep gluster
  etc-glusterfs.mount                                                                                    loaded active     mounted   /etc/glusterfs
  var-lib-glusterd.mount                                                                                 loaded active     mounted   /var/lib/glusterd
  var-lib-misc-glusterfsd.mount                                                                          loaded active     mounted   /var/lib/misc/glusterfsd
  var-log-glusterfs.mount                                                                                loaded active     mounted   /var/log/glusterfs
  glusterd.service                                                                                       loaded active     running   GlusterFS, a clustered file-system server
sh-4.2#
lentzi90 commented 6 years ago

Same image here:

REPOSITORY                                   TAG                 IMAGE ID            CREATED             SIZE
docker.io/openshift/origin-docker-registry   v3.9.0              9b472363b07a        6 days ago          465 MB
docker.io/openshift/origin-deployer          v3.9.0              e4de3cb64af9        6 days ago          1.26 GB
docker.io/openshift/origin-pod               v3.9.0              b6d2be1df9c0        6 days ago          220 MB
docker.io/heketi/heketi                      latest              4fee7ad83005        2 months ago        362 MB
docker.io/gluster/gluster-centos             latest              cdc02f14c0ae        2 months ago        372 MB
DanyC97 commented 6 years ago

So I just checked on my 3.7 cluster, where I have the same gluster-centos image, and indeed there is no gluster-block. However, that is as expected, since in my cluster I do have openshift_storage_glusterfs_block_storageclass=True and:

oc get po
NAME                                          READY     STATUS    RESTARTS   AGE
glusterblock-storage-provisioner-dc-1-864g5   1/1       Running   0          5d
glusterfs-storage-544jr                       1/1       Running   1          9d
glusterfs-storage-xlpzj                       1/1       Running   1          9d
glusterfs-storage-zk8hl                       1/1       Running   8          33d
heketi-storage-1-6btcw                        1/1       Running   0          16h

Note that I'm not deploying a registry based on gluster, so ...

DanyC97 commented 6 years ago

Maybe @jarrpa has some thoughts on this.

bitwurx commented 6 years ago

My deployment isn't using the dedicated registry block storage (i.e. glusterfs_registry) since I don't have enough devices.

I did a test with just glusterfs_registry storage to see if the installer would deploy a different version of the gluster container that included gluster-block, and it still did not have the gluster-block command. The glusterfs_registry resources were deployed into the default project and all had the glusterfsregistry* prefix, but they were otherwise equivalent to the non-registry resources.

If I had to guess, I would pin this issue solely on the gluster container image, as heketi and the gluster-block provisioner seem to do the right thing.

lentzi90 commented 6 years ago

For reference: there is a pull request for this in the gluster/gluster-containers repo here. As soon as it can be merged, this issue should be solved.

DanyC97 commented 6 years ago

excellent, thank you for the info @lentzi90 !

jarrpa commented 6 years ago

Sorry for the delay, this fell off my radar.

I will start by saying that this issue was indeed valid: for the longest time, the upstream had no reasonable way to deploy gluster-block anywhere. @lentzi90 was right to look at the gluster-containers repo, but the particular PR linked is long-dead. Thankfully some recent changes allowed me to submit this PR which has since been merged.

This issue should now be resolved. :) Given how late I am to this party, if I hear no feedback otherwise within a week I'll close this issue.

DanyC97 commented 6 years ago

nice one @jarrpa ! +1 on closing it