As you can see, the playbook is looking for the heketi pod in the wrong namespace:
oc get pods --namespace app-storage
NAME READY STATUS RESTARTS AGE
glusterblock-storage-provisioner-dc-1-tdcjj 1/1 Running 0 14m
glusterfs-storage-7dfk2 1/1 Running 0 17m
glusterfs-storage-kjf2l 1/1 Running 0 17m
glusterfs-storage-t6c5q 1/1 Running 0 17m
heketi-storage-1-r2gn8 1/1 Running 0 14m
oc get pods --namespace infra-storage
NAME READY STATUS RESTARTS AGE
deploy-heketi-registry-1-v4246 1/1 Running 0 13m
glusterfs-registry-9xd2k 1/1 Running 0 14m
glusterfs-registry-fzkb8 1/1 Running 0 14m
glusterfs-registry-qc8dt 1/1 Running 0 14m
It appears that the second run of the gluster role does not reset heketi_pod, so when it hits
- name: Set heketi-cli command
set_fact:
glusterfs_heketi_client: "{% if glusterfs_heketi_is_native %}{{ openshift_client_binary }} --config={{ mktemp.stdout }}/admin.kubeconfig rsh --namespace={{ glusterfs_namespace }} {%if heketi_pod is defined %}{{ heketi_pod.metadata.name }}{% elif deploy_heketi_pod is defined %}{{ deploy_heketi_pod.metadata.name }}{% endif %} {% endif %}{{ glusterfs_heketi_cli }} -s http://{% if glusterfs_heketi_is_native %}localhost:8080{% else %}{{ glusterfs_heketi_url }}:{{ glusterfs_heketi_port }}{% endif %} --user admin {% if glusterfs_heketi_admin_key is defined %}--secret '{{ glusterfs_heketi_admin_key }}'{% endif %}"
It grabs the heketi_pod variable rather than the deploy_heketi_pod one.
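To make that concrete, here is a minimal standalone repro (a hypothetical playbook, for illustration only, not part of openshift-ansible): facts set with set_fact persist for the rest of the play, so the "is defined" test keeps taking the stale heketi_pod branch during the second deployment.
---
# stale_fact_repro.yml -- hypothetical playbook, for illustration only.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: First CNS deployment sets heketi_pod
      set_fact:
        heketi_pod: { metadata: { name: heketi-storage-1-r2gn8 } }

    - name: Second CNS deployment only gets as far as deploy-heketi
      set_fact:
        deploy_heketi_pod: { metadata: { name: deploy-heketi-registry-1-v4246 } }

    - name: Same conditional shape as heketi_load.yml
      debug:
        msg: "{% if heketi_pod is defined %}{{ heketi_pod.metadata.name }}{% elif deploy_heketi_pod is defined %}{{ deploy_heketi_pod.metadata.name }}{% endif %}"
      # Prints heketi-storage-1-r2gn8 -- the stale pod from the first run.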
I'm facing the same issue, with Red Hat 7.5 and Ansible 2.6.3.
Somehow the second CNS deployment (glusterfs_registry) selects the wrong pod (the one from the app-storage namespace) in the "Set heketi-cli command" Ansible task, even though the heketi-cli command itself says "infra-storage".
But there is another hidden issue. In the glusterfs_registry CNS deployment, the DeploymentConfig "deploy-heketi-registry" creates a pod "deploy-heketi-registry-1-xxxxx", but the deployment never finishes.
Below are the logs:
[root@xxxxxxx ~]# oc logs deploy-heketi-registry-1-rqtsz -n infra-storage
stat: cannot stat '/var/lib/heketi/heketi.db': No such file or directory
Heketi 6.0.0
[heketi] ERROR 2018/09/06 16:32:42 /src/github.com/heketi/heketi/apps/glusterfs/app.go:100: invalid log level:
[heketi] INFO 2018/09/06 16:32:42 Loaded kubernetes executor
[heketi] INFO 2018/09/06 16:32:42 Block: Auto Create Block Hosting Volume set to true
[heketi] INFO 2018/09/06 16:32:42 Block: New Block Hosting Volume size 208 GB
[heketi] INFO 2018/09/06 16:32:42 GlusterFS Application Loaded
[heketi] INFO 2018/09/06 16:32:42 Started Node Health Cache Monitor
Authorization loaded
Listening on port 8080
[heketi] INFO 2018/09/06 16:32:52 Starting Node Health Status refresh
[heketi] INFO 2018/09/06 16:32:52 Cleaned 0 nodes from health cache
[heketi] INFO 2018/09/06 16:34:42 Starting Node Health Status refresh
[heketi] INFO 2018/09/06 16:34:42 Cleaned 0 nodes from health cache
...
Yeah, it looks like the second run, the one that creates infra-storage, sees the heketi_pod var, so it never launches the piece that deploys heketi after it starts up the deploy-heketi pod.
I think I figured out the problem (or part of it). heketi_pod_check.yml runs during the first CNS setup and sets the variables "deploy_heketi_pod" and "heketi_pod". When the second CNS deployment runs and fails to deploy "deploy-heketi-registry", those variables are not properly updated, so the stale "heketi_pod" is used when heketi_load.yml runs the second time, causing this failure.
There are two problems here: (i) the second CNS DeploymentConfig "deploy-heketi-registry" fails; (ii) the second run of heketi_pod_check.yml does not clean the variables, masking the real problem.
I have no idea yet what is causing (i).
Here is a workaround I came up with; not sure if it's the best fix, but it got GlusterFS installed for me.
diff --git a/roles/openshift_storage_glusterfs/tasks/heketi_load.yml b/roles/openshift_storage_glusterfs/tasks/heketi_load.yml
index 713f520..37083ec 100644
--- a/roles/openshift_storage_glusterfs/tasks/heketi_load.yml
+++ b/roles/openshift_storage_glusterfs/tasks/heketi_load.yml
@@ -1,7 +1,7 @@
---
- name: Set heketi-cli command
set_fact:
- glusterfs_heketi_client: "{% if glusterfs_heketi_is_native %}{{ openshift_client_binary }} --config={{ mktemp.stdout }}/admin.kubeconfig rsh --namespace={{ glusterfs_namespace }} {%if heketi_pod is defined %}{{ heketi_pod.metadata.name }}{% elif deploy_heketi_pod is defined %}{{ deploy_heketi_pod.metadata.name }}{% endif %} {% endif %}{{ glusterfs_heketi_cli }} -s http://{% if glusterfs_heketi_is_native %}localhost:8080{% else %}{{ glusterfs_heketi_url }}:{{ glusterfs_heketi_port }}{% endif %} --user admin {% if glusterfs_heketi_admin_key is defined %}--secret '{{ glusterfs_heketi_admin_key }}'{% endif %}"
+ glusterfs_heketi_client: "{% if glusterfs_heketi_is_native %}{{ openshift_client_binary }} --config={{ mktemp.stdout }}/admin.kubeconfig rsh --namespace={{ glusterfs_namespace }} {% if ((heketi_pod is defined) and (heketi_pod != \"\")) %}{{ heketi_pod.metadata.name }}{% elif deploy_heketi_pod is defined %}{{ deploy_heketi_pod.metadata.name }}{% endif %} {% endif %}{{ glusterfs_heketi_cli }} -s http://{% if glusterfs_heketi_is_native %}localhost:8080{% else %}{{ glusterfs_heketi_url }}:{{ glusterfs_heketi_port }}{% endif %} --user admin {% if glusterfs_heketi_admin_key is defined %}--secret '{{ glusterfs_heketi_admin_key }}'{% endif %}"
- name: Verify heketi service
command: "{{ glusterfs_heketi_client }} cluster list"
@@ -13,7 +13,7 @@
dest: "{{ mktemp.stdout }}/topology.json"
- name: Place heketi topology on heketi Pod
- shell: "{{ openshift_client_binary }} --config={{ mktemp.stdout }}/admin.kubeconfig exec --namespace={{ glusterfs_namespace }} -i {%if heketi_pod is defined %}{{ heketi_pod.metadata.name }}{% elif d
+ shell: "{{ openshift_client_binary }} --config={{ mktemp.stdout }}/admin.kubeconfig exec --namespace={{ glusterfs_namespace }} -i {%if ((heketi_pod is defined) and (heketi_pod != \"\")) %}{{ heketi_
when:
- glusterfs_heketi_is_native
diff --git a/roles/openshift_storage_glusterfs/tasks/main.yml b/roles/openshift_storage_glusterfs/tasks/main.yml
index 8378f2b..7038e21 100644
--- a/roles/openshift_storage_glusterfs/tasks/main.yml
+++ b/roles/openshift_storage_glusterfs/tasks/main.yml
@@ -5,6 +5,9 @@
when:
- groups.glusterfs | default([]) | count > 0
+- set_fact:
+ heketi_pod: ""
+
- import_tasks: glusterfs_registry.yml
when: >
groups.glusterfs_registry | default([]) | count > 0
@jarrpa fyi
It did not work for me. It did solve the wrong-variable issue from the first run, but now I'm back to the root problem. For some reason, "deploy-heketi-registry-1-xxxxx" is still stuck in the infra-storage namespace =/
Nevertheless, if you don't mind me saying so, I would put the following code:
- set_fact:
heketi_pod: ""
deploy_heketi_pod: ""
in the roles/openshift_storage_glusterfs/tasks/heketi_pod_check.yml file, which is the one that creates these variables, just to make it a bit clearer why we are setting them to "". Not a big change... See the sketch below.
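For instance, something like this at the top of heketi_pod_check.yml (a sketch of the suggested change; the trailing comment stands in for the file's existing lookup tasks, which I'm not reproducing here):
---
# Sketch of the suggested change to heketi_pod_check.yml: clear both facts
# before the file's pod lookups (re)set them, so values left over from the
# first CNS deployment cannot leak into the second.
- name: Reset heketi pod facts from any previous role run
  set_fact:
    heketi_pod: ""
    deploy_heketi_pod: ""

# ...the file's existing tasks that look up the deploy-heketi and heketi
# pods and set these facts would follow here unchanged...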
@creiche, when you run
oc get pods --namespace app-storage
oc get pods --namespace infra-storage
do you get similar pods for both namespaces?
I got this:
[root@xxxxx ~]# oc get pods --namespace app-storage
NAME READY STATUS RESTARTS AGE
deploy-heketi-storage-1-9gk7s 1/1 Running 0 2h
glusterblock-storage-provisioner-dc-1-jmmcj 1/1 Running 0 32m
glusterfs-storage-bpnq9 1/1 Running 0 6h
glusterfs-storage-dm89l 1/1 Running 0 6h
glusterfs-storage-gvg56 1/1 Running 0 6h
heketi-storage-1-zwthc 1/1 Running 0 6h
[root@xxxxx~]# oc get pods --namespace infra-storage
NAME READY STATUS RESTARTS AGE
deploy-heketi-registry-1-cxsp4 1/1 Running 0 31m
glusterfs-registry-ntctz 1/1 Running 0 6h
glusterfs-registry-qs8p5 1/1 Running 0 6h
glusterfs-registry-z5qsh 1/1 Running 0 6h
Mine fully deployed. I am able to request storage via the gluster storage class now:
[root@xxxxx ~]# oc get pods --namespace app-storage
NAME READY STATUS RESTARTS AGE
glusterblock-storage-provisioner-dc-1-986qc 1/1 Running 0 6h
glusterfs-storage-6f4wb 1/1 Running 0 6h
glusterfs-storage-jgzxk 1/1 Running 0 6h
glusterfs-storage-nzps5 1/1 Running 0 6h
heketi-storage-1-wrgtg 1/1 Running 1 6h
[root@xxxxx ~]# oc get pods --namespace infra-storage
NAME READY STATUS RESTARTS AGE
glusterblock-registry-provisioner-dc-1-rfp5p 1/1 Running 0 6h
glusterfs-registry-bxgdc 1/1 Running 0 6h
glusterfs-registry-w64rl 1/1 Running 0 6h
glusterfs-registry-x8ptm 1/1 Running 0 6h
heketi-registry-1-p785k 1/1 Running 0 6h
[root@xxxxx ~]# oc get sc
NAME PROVISIONER AGE
glusterfs-registry-block gluster.org/glusterblock 6h
glusterfs-storage kubernetes.io/glusterfs 6h
glusterfs-storage-block gluster.org/glusterblock 1d
[root@xxxxx ~]# oc describe storageclass
Name: glusterfs-registry-block
IsDefaultClass: No
Annotations: <none>
Provisioner: gluster.org/glusterblock
Parameters: chapauthenabled=true,hacount=3,restsecretname=heketi-registry-admin-secret-block,restsecretnamespace=infra-storage,resturl=http://heketi-registry.infra-storage.svc:8080,restuser=admin
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
Name: glusterfs-storage
IsDefaultClass: No
Annotations: <none>
Provisioner: kubernetes.io/glusterfs
Parameters: resturl=http://heketi-storage.app-storage.svc:8080,restuser=admin,secretName=heketi-storage-admin-secret,secretNamespace=app-storage
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
Name: glusterfs-storage-block
IsDefaultClass: No
Annotations: <none>
Provisioner: gluster.org/glusterblock
Parameters: chapauthenabled=true,hacount=3,restsecretname=heketi-storage-admin-secret-block,restsecretnamespace=app-storage,resturl=http://heketi-storage.app-storage.svc:8080,restuser=admin
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
[root@xxxxx ~]# oc describe pv
Name: registry-volume
Labels: <none>
Annotations: <none>
Finalizers: [kubernetes.io/pv-protection]
StorageClass:
Status: Released
Claim: default/registry-claim
Reclaim Policy: Retain
Access Modes: RWX
Capacity: 25Gi
Node Affinity: <none>
Message:
Source:
Type: Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
EndpointsName: glusterfs-registry-endpoints
Path: glusterfs-registry-volume
ReadOnly: false
Events: <none>
@crmarques Just checking: did you run
ansible-playbook -e "openshift_storage_glusterfs_wipe=true" playbooks/openshift-glusterfs/openshift-uninstall.yml
before trying to rerun after fixing the variables?
No, I didn't! I really appreciate the tip!!!
Jumping in to chime in that it's playbooks/openshift-glusterfs/uninstall.yml, and that I'm working on a proper fix for this. :)
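For anyone following along, the full rerun sequence suggested above would be something like this (the inventory path is a placeholder; playbook paths per the correction above):
# Wipe the half-deployed GlusterFS resources, then re-run the GlusterFS playbook.
ansible-playbook -i <your-inventory> -e "openshift_storage_glusterfs_wipe=true" playbooks/openshift-glusterfs/uninstall.yml
ansible-playbook -i <your-inventory> playbooks/openshift-glusterfs/config.yml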
FYI I had this same issue with a fresh 3.10 install, but after I applied the changes from #9971 to my own tasks it worked perfectly. Thanks, @jarrpa!
Description
Unable to install GlusterFS using the release-3.10 branch or the RPM install. The playbook references the pod in the app-storage namespace instead of the infra-storage namespace, cannot find it, and fails.
Version
Steps To Reproduce
Launch deploy_cluster.yml with glusterfs turned on
Launch playbooks/openshift-glusterfs/config.yml
Expected Results
Expected a glusterFS deployment
Observed Results
Failure
Additional Information
$ cat /etc/redhat-release