openshift / openshift-ansible

Install and config an OpenShift 3.x cluster
https://try.openshift.com
Apache License 2.0

Docker registry on Glusterfs storage error #5308

Closed: Asgoret closed this issue 7 years ago

Asgoret commented 7 years ago

Description

Can't deploy the Docker registry on GlusterFS storage.

Version
ansible 2.3.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, Nov  6 2016, 00:28:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]

git describe:
openshift-ansible-3.7.0-0.125.0-2-g3384369

uname -a:
Linux openshift-%name% 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Steps To Reproduce
  1. Edit /etc/ansible/inventory
    
    [OSEv3:children]
    masters
    nodes
    etcd
    glusterfs
    glusterfs_registry

[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
containerized=false
osm_use_cockpit=true
openshift_storage_glusterfs_is_native=False
openshift_storage_glusterfs_heketi_url=10.5.135.185    # <- master
openshift_hosted_registry_storage_kind=glusterfs

[masters]
openshift-master

[etcd]
openshift-master

[nodes]
openshift-master openshift_schedulable=false
openshift-node1 openshift_node_labels="{'region': 'primary', 'zone': 'firstzone'}"
openshift-node2 openshift_node_labels="{'region': 'primary', 'zone': 'secondzone'}"
openshift-gluster1 openshift_schedulable=true openshift_node_labels="{'region': 'infra'}"
openshift-gluster2 openshift_schedulable=true openshift_node_labels="{'region': 'infra'}"
openshift-gluster3 openshift_schedulable=true openshift_node_labels="{'region': 'infra'}"

[glusterfs]
openshift-gluster4 glusterfs_devices='[ "/dev/sdb" ]'
openshift-gluster5 glusterfs_devices='[ "/dev/sdb" ]'
openshift-gluster6 glusterfs_devices='[ "/dev/sdb" ]'

[glusterfs_registry]
openshift-gluster1 glusterfs_devices='[ "/dev/sdb" ]'
openshift-gluster2 glusterfs_devices='[ "/dev/sdb" ]'
openshift-gluster3 glusterfs_devices='[ "/dev/sdb" ]'

2. From /etc/ansible run:

ansible-playbook -i ./inventory /opt/env/openshift-ansible/playbooks/byo/config.yml


Observed Results

Error:

TASK [openshift_hosted : Wait for registry pods] ** FAILED - RETRYING: Wait for registry pods (60 retries left). ... FAILED - RETRYING: Wait for registry pods (1 retries left). fatal: [openshift-master]: FAILED! => {"attempts": 60, "changed": false, "failed": true, "results": {"cmd": "/usr/bin/oc get pod --selector=docker-registry=default -o json -n default", "results": [{"apiVersion": "v1", "items": [{"apiVersion": "v1", "kind": "Pod", "metadata": {"annotations": {"kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"ReplicationController\",\"namespace\":\"default\",\"name\":\"docker-registry-1\",\"uid\":\"1a224e3d-8da1-11e7-9026-00505693371a\",\"apiVersion\":\"v1\",\"resourceVersion\":\"1867\"}}\n", "openshift.io/deployment-config.latest-version": "1", "openshift.io/deployment-config.name": "docker-registry", "openshift.io/deployment.name": "docker-registry-1", "openshift.io/scc": "hostnetwork"}, "creationTimestamp": "2017-08-30T16:35:40Z", "generateName": "docker-registry-1-", "labels": {"deployment": "docker-registry-1", "deploymentconfig": "docker-registry", "docker-registry": "default"}, "name": "docker-registry-1-9pks4", "namespace": "default", "ownerReferences": [{"apiVersion": "v1", "blockOwnerDeletion": true, "controller": true, "kind": "ReplicationController", "name": "docker-registry-1", "uid": "1a224e3d-8da1-11e7-9026-00505693371a"}], "resourceVersion": "1879", "selfLink": "/api/v1/namespaces/default/pods/docker-registry-1-9pks4", "uid": "42930ff7-8da1-11e7-9026-00505693371a"}, "spec": {"containers": [{"env": [{"name": "REGISTRY_HTTP_ADDR", "value": ":5000"}, {"name": "REGISTRY_HTTP_NET", "value": "tcp"}, {"name": "REGISTRY_HTTP_SECRET", "value": "BGzdoN8TjdXyko7FZJBQAWZ7lYeBKDYfyJOBhHhCkhs="}, {"name": "REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA", "value": "false"}, {"name": "OPENSHIFT_DEFAULT_REGISTRY", "value": "docker-registry.default.svc:5000"}, {"name": "REGISTRY_HTTP_TLS_KEY", "value": "/etc/secrets/registry.key"}, {"name": "REGISTRY_HTTP_TLS_CERTIFICATE", "value": "/etc/secrets/registry.crt"}], "image": "openshift/origin-docker-registry:v3.6.0", "imagePullPolicy": "IfNotPresent", "livenessProbe": {"failureThreshold": 3, "httpGet": {"path": "/healthz", "port": 5000, "scheme": "HTTPS"}, "initialDelaySeconds": 10, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 5}, "name": "registry", "ports": [{"containerPort": 5000, "protocol": "TCP"}], "readinessProbe": {"failureThreshold": 3, "httpGet": {"path": "/healthz", "port": 5000, "scheme": "HTTPS"}, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 5}, "resources": {"requests": {"cpu": "100m", "memory": "256Mi"}}, "securityContext": {"capabilities": {"drop": ["KILL", "MKNOD", "SETGID", "SETUID", "SYS_CHROOT"]}, "privileged": false, "runAsUser": 1000030000, "seLinuxOptions": {"level": "s0:c6,c0"}}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [{"mountPath": "/registry", "name": "registry-storage"}, {"mountPath": "/etc/secrets", "name": "registry-certificates"}, {"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "registry-token-j83qn", "readOnly": true}]}], "dnsPolicy": "ClusterFirst", "imagePullSecrets": [{"name": "registry-dockercfg-jpnq9"}], "nodeName": "openshift-gluster2", "nodeSelector": {"region": "infra"}, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {"fsGroup": 1000030000, "seLinuxOptions": {"level": "s0:c6,c0"}, 
"supplementalGroups": [1000030000]}, "serviceAccount": "registry", "serviceAccountName": "registry", "terminationGracePeriodSeconds": 30, "volumes": [{"name": "registry-storage", "persistentVolumeClaim": {"claimName": "registry-claim"}}, {"name": "registry-certificates", "secret": {"defaultMode": 420, "secretName": "registry-certificates"}}, {"name": "registry-token-j83qn", "secret": {"defaultMode": 420, "secretName": "registry-token-j83qn"}}]}, "status": {"conditions": [{"lastProbeTime": null, "lastTransitionTime": "2017-09-05T14:27:37Z", "status": "True", "type": "Initialized"}, {"lastProbeTime": null, "lastTransitionTime": "2017-09-05T14:27:37Z", "message": "containers with unready status: [registry]", "reason": "ContainersNotReady", "status": "False", "type": "Ready"}, {"lastProbeTime": null, "lastTransitionTime": "2017-08-30T16:35:40Z", "status": "True", "type": "PodScheduled"}], "containerStatuses": [{"image": "openshift/origin-docker-registry:v3.6.0", "imageID": "", "lastState": {}, "name": "registry", "ready": false, "restartCount": 0, "state": {"waiting": {"reason": "ContainerCreating"}}}], "hostIP": "10.5.135.170", "phase": "Pending", "qosClass": "Burstable", "startTime": "2017-09-05T14:27:37Z"}}, {"apiVersion": "v1", "kind": "Pod", "metadata": {"annotations": {"kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"ReplicationController\",\"namespace\":\"default\",\"name\":\"docker-registry-1\",\"uid\":\"1a224e3d-8da1-11e7-9026-00505693371a\",\"apiVersion\":\"v1\",\"resourceVersion\":\"1867\"}}\n", "openshift.io/deployment-config.latest-version": "1", "openshift.io/deployment-config.name": "docker-registry", "openshift.io/deployment.name": "docker-registry-1", "openshift.io/scc": "hostnetwork"}, "creationTimestamp": "2017-08-30T16:35:40Z", "generateName": "docker-registry-1-", "labels": {"deployment": "docker-registry-1", "deploymentconfig": "docker-registry", "docker-registry": "default"}, "name": "docker-registry-1-ppzqk", "namespace": "default", "ownerReferences": [{"apiVersion": "v1", "blockOwnerDeletion": true, "controller": true, "kind": "ReplicationController", "name": "docker-registry-1", "uid": "1a224e3d-8da1-11e7-9026-00505693371a"}], "resourceVersion": "1881", "selfLink": "/api/v1/namespaces/default/pods/docker-registry-1-ppzqk", "uid": "42930c52-8da1-11e7-9026-00505693371a"}, "spec": {"containers": [{"env": [{"name": "REGISTRY_HTTP_ADDR", "value": ":5000"}, {"name": "REGISTRY_HTTP_NET", "value": "tcp"}, {"name": "REGISTRY_HTTP_SECRET", "value": "BGzdoN8TjdXyko7FZJBQAWZ7lYeBKDYfyJOBhHhCkhs="}, {"name": "REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA", "value": "false"}, {"name": "OPENSHIFT_DEFAULT_REGISTRY", "value": "docker-registry.default.svc:5000"}, {"name": "REGISTRY_HTTP_TLS_KEY", "value": "/etc/secrets/registry.key"}, {"name": "REGISTRY_HTTP_TLS_CERTIFICATE", "value": "/etc/secrets/registry.crt"}], "image": "openshift/origin-docker-registry:v3.6.0", "imagePullPolicy": "IfNotPresent", "livenessProbe": {"failureThreshold": 3, "httpGet": {"path": "/healthz", "port": 5000, "scheme": "HTTPS"}, "initialDelaySeconds": 10, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 5}, "name": "registry", "ports": [{"containerPort": 5000, "protocol": "TCP"}], "readinessProbe": {"failureThreshold": 3, "httpGet": {"path": "/healthz", "port": 5000, "scheme": "HTTPS"}, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 5}, "resources": {"requests": {"cpu": "100m", "memory": "256Mi"}}, 
"securityContext": {"capabilities": {"drop": ["KILL", "MKNOD", "SETGID", "SETUID", "SYS_CHROOT"]}, "privileged": false, "runAsUser": 1000030000, "seLinuxOptions": {"level": "s0:c6,c0"}}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [{"mountPath": "/registry", "name": "registry-storage"}, {"mountPath": "/etc/secrets", "name": "registry-certificates"}, {"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "registry-token-j83qn", "readOnly": true}]}], "dnsPolicy": "ClusterFirst", "imagePullSecrets": [{"name": "registry-dockercfg-jpnq9"}], "nodeName": "openshift-gluster3", "nodeSelector": {"region": "infra"}, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {"fsGroup": 1000030000, "seLinuxOptions": {"level": "s0:c6,c0"}, "supplementalGroups": [1000030000]}, "serviceAccount": "registry", "serviceAccountName": "registry", "terminationGracePeriodSeconds": 30, "volumes": [{"name": "registry-storage", "persistentVolumeClaim": {"claimName": "registry-claim"}}, {"name": "registry-certificates", "secret": {"defaultMode": 420, "secretName": "registry-certificates"}}, {"name": "registry-token-j83qn", "secret": {"defaultMode": 420, "secretName": "registry-token-j83qn"}}]}, "status": {"conditions": [{"lastProbeTime": null, "lastTransitionTime": "2017-09-05T14:29:59Z", "status": "True", "type": "Initialized"}, {"lastProbeTime": null, "lastTransitionTime": "2017-09-05T14:29:59Z", "message": "containers with unready status: [registry]", "reason": "ContainersNotReady", "status": "False", "type": "Ready"}, {"lastProbeTime": null, "lastTransitionTime": "2017-08-30T16:35:40Z", "status": "True", "type": "PodScheduled"}], "containerStatuses": [{"image": "openshift/origin-docker-registry:v3.6.0", "imageID": "", "lastState": {}, "name": "registry", "ready": false, "restartCount": 0, "state": {"waiting": {"reason": "ContainerCreating"}}}], "hostIP": "10.5.135.169", "phase": "Pending", "qosClass": "Burstable", "startTime": "2017-09-05T14:29:59Z"}}, {"apiVersion": "v1", "kind": "Pod", "metadata": {"annotations": {"kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"ReplicationController\",\"namespace\":\"default\",\"name\":\"docker-registry-1\",\"uid\":\"1a224e3d-8da1-11e7-9026-00505693371a\",\"apiVersion\":\"v1\",\"resourceVersion\":\"1867\"}}\n", "openshift.io/deployment-config.latest-version": "1", "openshift.io/deployment-config.name": "docker-registry", "openshift.io/deployment.name": "docker-registry-1", "openshift.io/scc": "hostnetwork"}, "creationTimestamp": "2017-08-30T16:35:40Z", "generateName": "docker-registry-1-", "labels": {"deployment": "docker-registry-1", "deploymentconfig": "docker-registry", "docker-registry": "default"}, "name": "docker-registry-1-vtf5r", "namespace": "default", "ownerReferences": [{"apiVersion": "v1", "blockOwnerDeletion": true, "controller": true, "kind": "ReplicationController", "name": "docker-registry-1", "uid": "1a224e3d-8da1-11e7-9026-00505693371a"}], "resourceVersion": "1877", "selfLink": "/api/v1/namespaces/default/pods/docker-registry-1-vtf5r", "uid": "4292f440-8da1-11e7-9026-00505693371a"}, "spec": {"containers": [{"env": [{"name": "REGISTRY_HTTP_ADDR", "value": ":5000"}, {"name": "REGISTRY_HTTP_NET", "value": "tcp"}, {"name": "REGISTRY_HTTP_SECRET", "value": "BGzdoN8TjdXyko7FZJBQAWZ7lYeBKDYfyJOBhHhCkhs="}, {"name": "REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA", "value": "false"}, {"name": 
"OPENSHIFT_DEFAULT_REGISTRY", "value": "docker-registry.default.svc:5000"}, {"name": "REGISTRY_HTTP_TLS_KEY", "value": "/etc/secrets/registry.key"}, {"name": "REGISTRY_HTTP_TLS_CERTIFICATE", "value": "/etc/secrets/registry.crt"}], "image": "openshift/origin-docker-registry:v3.6.0", "imagePullPolicy": "IfNotPresent", "livenessProbe": {"failureThreshold": 3, "httpGet": {"path": "/healthz", "port": 5000, "scheme": "HTTPS"}, "initialDelaySeconds": 10, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 5}, "name": "registry", "ports": [{"containerPort": 5000, "protocol": "TCP"}], "readinessProbe": {"failureThreshold": 3, "httpGet": {"path": "/healthz", "port": 5000, "scheme": "HTTPS"}, "periodSeconds": 10, "successThreshold": 1, "timeoutSeconds": 5}, "resources": {"requests": {"cpu": "100m", "memory": "256Mi"}}, "securityContext": {"capabilities": {"drop": ["KILL", "MKNOD", "SETGID", "SETUID", "SYS_CHROOT"]}, "privileged": false, "runAsUser": 1000030000, "seLinuxOptions": {"level": "s0:c6,c0"}}, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [{"mountPath": "/registry", "name": "registry-storage"}, {"mountPath": "/etc/secrets", "name": "registry-certificates"}, {"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "registry-token-j83qn", "readOnly": true}]}], "dnsPolicy": "ClusterFirst", "imagePullSecrets": [{"name": "registry-dockercfg-jpnq9"}], "nodeName": "openshift-gluster1", "nodeSelector": {"region": "infra"}, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {"fsGroup": 1000030000, "seLinuxOptions": {"level": "s0:c6,c0"}, "supplementalGroups": [1000030000]}, "serviceAccount": "registry", "serviceAccountName": "registry", "terminationGracePeriodSeconds": 30, "volumes": [{"name": "registry-storage", "persistentVolumeClaim": {"claimName": "registry-claim"}}, {"name": "registry-certificates", "secret": {"defaultMode": 420, "secretName": "registry-certificates"}}, {"name": "registry-token-j83qn", "secret": {"defaultMode": 420, "secretName": "registry-token-j83qn"}}]}, "status": {"conditions": [{"lastProbeTime": null, "lastTransitionTime": "2017-09-05T14:25:11Z", "status": "True", "type": "Initialized"}, {"lastProbeTime": null, "lastTransitionTime": "2017-09-05T14:25:11Z", "message": "containers with unready status: [registry]", "reason": "ContainersNotReady", "status": "False", "type": "Ready"}, {"lastProbeTime": null, "lastTransitionTime": "2017-08-30T16:35:40Z", "status": "True", "type": "PodScheduled"}], "containerStatuses": [{"image": "openshift/origin-docker-registry:v3.6.0", "imageID": "", "lastState": {}, "name": "registry", "ready": false, "restartCount": 0, "state": {"waiting": {"reason": "ContainerCreating"}}}], "hostIP": "10.5.135.171", "phase": "Pending", "qosClass": "Burstable", "startTime": "2017-09-05T14:25:11Z"}}], "kind": "List", "metadata": {}, "resourceVersion": "", "selfLink": ""}], "returncode": 0}, "state": "list"} to retry, use: --limit @/opt/env/openshift-ansible/playbooks/byo/config.retry

PLAY RECAP ****
localhost : ok=13 changed=0 unreachable=0 failed=0
openshift-gluster1 : ok=158 changed=58 unreachable=0 failed=0
openshift-gluster2 : ok=158 changed=58 unreachable=0 failed=0
openshift-gluster3 : ok=158 changed=58 unreachable=0 failed=0
openshift-gluster4 : ok=35 changed=6 unreachable=0 failed=0
openshift-gluster5 : ok=35 changed=6 unreachable=0 failed=0
openshift-gluster6 : ok=35 changed=6 unreachable=0 failed=0
openshift-master : ok=518 changed=192 unreachable=0 failed=1
openshift-node1 : ok=160 changed=61 unreachable=0 failed=0
openshift-node2 : ok=160 changed=61 unreachable=0 failed=0

Failure summary:

1. Hosts:    openshift-master
   Play:     Create Hosted Resources
   Task:     Wait for registry pods
   Message:  Failed without returning a message.

Additional information:
Host: openshift-gluster1-6
Command: gluster volume list
Result:

No volumes present in cluster

Host: openshift-gluster1-3 (Registry)
Command: gluster peer status

Result:

Number of Peers: 2

Hostname: openshift-gluster2
Uuid: 7693ed1f-1074-4529-805f-8c96fac44cf6
State: Peer in Cluster (Connected)

Hostname: openshift-gluster3
Uuid: 70481524-e2c2-49e9-9b4f-7199086fd21c
State: Peer in Cluster (Connected)

Host: openshift-gluster4-6 (storage)
Command: gluster peer status
Result:
Number of Peers: 2

Hostname: openshift-gluster5
Uuid: f7965f67-35e5-40c6-91b1-a620e462f4b7
State: Peer in Cluster (Connected)

Hostname: openshift-gluster6
Uuid: 930042d4-310f-402c-96d8-f87fad6bf07b
State: Peer in Cluster (Connected)

Host: openshift-master
Command: oc get storageclass
Result:

NAME                TYPE
glusterfs-storage   kubernetes.io/glusterfs 

Command: oc get nodes
Result:

NAME                 STATUS                     AGE       VERSION
openshift-gluster1   Ready                      44m       v1.6.1+5115d708d7
openshift-gluster2   Ready                      44m       v1.6.1+5115d708d7
openshift-gluster3   Ready                      44m       v1.6.1+5115d708d7
openshift-master     Ready,SchedulingDisabled   44m       v1.6.1+5115d708d7
openshift-node1      Ready                      44m       v1.6.1+5115d708d7
openshift-node2      Ready                      44m       v1.6.1+5115d708d7

Command: oc get pods
Result:

NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   0/1       Error     0          40m
router-1-4vt0w             1/1       Running   0          40m
router-1-9nf2f             1/1       Running   0          40m
router-1-ddgdx             1/1       Running   0          40m
jarrpa commented 7 years ago

@Asgoret Okay, so the volume doesn't exist. Can you run gluster volume info from one of the nodes on the other cluster? If the volume is there, then we haven't created the volume on the right cluster yet.

Do heketi-cli topology info to find the correct clusterID (double-check that this is the one with the registry-hosting nodes!!) then do:

heketi-cli -s http://10.5.135.185:8080 --user admin  volume create --size=5 --name=glusterfs-registry-volume --clusters=<CLUSTER_ID>
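If it helps, the cluster IDs can be pulled straight out of the topology output with a quick grep (just a convenience one-liner, using the same server URL as above):

    heketi-cli -s http://10.5.135.185:8080 --user admin topology info | grep "Cluster Id"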
Asgoret commented 7 years ago

@jarrpa Gluster storage cluster:

[root@openshift-gluster4 ~]# gluster volume info
No volumes present

[root@openshift-gluster5 ~]# gluster volume info
No volumes present

[root@openshift-gluster6 ~]# gluster volume info
No volumes present

I can't create the volume twice.

[root@openshift-master ~]# heketi-cli -s http://10.5.135.185:8080 --user admin  volume create --size=5 --name=glusterfs-registry-volume --clusters=7c4dd1c18b6e9a357404bf86e5c442a5
Error: Name glusterfs-registry-volume is already in use in all available clusters

BUT:

[root@openshift-master ~]# heketi-cli topology info

Cluster Id: 20302085243253aa6554cd1a644d4c66

    Volumes:

    Nodes:

Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5

    Volumes:

    Name: glusterfs-registry-volume
    Size: 5
    Id: 880a60fa56f80b05be3c63316d7ec21f
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Mount: 10.5.135.169:glusterfs-registry-volume
    Mount Options: backup-volfile-servers=10.5.135.170,10.5.135.171
    Durability Type: replicate
    Replica: 3
    Snapshot: Disabled

    Nodes:

    Node Id: 5a5905a448514c28d3ae4bc200b823df
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Management Hostname: openshift-gluster2
    Storage Hostname: 10.5.135.170
    Devices:
        Id:da834ba9bc7e1b1d4c991ccbc9afa06d   Name:/dev/sdb            State:online    Size (GiB):500     Used (GiB):5       Free (GiB):494     
            Bricks:
                Id:6a4f2491e39d88bcda2db231d96438ae   Size (GiB):5       Path: /mockpath

    Node Id: 9add049d9f0258a019479503c15756e3
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Management Hostname: openshift-gluster1
    Storage Hostname: 10.5.135.171
    Devices:
        Id:8f6ca00cf62f5e2a39079aedcb4d40a5   Name:/dev/sdb            State:online    Size (GiB):500     Used (GiB):5       Free (GiB):494     
            Bricks:
                Id:0d38f7be0ed97c8f0cdd9ce80c510973   Size (GiB):5       Path: /mockpath

    Node Id: d1aad8533a6f40653d7a222e3b7ee691
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Management Hostname: openshift-gluster3
    Storage Hostname: 10.5.135.169
    Devices:
        Id:cb372af155ec1c001d3d6a1470c95e32   Name:/dev/sdb            State:online    Size (GiB):500     Used (GiB):5       Free (GiB):494     
            Bricks:
                Id:2dd888e4443d7d50afeb1724f78194a8   Size (GiB):5       Path: /mockpath

Re-create glusterfs-registry-volume?

jarrpa commented 7 years ago

@Asgoret ...if the volume is showing up in heketi CLI but not in the gluster CLI, something is wrong... yes, delete the existing volume and recreate it.
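Something along these lines should remove the stale heketi record before recreating it (using the volume ID from your topology output above; adjust if yours differs):

    heketi-cli -s http://10.5.135.185:8080 --user admin volume delete 880a60fa56f80b05be3c63316d7ec21f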

Asgoret commented 7 years ago

@jarrpa

[root@openshift-master ~]# heketi-cli -s http://10.5.135.185:8080 --user admin  volume create --size=5 --name=glusterfs-registry-volume --clusters=7c4dd1c18b6e9a357404bf86e5c442a5
Name: glusterfs-registry-volume
Size: 5
Volume Id: 195cb4820f66d6435fbe5afa71e930d8
Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
Mount: 10.5.135.169:glusterfs-registry-volume
Mount Options: backup-volfile-servers=10.5.135.170,10.5.135.171
Durability Type: replicate
Distributed+Replica: 3

[root@openshift-master ~]# heketi-cli topology info

Cluster Id: 20302085243253aa6554cd1a644d4c66

    Volumes:

    Nodes:

Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5

    Volumes:

    Name: glusterfs-registry-volume
    Size: 5
    Id: 195cb4820f66d6435fbe5afa71e930d8
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Mount: 10.5.135.169:glusterfs-registry-volume
    Mount Options: backup-volfile-servers=10.5.135.170,10.5.135.171
    Durability Type: replicate
    Replica: 3
    Snapshot: Disabled

    Nodes:

    Node Id: 5a5905a448514c28d3ae4bc200b823df
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Management Hostname: openshift-gluster2
    Storage Hostname: 10.5.135.170
    Devices:
        Id:da834ba9bc7e1b1d4c991ccbc9afa06d   Name:/dev/sdb            State:online    Size (GiB):500     Used (GiB):5       Free (GiB):494     
            Bricks:
                Id:8351c1a466c0995b754ec0f57d6fd70d   Size (GiB):5       Path: /mockpath

    Node Id: 9add049d9f0258a019479503c15756e3
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Management Hostname: openshift-gluster1
    Storage Hostname: 10.5.135.171
    Devices:
        Id:8f6ca00cf62f5e2a39079aedcb4d40a5   Name:/dev/sdb            State:online    Size (GiB):500     Used (GiB):5       Free (GiB):494     
            Bricks:
                Id:d9d3f01143fe57c87f037ecc4da80ae8   Size (GiB):5       Path: /mockpath

    Node Id: d1aad8533a6f40653d7a222e3b7ee691
    Cluster Id: 7c4dd1c18b6e9a357404bf86e5c442a5
    Management Hostname: openshift-gluster3
    Storage Hostname: 10.5.135.169
    Devices:
        Id:cb372af155ec1c001d3d6a1470c95e32   Name:/dev/sdb            State:online    Size (GiB):500     Used (GiB):5       Free (GiB):494     
            Bricks:
              Id:b3acc3e99df489642185a893ed3a236f   Size (GiB):5       Path: /mockpath

But the volume list on the gluster clusters is empty:

[root@openshift-gluster1 ~]# gluster volume info
No volumes present

Second cluster:

[root@openshift-gluster4 ~]# gluster volume info
No volumes present
jarrpa commented 7 years ago

@Asgoret Something is definitely wrong... I also just noticed that your other cluster has no nodes defined.

We should start over with your GlusterFS setup. Are you able to destroy the current nodes and recreate one using heketi only for one cluster?

Asgoret commented 7 years ago

@jarrpa Yes. They are all VMs with a base snapshot that has the required components installed.

Asgoret commented 7 years ago

@jarrpa Looks like I found the issue. The Ansible playbook doesn't modify heketi.json?

UPD: These tasks end up skipped because heketi is already installed:

TASK [openshift_storage_glusterfs : Make sure heketi-client is installed]
TASK [openshift_storage_glusterfs : Verify heketi-cli is installed]
jarrpa commented 7 years ago

@Asgoret Hmm... I think you're right, that heketi.json is not being modified. Can you be more specific about what you think the issue is?

Asgoret commented 7 years ago

@jarrpa When you install heketi as a package of your system, you use yum or apt. The heketi service then creates heketi.json in /etc/heketi. heketi.json is the configuration file of the heketi service, like the inventory is for Ansible.

When Ansible installs the OpenShift components it uses SSH with a root login, and heketi works the same way. When we create the gluster docker-registry volume, it is created on the master, where heketi is installed, but the gluster nodes stay empty. I think heketi can't connect to the nodes to create the volume, but doesn't show any errors.

The SSH section of the config:

"_sshexec_comment": "SSH username and private key file information",
    "sshexec": {
      "keyfile": "path/to/private_key",
      "user": "sshuser",
      "port": "Optional: ssh port.  Default is 22",
      "fstab": "Optional: Specify fstab file on node.  Default is /etc/fstab"

And the Kubernetes section:

    "_kubeexec_comment": "Kubernetes configuration",
    "kubeexec": {
      "host" :"https://kubernetes.host:8443",
      "cert" : "/path/to/crt.file",
      "insecure": false,
      "user": "kubernetes username",
      "password": "password for kubernetes user",
      "namespace": "OpenShift project or Kubernetes namespace",
      "fstab": "Optional: Specify fstab file on node.  Default is /etc/fstab"
    },

Full config: heketi.txt
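For reference, a minimal standalone heketi.json that drives the GlusterFS nodes over SSH might look roughly like the sketch below (a sketch only: the key path, port, and db path are placeholders, not values from the attached config):

    {
      "port": "8080",
      "use_auth": false,
      "glusterfs": {
        "executor": "ssh",
        "sshexec": {
          "keyfile": "/etc/heketi/heketi_key",
          "user": "root",
          "port": "22",
          "fstab": "/etc/fstab"
        },
        "db": "/var/lib/heketi/heketi.db"
      }
    }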

So, I deleted heketi from the master node and re-ran the install. The installation failed on the task "Make sure heketi-client is installed".

UPD: If only heketi-client is installed, the error is on the task: TASK [openshift_storage_glusterfs : Verify heketi service]

UPD#1: I found the task that sets facts for glusterfs_storage, but I can't find the same task for the docker registry storage.

TASK [openshift_storage_glusterfs : set_fact] 
task path: /opt/env/openshift-ansible/roles/openshift_storage_glusterfs/tasks/glusterfs_config.yml:2
ok: [openshift-master] => {
    "ansible_facts": {
        "glusterfs_heketi_cli": "heketi-cli", 
        "glusterfs_heketi_deploy_is_missing": true, 
        "glusterfs_heketi_executor": "kubernetes", 
        "glusterfs_heketi_image": "heketi/heketi", 
        "glusterfs_heketi_is_missing": true, 
        "glusterfs_heketi_is_native": false, 
        "glusterfs_heketi_port": 8080, 
        "glusterfs_heketi_ssh_port": 22, 
        "glusterfs_heketi_ssh_sudo": false, 
        "glusterfs_heketi_ssh_user": "root", 
        "glusterfs_heketi_topology_load": true, 
        "glusterfs_heketi_url": "10.5.135.185", 
        "glusterfs_heketi_version": "latest", 
        "glusterfs_heketi_wipe": false, 
        "glusterfs_image": "gluster/gluster-centos", 
        "glusterfs_is_native": false, 
        "glusterfs_name": "storage", 
        "glusterfs_namespace": "default", 
        "glusterfs_nodes": [
            "openshift-gluster4", 
            "openshift-gluster5", 
            "openshift-gluster6"
        ], 
        "glusterfs_nodeselector": {
            "glusterfs": "storage-host"
        }, 
        "glusterfs_storageclass": true, 
        "glusterfs_timeout": 300, 
        "glusterfs_version": "latest", 
        "glusterfs_wipe": false
    }, 
    "changed": false
}
jarrpa commented 7 years ago

@Asgoret You're losing me a little bit in the details, I think our English isn't quite the same, but I'll try my best. :)

It looks like you're jumping around a bit, so I'll try to keep up:

Asgoret commented 7 years ago

@jarrpa I apologize for my bad English and for the late response :( I have good news. Part of the error was an incorrect configuration of the heketi server. Now heketi creates the registry volume on the right GlusterFS nodes, but docker-registry doesn't start.

[root@openshift-gluster3 ~]# gluster volume info

Volume Name: glusterfs-registry-volume
Type: Replicate
Volume ID: aa634728-d12c-4601-9531-7ab50f936b22
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.5.135.170:/var/lib/heketi/mounts/vg_84c172368ee1e18a9a63dfde4eecedff/brick_fadd878e46234ae445bf56eeb440f7b0/brick
Brick2: 10.5.135.171:/var/lib/heketi/mounts/vg_e8ca099f2716fd2da6039b4e073ede88/brick_74c679884e81650af9f258bf842e065d/brick
Brick3: 10.5.135.169:/var/lib/heketi/mounts/vg_5905745dedc05f0d90bbe992da4e75ca/brick_0a15cd29b5443edbeb7b9a40e533da72/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
Asgoret commented 7 years ago

@jarrpa

[root@openshift-master heketi]# oc describe  po/docker-registry-1-4vqnk
Name:           docker-registry-1-4vqnk
Namespace:      default
Security Policy:    hostnetwork
Node:           openshift-gluster1/10.5.135.171
Start Time:     Tue, 05 Sep 2017 11:00:25 -0400
Labels:         deployment=docker-registry-1
            deploymentconfig=docker-registry
            docker-registry=default
Annotations:        kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"docker-registry-1","uid":"aed2f8ac-8da5-11e7-bb1c-005...
            openshift.io/deployment-config.latest-version=1
            openshift.io/deployment-config.name=docker-registry
            openshift.io/deployment.name=docker-registry-1
            openshift.io/scc=hostnetwork
Status:         Pending
IP:         
Controllers:        ReplicationController/docker-registry-1
Containers:
  registry:
    Container ID:   
    Image:      openshift/origin-docker-registry:v3.6.0
    Image ID:       
    Port:       5000/TCP
    State:      Waiting
      Reason:       ContainerCreating
    Ready:      False
    Restart Count:  0
    Requests:
      cpu:  100m
      memory:   256Mi
    Liveness:   http-get https://:5000/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get https://:5000/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
    Environment:
      REGISTRY_HTTP_ADDR:                   :5000
      REGISTRY_HTTP_NET:                    tcp
      REGISTRY_HTTP_SECRET:                 igQvU/aGJLy2230+Vmj/u+YurXlqmiRHplBojSaV6PY=
      REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA:    false
      OPENSHIFT_DEFAULT_REGISTRY:               docker-registry.default.svc:5000
      REGISTRY_HTTP_TLS_KEY:                    /etc/secrets/registry.key
      REGISTRY_HTTP_TLS_CERTIFICATE:                /etc/secrets/registry.crt
    Mounts:
      /etc/secrets from registry-certificates (rw)
      /registry from registry-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from registry-token-b3c8q (ro)
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  registry-storage:
    Type:   PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  registry-claim
    ReadOnly:   false
  registry-certificates:
    Type:   Secret (a volume populated by a Secret)
    SecretName: registry-certificates
    Optional:   false
  registry-token-b3c8q:
    Type:   Secret (a volume populated by a Secret)
    SecretName: registry-token-b3c8q
    Optional:   false
QoS Class:  Burstable
Node-Selectors: region=infra
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----                -------------   --------    ------      -------
  4m        4m      1   default-scheduler               Normal      Scheduled   Successfully assigned docker-registry-1-4vqnk to openshift-gluster1
  <invalid> <invalid>   1   kubelet, openshift-gluster1         Warning     FailedMount MountVolume.SetUp failed for volume "kubernetes.io/glusterfs/dba2c845-8da5-11e7-bb1c-00505693371a-registry-volume" (spec.Name: "registry-volume") pod "dba2c845-8da5-11e7-bb1c-00505693371a" (UID: "dba2c845-8da5-11e7-bb1c-00505693371a") with: glusterfs: mount failed: mount failed: exit status 1
Mounting command: mount
Mounting arguments: 10.5.135.169:glusterfs-registry-volume /var/lib/origin/openshift.local.volumes/pods/dba2c845-8da5-11e7-bb1c-00505693371a/volumes/kubernetes.io~glusterfs/registry-volume glusterfs [log-file=/var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs/registry-volume/docker-registry-1-4vqnk-glusterfs.log backup-volfile-servers=10.5.135.169:10.5.135.170:10.5.135.171 log-level=ERROR]
Output: Mount failed. Please check the log file for more details.

 the following error information was pulled from the glusterfs log to help diagnose this issue: 
[2017-09-05 15:00:36.369658] E [socket.c:2327:socket_connect_finish] 0-glusterfs-registry-volume-client-0: connection to 10.5.135.169:24007 failed (No route to host); disconnecting socket
[2017-09-05 15:00:36.373651] E [socket.c:2327:socket_connect_finish] 0-glusterfs-registry-volume-client-1: connection to 10.5.135.170:24007 failed (No route to host); disconnecting socket
jarrpa commented 7 years ago

@Asgoret PROGRESS!! :D What's the output of oc get -o yaml for the registry claim, the registry volume, and the endpoints used by the volume?

Asgoret commented 7 years ago

@jarrpa

[root@openshift-master ~]# oc get -o yaml endpoints
apiVersion: v1
items:
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: 2017-08-30T17:07:00Z
    name: docker-registry
    namespace: default
    resourceVersion: "1980"
    selfLink: /api/v1/namespaces/default/endpoints/docker-registry
    uid: a2fa0b90-8da5-11e7-bb1c-00505693371a
  subsets: null
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: 2017-08-30T17:05:28Z
    name: glusterfs-registry-endpoints
    namespace: default
    resourceVersion: "1709"
    selfLink: /api/v1/namespaces/default/endpoints/glusterfs-registry-endpoints
    uid: 6c57a180-8da5-11e7-bb1c-00505693371a
  subsets:
  - addresses:
    - ip: 10.5.135.169
    - ip: 10.5.135.170
    - ip: 10.5.135.171
    ports:
    - port: 1
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: 2017-08-30T16:53:16Z
    name: kubernetes
    namespace: default
    resourceVersion: "11"
    selfLink: /api/v1/namespaces/default/endpoints/kubernetes
    uid: b7e28941-8da3-11e7-bb1c-00505693371a
  subsets:
  - addresses:
    - ip: 10.5.135.185
    ports:
    - name: https
      port: 8443
      protocol: TCP
    - name: dns-tcp
      port: 8053
      protocol: TCP
    - name: dns
      port: 8053
      protocol: UDP
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: 2017-08-30T17:06:50Z
    labels:
      router: router
    name: router
    namespace: default
    resourceVersion: "2266"
    selfLink: /api/v1/namespaces/default/endpoints/router
    uid: 9ceef529-8da5-11e7-bb1c-00505693371a
  subsets:
  - addresses:
    - ip: 10.5.135.169
      nodeName: openshift-gluster3
      targetRef:
        kind: Pod
        name: router-1-f0ksc
        namespace: default
        resourceVersion: "2132"
        uid: c6e74bfe-8da5-11e7-bb1c-00505693371a
    - ip: 10.5.135.170
      nodeName: openshift-gluster2
      targetRef:
        kind: Pod
        name: router-1-wm3nw
        namespace: default
        resourceVersion: "2213"
        uid: c6e74fe3-8da5-11e7-bb1c-00505693371a
    - ip: 10.5.135.171
      nodeName: openshift-gluster1
      targetRef:
        kind: Pod
        name: router-1-rp8k9
        namespace: default
        resourceVersion: "2263"
        uid: c6e74b03-8da5-11e7-bb1c-00505693371a
    ports:
    - name: 443-tcp
      port: 443
      protocol: TCP
    - name: 1936-tcp
      port: 1936
      protocol: TCP
    - name: 80-tcp
      port: 80
      protocol: TCP
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""

[root@openshift-master ~]# oc get -o yaml pv
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2017-08-30T17:05:54Z
    name: registry-volume
    namespace: ""
    resourceVersion: "1750"
    selfLink: /api/v1/persistentvolumes/registry-volume
    uid: 7b8cdb20-8da5-11e7-bb1c-00505693371a
  spec:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 5Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: registry-claim
      namespace: default
      resourceVersion: "1748"
      uid: 7d1ebf7a-8da5-11e7-bb1c-00505693371a
    glusterfs:
      endpoints: glusterfs-registry-endpoints
      path: glusterfs-registry-volume
    persistentVolumeReclaimPolicy: Retain
  status:
    phase: Bound
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""

[root@openshift-master ~]# oc get -o yaml pvc
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2017-08-30T17:05:56Z
    name: registry-claim
    namespace: default
    resourceVersion: "1752"
    selfLink: /api/v1/namespaces/default/persistentvolumeclaims/registry-claim
    uid: 7d1ebf7a-8da5-11e7-bb1c-00505693371a
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 5Gi
    volumeName: registry-volume
  status:
    accessModes:
    - ReadWriteMany
    capacity:
      storage: 5Gi
    phase: Bound
kind: List
metadata: {}
resourceVersion: ""
selfLink: ""
jarrpa commented 7 years ago

@Asgoret Okay. Can you verify that all your kube nodes can reach (e.g. ping) your gluster nodes? And have you opened the required GlusterFS ports on the gluster nodes (24007, etc.)?
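A quick way to check that from any OpenShift node, for example (substituting each gluster node's IP in turn):

    nc -zv 10.5.135.169 24007    # GlusterFS management port
    nc -zv 10.5.135.169 49152    # first brick port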

Asgoret commented 7 years ago

@jarrpa All kube nodes can reach each other. About port 24007: on gluster nodes 1-3 firewalld is masked. On gluster nodes 4-6:

Chain IN_public_allow (1 references)
target     prot opt source               destination         
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:24007 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:24008 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:24009 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:38465 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:38466 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:38467 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:38468 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:38469 ctstate NEW
ACCEPT     tcp  --  anywhere             anywhere             tcp dpts:49152:49664 ctstate NEW

The other chains are empty and have the 'ACCEPT' default policy.

UPD: On the master and slave nodes firewalld is masked too. On the ansible node it is disabled.

jarrpa commented 7 years ago

@Asgoret But can the openshift nodes reach the gluster nodes? e.g. can you ping or ssh into the gluster nodes from the openshift cluster?

Asgoret commented 7 years ago

@jarrpa The Ansible node can open SSH to all nodes. The master can open SSH to all gluster nodes. The slave nodes cannot open SSH to any node. The gluster nodes cannot open SSH to any node, but gluster peer status shows the other peers in the cluster as connected.

All nodes can ping and resolve each other.

Asgoret commented 7 years ago

@jarrpa Must the master have passwordless SSH access to all nodes, like the ansible node does?

jarrpa commented 7 years ago

@Asgoret The actual SSH access doesn't matter, I just care that they have network access to each other, i.e. whether they can send packets to each other. ping would also suffice. What I want to know is if every OpenShift node has network access to every GlusterFS node. For any pod to use a GlusterFS volume, the hosting node has to be able to mount the volume.

Actually, that's a good enough test: can you mount the GlusterFS volume on all nodes using mount -t glusterfs 10.5.135.169:glusterfs-registry-volume <SOME_DIR>? If that fails, check the logs in /var/log/glusterfs on 10.5.135.169 and see if there's anything showing an error about your mount attempts.
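For example, something like this on each OpenShift node (the mount point is just a scratch directory for the test):

    mkdir -p /mnt/gluster-test
    mount -t glusterfs 10.5.135.169:glusterfs-registry-volume /mnt/gluster-test
    df -h /mnt/gluster-test    # confirm the volume is visible
    umount /mnt/gluster-test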

Asgoret commented 7 years ago

@jarrpa I have good news! Ansible finished the OpenShift installation without any errors.

PLAY RECAP ****************************************************************************************************************************************************************************************************
localhost                  : ok=13   changed=0    unreachable=0    failed=0   
openshift-gluster1         : ok=158  changed=58   unreachable=0    failed=0   
openshift-gluster2         : ok=158  changed=58   unreachable=0    failed=0   
openshift-gluster3         : ok=158  changed=58   unreachable=0    failed=0   
openshift-gluster4         : ok=35   changed=6    unreachable=0    failed=0   
openshift-gluster5         : ok=35   changed=6    unreachable=0    failed=0   
openshift-gluster6         : ok=35   changed=6    unreachable=0    failed=0   
openshift-master           : ok=532  changed=196  unreachable=0    failed=0   
openshift-node1            : ok=160  changed=61   unreachable=0    failed=0   
openshift-node2            : ok=160  changed=61   unreachable=0    failed=0 

And

[root@openshift-master ~]# oc get all
NAME                  DOCKER REPO                                                 TAGS      UPDATED
is/registry-console   docker-registry.default.svc:5000/default/registry-console   latest    5 minutes ago

NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/docker-registry    1          3         3         config
dc/registry-console   1          1         1         config
dc/router             1          3         3         config

NAME                    DESIRED   CURRENT   READY     AGE
rc/docker-registry-1    3         3         3         13m
rc/registry-console-1   1         1         1         5m
rc/router-1             3         3         3         14m

NAME                      HOST/PORT                                                   PATH      SERVICES           PORT      TERMINATION   WILDCARD
routes/docker-registry    docker-registry-default.router.default.svc.cluster.local              docker-registry    <all>     passthrough   None
routes/registry-console   registry-console-default.router.default.svc.cluster.local             registry-console   <all>     passthrough   None

NAME                               CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry                172.30.149.192   <none>        5000/TCP                  13m
svc/glusterfs-registry-endpoints   172.30.154.128   <none>        1/TCP                     16m
svc/kubernetes                     172.30.0.1       <none>        443/TCP,53/UDP,53/TCP     34m
svc/registry-console               172.30.252.228   <none>        9000/TCP                  5m
svc/router                         172.30.244.158   <none>        80/TCP,443/TCP,1936/TCP   14m

NAME                          READY     STATUS    RESTARTS   AGE
po/docker-registry-1-jlb70    1/1       Running   0          9m
po/docker-registry-1-ndlnk    1/1       Running   0          9m
po/docker-registry-1-x9pkh    1/1       Running   0          9m
po/registry-console-1-0nn1f   1/1       Running   0          3m
po/router-1-33h7l             1/1       Running   3          10m
po/router-1-60xvv             1/1       Running   0          10m
po/router-1-bxz1m             1/1       Running   0          10m
Asgoret commented 7 years ago

@jarrpa And maybe I found a bug in how Ansible works...

All nodes in the cluster have firewalld manually disabled and stopped. When Ansible runs, it installs iptables.service on the nodes listed in the [nodes] group (master, slaves, etc.).

So, in some task Ansible should open port 24007 on the gluster nodes, but it does not do that. When heketi then tries to mount the docker-registry volume, it fails because it can't connect through the closed port 24007.
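As a manual workaround on the affected gluster nodes, the GlusterFS ports could be opened in iptables by hand, roughly like this (a sketch only; the port range mirrors the rules that did get applied on gluster nodes 4-6 above):

    iptables -I INPUT -p tcp --dport 24007:24008 -m conntrack --ctstate NEW -j ACCEPT
    iptables -I INPUT -p tcp --dport 49152:49664 -m conntrack --ctstate NEW -j ACCEPT
    service iptables save    # persist only if iptables.service is the active firewall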

UPD: I can revert the VMs and try to install OpenShift without manually stopping iptables.service if you need some log files.

UPD#2: I will revert the VMs anyway because I hit this error: https://github.com/openshift/origin/issues/16097. What files do you need for diagnostics?

jarrpa commented 7 years ago

@Asgoret HUZZAH!!

I'm not sure I understand the problem, though. What did you do to allow the GlusterFS deployment to succeed, and why aren't the GlusterFS ports being opened in the firewall?

Asgoret commented 7 years ago

@jarrpa Well... I configured heketi.service correctly on the master node and set up passwordless SSH from the master node to the gluster nodes. That made the PVC creation on the gluster nodes succeed.

Then I looked at the log files on the gluster nodes. There was an error when gluster node 3 tried to connect to gluster node 1 on port 24007: "no route to host".

Then I went to check firewalld.service and found out that iptables.service had been installed (I did not do that). So I checked the iptables rules and saw that port 24007 was open on gluster node 3, but closed on gluster nodes 1-2.

All nodes were configured by one script (I can post it if needed). So I reverted all the VMs, started the install again, and when Ansible was waiting for the docker registry to answer I stopped iptables.service on all nodes manually. That's all.

UPD: I can post the OpenShift installation log file or some specific tasks if you tell me which tasks I should look at.

jarrpa commented 7 years ago

@Asgoret Hmm... this sounds like it may be slightly beyond my current understanding. The ansible script should take care of opening the firewall ports. Do you have any other problems with the firewall? Do you know why the ports on node 3 would have been open but not the other two?

Asgoret commented 7 years ago

@jarrpa UPD#2: I can write up the configuration I have and some sort of instructions, if we can add it to this part of the repository: https://github.com/openshift/openshift-ansible/tree/master/inventory/byo

jarrpa commented 7 years ago

@Asgoret Sure, let's see what you've got.

Asgoret commented 7 years ago

@jarrpa It's my fault) I'm sorry for that (((

No. All nodes use one base configuration script, all updates were done at the same time, and all were created from one ISO file (minimal). Ansible uses 'root' from the ansible node, and heketi uses 'root' from the master node. For the install I used only the epel and base repositories.

Asgoret commented 7 years ago

@jarrpa

OK. I made a copy of the openshift-ansible I'm using now and re-downloaded the latest version.

Writing the instructions will take some time.

Asgoret commented 7 years ago

@jarrpa OK. I found the cause of the unopened ports on the gluster nodes. Base CentOS works with firewalld.service, not with iptables.service, but Ansible masks firewalld.service and installs iptables.service. The rules for the GlusterFS connections ended up in the firewalld configuration files.

This happened only on the docker-registry (infra) nodes.
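An easy way to confirm which firewall backend is actually in effect on a node, and whether the GlusterFS port made it into the live ruleset (illustrative commands only):

    systemctl is-active firewalld iptables
    iptables -S | grep 24007 || echo "port 24007 not in the live iptables rules"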

Asgoret commented 7 years ago

@jarrpa About the instructions: here is the first version and a picture of the architecture I'm trying to create. image Instruction.txt

jarrpa commented 7 years ago

Okay, this looks pretty straightforward. I'm not sure that belongs in this repo, however. The instructions should probably go somewhere in openshift-docs. We can take care of trying to add something in an appropriate place to cover this.

Thank you for your patience and dedication! If the situation has been sufficiently resolved, please go ahead and close this issue.

Asgoret commented 7 years ago

@jarrpa Thank you for your patience! Yes, this issue is closed) I'm so sorry, but I found another error/issue(

When I polish the installation script, I'll post it here. Or I can mail it to you.

Asgoret commented 7 years ago

@jarrpa Hi, can you look at issue #5712 please?

jarrpa commented 7 years ago

@Asgoret Sure. Please don't double-tag me in the future. :)