/assign mjudeikis
Checking. If you still have the env running, it would be good to see "oc describe" of the heketi pod.
Here goes:
oc describe pod heketi-storage
Name: heketi-storage-2-deploy
Namespace: glusterfs
Node: picnode02.mycompany.internal/10.39.57.103
Start Time: Wed, 04 Apr 2018 17:35:47 +0200
Labels: openshift.io/deployer-pod-for.name=heketi-storage-2
Annotations: openshift.io/deployment.name=heketi-storage-2
openshift.io/scc=restricted
Status: Failed
IP: 10.130.0.2
Containers:
deployment:
Container ID: docker://1915d25e1cbbca0e63034a55c1f9100fb1d16ed527bb563e41294a17619aa77d
Image: openshift/origin-deployer:v3.7.1
Image ID: docker-pullable://docker.io/openshift/origin-deployer@sha256:2e39b45e1a68fd25647f0fd64b19d81b9dee04ee84ec49fefc2a28580dc9ab61
Port:
OPENSHIFT_DEPLOYMENT_NAME: heketi-storage-2
OPENSHIFT_DEPLOYMENT_NAMESPACE: glusterfs
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from deployer-token-l94cr (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
deployer-token-l94cr:
Type: Secret (a volume populated by a Secret)
SecretName: deployer-token-l94cr
Optional: false
QoS Class: BestEffort
Node-Selectors:
Events:
  50m  50m  1  default-scheduler                      Normal  Scheduled              Successfully assigned heketi-storage-2-deploy to picnode02.mycompany.internal
  50m  50m  1  kubelet, picnode02.mycompany.internal  Normal  SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "deployer-token-l94cr"
  50m  50m  1  kubelet, picnode02.mycompany.internal  spec.containers{deployment}  Normal  Pulling  pulling image "openshift/origin-deployer:v3.7.1"
  50m  50m  1  kubelet, picnode02.mycompany.internal  spec.containers{deployment}  Normal  Pulled   Successfully pulled image "openshift/origin-deployer:v3.7.1"
  50m  50m  1  kubelet, picnode02.mycompany.internal  spec.containers{deployment}  Normal  Created  Created container
  50m  50m  1  kubelet, picnode02.mycompany.internal  spec.containers{deployment}  Normal  Started  Started container
Can you please go to one of the running gluster pods and check the following?
gluster volume list
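If you can't get a terminal in the pod from the console, something like this should work from a box with the oc client (the pod name is a placeholder; pick any running glusterfs pod from the first command):
oc get pods -n glusterfs -o wide
oc exec -n glusterfs <glusterfs-pod-name> -- gluster volume list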
Is it the same install where you hit the first issue with the oc binary, or is it a fresh environment? It looks like the heketi database is not created, and that might be due to the last failure.
If the volume is not there, just delete the glusterfs project and rerun?
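For example (the project name is taken from openshift_storage_glusterfs_namespace in your inventory), something like this, followed by rerunning your install playbook:
oc delete project glusterfs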
I'm spinning up my env now to try to replicate this. But if you are in a position to join a screen-sharing session, it's here: https://bluejeans.com/2794238616/
It is the same install, but I've uninstalled it and reinstalled so many times I've lost count. Each time I got the same issue. I can start from scratch again if you want.
I also noticed I have no router pods, and I don't know if it's due to the install failing at that stage or if I should have had them already spinning.
I don't know how to execute the command you gave me. I can't access the pod's terminal (it gives me a warning about privileges even though I have the cluster-admin role), and I don't seem to be able to docker exec -it bash into the pod container.
Edit: I've run gluster volume list in the container and got:
heketidbstorage
Do you want me to redo a clean install?
Router and other pods will come later; they are not needed in the initial install.
Did you uninstall using the gluster uninstall playbook?
And it should let you into the pods if you are cluster-admin. Are you perhaps only an admin of the project?
I've not run the gluster uninstall, but I did delete the project and manually clean the LVM volume. I also use this custom playbook:
---
- hosts: all
  remote_user: ansibleuser
  become: yes
  become_method: sudo
  tasks:
    - name: 1. Stop docker service
      service:
        name: docker
        state: stopped
    - name: 2. Remove gluster vg
      shell: vgs | grep vg_ | awk '{print $1}' | xargs -r vgremove -f -y
    - name: 3. Cleanup Gluster install
      file:
        path: '{{ item }}'
        state: absent
      with_items:
    - name: 4. Start docker service
      service:
        name: docker
        state: started
ok to cleanup :+1:
I'll try to run the uninstall playbook and restart, but can you give me the correct playbook to use?
try this one and rerun:
https://github.com/openshift/openshift-ansible/blob/release-3.7/playbooks/openshift-glusterfs/uninstall.yml
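A run would look something like this (the inventory path is just a placeholder for your own):
ansible-playbook -i /path/to/your/inventory playbooks/openshift-glusterfs/uninstall.yml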
I'm checking the playbooks as we chat.
It fails because playbooks/init/main.yml cannot be found.
Looks like the issue is with firewalls. I did a test in my lab and it works. This should not block the release.
Working with @ahmadou offline in real time.
@ahmadou for the future, please try to format your output (config, errors, etc.). Thanks.
Also I saw you are using ansible 2.5.0, and while this is a different question not related to this issue, I'm curious to know from @sdodson whether we have already moved to this version or not.
@DanyC97 Ok, will do. I upgraded the ansible version because it didn't want to start on 2.4, if I remember correctly.
@ahmadou for iptables, try this on one of the nodes where you were not able to do a manual mount of the glusterfs volume:
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 24007:24011 -j ACCEPT
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
iptables -I INPUT -m state --state NEW -m udp -p udp --dport 111 -j ACCEPT
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 38465:38467 -j ACCEPT
iptables -I INPUT -m state --state NEW -p tcp -m multiport --dports 49152:49664 -j ACCEPT
iptables -I INPUT -m state --state NEW -m tcp -p tcp --dport 2222 -j ACCEPT
service iptables save
#iptables: Saving firewall rules to /etc/sysconfig/iptables:[ OK ]
systemctl restart iptables.service
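To sanity-check afterwards, something like this should work (nc comes from the nmap-ncat package; the node name below is just the one from your describe output, adjust as needed):
iptables -L INPUT -n | grep -E '24007|49152'
nc -zv picnode02.mycompany.internal 24007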
@mjudeikis
Well, I didn't apply your suggestion because the firewall rules in my system are managed outside of the machines themselves.
I've allowed all traffic between nodes and it worked!! So the heketidbstorage issue was a firewall issue.
In summary: since the install sets up a lot of firewall rules, is it best not to set up too many rules of my own when installing the cluster, or does the documentation need an update?
I'm still getting an error. Now the heketi pod starts, but the install fails because of a syntax issue, I think:
TASK [openshift_storage_glusterfs : Delete pre-existing glusterblock provisioner resources] ********************
Thursday 05 April 2018 10:48:53 +0200 (0:00:00.761) 0:08:44.135 ********
fatal: [master01.mycompany]: FAILED! => {"msg": "The conditional check 'not openshift_is_atomic | bool' failed. The error was: error while evaluating conditional (not openshift_is_atomic | bool): 'openshift_is_atomic' is undefined\n\nThe error appears to have been in '/home/ansibleuser/openshift-ansible/roles/openshift_storage_glusterfs/tasks/glusterblock_deploy.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Delete pre-existing glusterblock provisioner resources\n ^ here\n"}
Just do git pull. This was fixed by https://github.com/openshift/openshift-ansible/pull/7772
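For example, assuming you are running from the git clone in /home/ansibleuser/openshift-ansible (the path from your ansible --version output) and are already on the release-3.7 branch:
cd /home/ansibleuser/openshift-ansible
git pull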
The idea is that openshift-ansible will configure all firewall rules for you. If you have an external firewall, in your case Big Brother V, you need to replicate those rules there if you are under strict firewall management. If not, keep them loose, and iptables on the boxes will do the job.
Ok, I got it.
The installation process completed the glusterfs phase, but it now crashes at the metrics stage:
TASK [openshift_metrics : generate hawkular-cassandra replication controllers] ********************
Thursday 05 April 2018 13:17:24 +0200 (0:00:00.388) 0:38:28.969 ********
failed: [master01.mycompany.com] (item=1) => {"changed": false, "item": "1", "msg": "AnsibleUndefinedVariable: 'unicode object' has no attribute 'items'"}
Do you want me to open a new ticket?
This is different stuff. I would suspect you are missing some variable in your inventory. This is outside this ticket, so we can close this one :)
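One guess (not verified here): the metrics role may expect the *_nodeselector variables as a dict rather than a "key=value" string, so an inventory line like the one below might be closer to what it wants. Treat it as an assumption to check against the inventory docs.
openshift_metrics_cassandra_nodeselector={"region": "infranodes"}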
Ok thank you all for your assistance.
It was a pleasure. I will open a new ticket concerning that issue if I don't manage to find an explanation by myself.
Description
On a new install with a multi-master setup and glusterfs storage, the install fails at the "Wait for heketi Pod" task.
Sometimes it gets stuck in the image pull phase, and sometimes it is because the heketi pod is stuck in a crash loop.
Version
Please put the following version information in the code block indicated below.
ansible --version
ansible 2.5.0
  config file = /home/ansibleuser/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/ansibleuser/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Aug 4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]
If you're operating from a git clone:
git describe
openshift-ansible-3.7.42-1-26-g388d11b
Steps To Reproduce
Expected Results
Cluster up and running and glusterfs configured
Observed Results
Describe what is actually happening.
Here are the logs of the heketi-storage container:
Additional Information
Provide any additional information which may help us diagnose the issue.
CentOS Linux release 7.4.1708
My config file
# Global cluster configuration
[OSEv3:children]
masters
etcd
nodes
glusterfs
glusterfs_registry

# GLOBAL CLUSTER VARIABLES
[OSEv3:vars]

# etcd
openshift_use_etcd_system_container=True

# ansible
ansible_ssh_user=ansibleuser
ansible_become=true
ansible_service_broker_image_prefix=openshift/
ansible_service_broker_registry_url="registry.access.redhat.com"

# disk checks
openshift_check_min_host_disk_gb=13

# firewall
os_firewall_use_firewalld=True

# deployment configuration
openshift_deployment_type=origin
openshift_version=3.9.0
openshift_pkg_version=3.7.1
containerized=true

# glusterfs configuration
openshift_storage_glusterfs_namespace=glusterfs
openshift_storage_glusterfs_name=storage

# internal registry configuration
openshift_hosted_registry_storage_kind=glusterfs
openshift_registry_selector="region=infranodes"
openshift_hosted_registry_replicas=3
openshift_hosted_registry_storage_volume_size=190Gi

# routers configuration
openshift_router_selector="region=routingnodes"

# standard nodes configuration
osm_default_node_selector="region=standardnodes"

# master and API access points configuration
openshift_master_cluster_hostname=master-lb.mycompany.internal
openshift_master_cluster_public_hostname=console.mycompany.com
openshift_master_default_subdomain=mycompany.com
openshift_master_api_port=8443
openshift_master_console_port=8443
openshift_master_session_name=ssn
openshift_public_ip="xx.xx.xx.xx"

# router certificate configuration
openshift_hosted_router_certificate={"certfile": "/home/ansibleuser/openshift-ansible/customCertificates/STAR_mycompany.crt", "keyfile": "/home/ansibleuser/openshift-ansible/customCertificates/mycompany.key", "cafile": "/home/ansibleuser/openshift-ansible/customCertificates/COMODORSADomainValidationSecureServerCA.crt"}

# LDAP configuration
openshift_master_identity_providers=[{'name': 'picv4_ldap', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['uid']}, 'bindDN': 'uid=ldapbind,cn=users,cn=accounts,dc=ggd,dc=mycompany', 'bindPassword': 'tetetetetetge', 'ca': '', 'insecure': 'true', 'url': 'ldap://ldap.picv4.mycompany:389/cn=users,cn=accounts,dc=picv4,dc=mycompany?uid'}]

# audit policy configuration
openshift_master_audit_config={"enabled": true, "auditFilePath": "/var/log/openpaas-oscp-audit/openpaas-oscp-audit.log", "maximumFileRetentionDays": 14, "maximumFileSizeMegabytes": 500, "maximumRetainedFiles": 5}

# cluster logging configuration
openshift_logging_install_logging="true"
openshift_logging_es_pvc_dynamic="true"
openshift_logging_es_pvc_size="100G"
openshift_logging_curator_default_days="2"
openshift_logging_curator_run_hour="24"
openshift_master_logging_public_url="https://logs.mycompany.com"
openshift_logging_es_nodeselector="region=infranodes"
openshift_logging_kibana_ops_nodeselector="region=infranodes"
openshift_logging_curator_ops_nodeselector="region=infranodes"

# metrics configuration
openshift_metrics_install_metrics="true"
openshift_metrics_cassandra_storage_type="dynamic"
openshift_metrics_duration=7
openshift_metrics_cassandra_pvc_size="20G"
openshift_metrics_cassandra_replicas=1
openshift_metrics_cassandra_limits_memory="2Gi"
openshift_metrics_cassandra_limits_cpu="2000m"
openshift_metrics_cassandra_nodeselector="region=infranodes"
openshift_master_metrics_public_url="https://metrics.mycompany.com"

# GLUSTERFS NODES
[glusterfs]
storage01.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.31
storage02.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.32
storage03.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.33
storage04.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.34

# glusterfs config
[glusterfs:vars]
openshift_storage_glusterfs_nodeselector="glusterfs=standardstorage"
openshift_storage_glusterfs_wipe="true"

# GLUSTERFS NODES DEDICATED TO THE INTERNAL REGISTRY
[glusterfs_registry]
storage-registry01.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.41
storage-registry02.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.42
storage-registry03.mycompany.internal glusterfs_devices='[ "/dev/sdc"]' glusterfs_ip=10.39.57.43

# CLUSTER NODES
# Master VMs group
[masters]
master0[1:2].mycompany.internal

# etcd nodes
[etcd]
etcd01.mycompany.internal
etcd02.mycompany.internal
etcd03.mycompany.internal

# OpenShift nodes
[nodes]
# Infra nodes
infranode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'infranodes'}" openshift_schedulable=true
# Pic nodes
picnode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'picnodes'}" openshift_schedulable=true
# Compilation nodes
compilnode0[1:2].mycompany.internal openshift_node_labels="{'region' : 'compilnodes'}" openshift_schedulable=true
# Routing nodes
routeur0[1:2].mycompany.internal openshift_node_labels="{'region' : 'routingnodes'}"
# Standard nodes
node0[1:2].mycompany.internal openshift_node_labels="{'region' : 'standardnodes'}" openshift_schedulable=true
# Masters
master0[1:2].mycompany.internal openshift_node_labels="{'region' : 'masters'}" openshift_schedulable=true
# glusterfs nodes
storage0[1:4].mycompany.internal openshift_node_labels="{'region' : 'standardstorage'}"
# glusterfs registry nodes
storage-registry0[1:3].mycompany.internal openshift_node_labels="{'region' : 'registrystorage'}"

# OpenShift node-specific variables
[nodes:vars]
openshift_docker_options=--log-driver json-file --log-opt max-size=1M --log-opt max-file=3 --selinux-enabled