feng-du closed this issue 4 years ago
I have the same problem with OpenShift 3.7.
I've read through all the related content I could find, but none of it gave me an answer.
If anyone knows the solution, please let me know.
@flipkill1985 I finally got it installed successfully this morning.
[root@localhost ~]# oc get pods --all-namespaces
NAMESPACE                           NAME                       READY     STATUS             RESTARTS   AGE
default                             docker-registry-1-zmgt4    1/1       Running            0          10m
default                             registry-console-1-6dnjv   1/1       Running            0          10m
default                             router-1-n479h             1/1       Running            0          12m
kube-service-catalog                apiserver-8sd62            1/1       Running            0          9m
kube-service-catalog                controller-manager-5bbvb   1/1       Running            0          9m
openshift-ansible-service-broker    asb-1-deploy               1/1       Running            0          8m
openshift-ansible-service-broker    asb-1-scb6l                0/1       ImagePullBackOff   0          8m
openshift-ansible-service-broker    asb-etcd-1-jq6s5           1/1       Running            0          8m
openshift-template-service-broker   apiserver-dd6mh            1/1       Running            0          7m
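Side note on the listing above: asb-1-scb6l is stuck in ImagePullBackOff. If you want to chase that too, a sketch using the pod name from the output:

oc describe pod asb-1-scb6l -n openshift-ansible-service-broker   # the Events section shows which image pull is failing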
I had always followed this video: https://blog.openshift.com/installing-openshift-3-7-1-30-minutes/ but every attempt failed.
After I followed these steps one by one, it succeeded: https://docs.openshift.org/latest/install_config/install/host_preparation.html
I was probably missing some prerequisite packages before.
https://docs.openshift.org/latest/install_config/install/host_preparation.html
Isn't this for OpenShift 3.9, not 3.7.x?
@flipkill1985 I installed v3.7 successfully.
Can you post your steps and the playbook you use? Please :)
This time I used a basic config for testing, so I didn't set a hostname, DNS, Docker storage, etc., but those parts are easy once the install itself succeeds.
1
yum install wget git net-tools bind-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct
yum update
2
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sed -i -e "s/^enabled=1/enabled=0/" /etc/yum.repos.d/epel.repo
yum -y --enablerepo=epel install ansible pyOpenSSL
3
git clone https://github.com/openshift/openshift-ansible
cd openshift-ansible
git checkout release-3.7
cd ~/
4
yum install docker-1.13.1
systemctl start docker
systemctl enable docker
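An optional sanity check that the Docker daemon is really up before continuing:

systemctl is-active docker   # should print "active"
docker info                  # should dump daemon details without errors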
5
ssh-keygen -t rsa
-- replace 10.1.7.39 with your host IP
ssh-copy-id -i ~/.ssh/id_rsa.pub 10.1.7.39
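A minimal check that key-based SSH works before Ansible needs it (same IP as above; replace with yours):

ssh -o BatchMode=yes root@10.1.7.39 true && echo "key auth OK"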
6
vi /etc/ansible/hosts
-- my hosts; replace the IP with your host IP
-- dev.cefcfco.com is my domain; replace it with yours
[OSEv3:children]
masters
nodes
etcd
nfs
[OSEv3:vars]
ansible_ssh_user=root
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability,package_version
openshift_docker_options='--selinux-enabled --insecure-registry 172.30.0.0/16'
deployment_type=origin
openshift_deployment_type=origin
openshift_release=v3.7
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/opt/osev3-etcd
openshift_hosted_etcd_storage_volume_name=etcd-vol2
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}
ansible_service_broker_image_prefix=openshift/
ansible_service_broker_registry_url="registry.access.redhat.com"
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_public_hostname=dev.cefcfco.com
openshift_master_default_subdomain=apps.dev.cefcfco.com
[masters]
10.1.7.39 openshift_schedulable=true
[etcd]
10.1.7.39
[nfs]
10.1.7.39
[nodes]
10.1.7.39 openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
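Before running the playbook in step 7 below, it can help to confirm Ansible reaches every host in this inventory (the standard Ansible ping module, not ICMP):

ansible -i /etc/ansible/hosts OSEv3 -m ping   # each host should answer with "pong"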
7
ansible-playbook -i /etc/ansible/hosts openshift-ansible/playbooks/byo/config.yml -vvv
Doesn't work :( Which distribution do you use? I use CentOS 7.4.
This is the error:
fatal: [sp-peter02.os.peter.es]: FAILED! => {
    "attempts": 120,
    "changed": false,
    "cmd": [
        "curl",
        "-k",
        "https://apiserver.kube-service-catalog.svc/healthz"
    ],
    "delta": "0:00:00.144529",
    "end": "2018-03-23 09:34:21.024849",
    "invocation": {
        "module_args": {
            "_raw_params": "curl -k https://apiserver.kube-service-catalog.svc/healthz",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": false
        }
    },
    "rc": 0,
    "start": "2018-03-23 09:34:20.880320",
    "stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 180 100 180 0 0 1311 0 --:--:-- --:--:-- --:--:-- 1313",
    "stderr_lines": [
        " % Total % Received % Xferd Average Speed Time Time Time Current",
        " Dload Upload Total Spent Left Speed",
        "",
        " 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
        "100 180 100 180 0 0 1311 0 --:--:-- --:--:-- --:--:-- 1313"
    ],
    "stdout": "[+]ping ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-service-catalog-apiserver-informers ok\n[-]etcd failed: reason withheld\nhealthz check failed",
    "stdout_lines": [
        "[+]ping ok",
        "[+]poststarthook/generic-apiserver-start-informers ok",
        "[+]poststarthook/start-service-catalog-apiserver-informers ok",
        "[-]etcd failed: reason withheld",
        "healthz check failed"
    ]
}
# curl -k https://apiserver.kube-service-catalog.svc/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-service-catalog-apiserver-informers ok
[-]etcd failed: reason withheld
healthz check failed
Can someone please help me? The problem is the failing etcd health check shown above.
@flipkill1985 In your Ansible hosts file, did you use IPs or hostnames? I found that it failed when I used hostnames but succeeded with IPs. I guess this is a DNS problem, so I will try to set up a DNS server next.
I use hostnames.
@flipkill1985 I think this may be related to https://github.com/openshift/origin/issues/17316
Do you have a wildcard entry for *.dev.cefcfco.com configured in your DNS? I've recently experienced a similar issue where the apiserver pod failed to resolve the etcd hosts correctly because the DNS lookup was matching a wildcard DNS entry, due to the search and ndots configuration in /etc/resolv.conf inside the apiserver pod. See my comment here, where I found similar behavior: https://github.com/openshift/openshift-ansible/issues/8076
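A quick way to test for such a wildcard record (a sketch; the lookup name is made up, so dig it against your own zone):

dig +short this-name-should-not-exist.dev.cefcfco.com
# empty output: no wildcard record
# an IP address: *.dev.cefcfco.com exists and can hijack suffixed lookups inside pods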
I'm running into the same issue.
[OSEv3:children]
masters
nodes
etcd
lb
[OSEv3:vars]
ansible_python_interpreter=/usr/bin/python3
ansible_ssh_user=fedora
ansible_become=true
openshift_deployment_type=origin
openshift_release=v3.9
openshift_master_cluster_method=native
openshift_master_cluster_hostname=k8s.unigs.de
openshift_master_cluster_public_hostname=cloud.unigs.de
[masters]
node1.k8s.unigs.de
node3.k8s.unigs.de
node5.k8s.unigs.de
[etcd]
node1.k8s.unigs.de
node3.k8s.unigs.de
node5.k8s.unigs.de
[lb]
lb.k8s.unigs.de ansible_python_interpreter=/usr/bin/python ansible_ssh_user=root
[nodes]
node1.k8s.unigs.de openshift_node_labels="{'region': 'infra','zone': 'default'}"
node3.k8s.unigs.de openshift_node_labels="{'region': 'infra','zone': 'default'}"
node5.k8s.unigs.de openshift_node_labels="{'region': 'infra','zone': 'default'}"
node2.k8s.unigs.de openshift_node_labels="{'region': 'infra','primary': 'default'}"
node4.k8s.unigs.de openshift_node_labels="{'region': 'infra','primary': 'default'}"
node6.k8s.unigs.de openshift_node_labels="{'region': 'infra','primary': 'default'}"
Nodes 1 to 6 are Fedora Atomic, the lb is CentOS 7, all on the latest versions.
I have run all the preparation commands and set up a fully working DNS (including a wildcard record; the entries point to the lb).
I noticed that only about 1 in 3 runs of curl -k https://apiserver.kube-service-catalog.svc/healthz returns ok.
Is there anything I can provide to give you a clue about what could be wrong?
On a retest from scratch with an external load balancer I got stuck on exactly the same error.
The healthz URL seems to work on only a single node; it fails in ~66% of the curls.
for i in {1..1000}; do
  curl -s -k https://apiserver.kube-service-catalog.svc/healthz | grep -oE '^ok|etcd.*'
done | sort | uniq -c

    662 etcd failed: reason withheld
    338 ok
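That ~1/3 success rate is consistent with three apiserver pods behind one service, of which only one is healthy: the service spreads requests across its endpoints. A sketch of how to see which pod IPs sit behind it (the service name apiserver is taken from the URL above):

oc get endpoints apiserver -n kube-service-catalog   # pod IPs the service balances across
oc get pods -n kube-service-catalog -o wide          # maps those pod IPs to nodes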
I think I found the reason why it's not working: some of the API servers do not work:
apiserver-8n5g5 @node1 curl -k https://10.128.0.4:6443 healthz check failed
apiserver-cdbfh @node3 curl -k https://10.129.0.4:6443 healthz check failed
apiserver-n4qm7 @node2 curl -k https://10.130.0.6:6443 ok
A quick look with oc describe showed me how they try to resolve the etcd servers:
Command:
  /usr/bin/service-catalog
Args:
  apiserver
  --storage-type
  etcd
  --secure-port
  6443
  --etcd-servers
  https://node1.k8s.unigs.de:2379,https://node2.k8s.unigs.de:2379,https://node3.k8s.unigs.de:2379
  --etcd-cafile
  /etc/origin/master/master.etcd-ca.crt
  --etcd-certfile
  /etc/origin/master/master.etcd-client.crt
  --etcd-keyfile
  /etc/origin/master/master.etcd-client.key
  -v
  3
  --cors-allowed-origins
  localhost
  --admission-control
  KubernetesNamespaceLifecycle,DefaultServicePlan,ServiceBindingsLifecycle,ServicePlanChangeValidator,BrokerAuthSarCheck
  --feature-gates
  OriginatingIdentity=true
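For reference, the spec above comes from describing the pod, and the resolver tests below were run inside it; roughly (pod name taken from the list above):

oc describe pod apiserver-8n5g5 -n kube-service-catalog    # shows the Command/Args block above
oc exec -it apiserver-8n5g5 -n kube-service-catalog -- sh  # shell used for the ping tests below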
I exec'ed into the container and ran the following commands:
sh-4.2# ping node1.k8s.unigs.de
PING node1.k8s.unigs.de.k8s.unigs.de (10.18.255.99) 56(84) bytes of data.
64 bytes from lb.k8s.unigs.de (10.18.255.99): icmp_seq=1 ttl=63 time=0.213 ms
That is clearly wrong: node1 resolved to the load balancer's address. Note the trailing dot at the end of the name in the next command.
sh-4.2# ping node1.k8s.unigs.de.
PING node1.k8s.unigs.de (10.18.255.1) 56(84) bytes of data.
64 bytes from node1.k8s.unigs.de (10.18.255.1): icmp_seq=1 ttl=63 time=0.730 ms
Oh, interesting!
sh-4.2# cat /etc/resolv.conf
nameserver 10.18.255.2
search kube-service-catalog.svc.cluster.local svc.cluster.local cluster.local k8s.unigs.de
options ndots:5
As far as I understand it, the ndots:5 option means that hostnames with fewer than 5 dots are looked up through the search domains first. node1.k8s.unigs.de has fewer than 5 dots, so it gets resolved as node1.k8s.unigs.de.k8s.unigs.de. Does this ndots option make sense? And how can I force it to use the domain name I provided?
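A minimal way to watch this search-list behavior from inside the pod (getent consults the same resolver configuration the apiserver uses):

getent hosts node1.k8s.unigs.de    # fewer than 5 dots, so the search suffixes are tried first and the wildcard answers
getent hosts node1.k8s.unigs.de.   # the trailing dot makes the name absolute and skips the search list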
I tried adding openshift_ip= to all of my hosts, but that did not change the result.
I finally got it to work. The cause of the issue was that I had a wildcard A record on the domain I used. If there is no wildcard entry, node1.k8s.unigs.de.k8s.unigs.de does not resolve, and the resolver falls back to the correct name. I redeployed the same setup on another domain, without a wildcard record, and it worked!
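A sketch of how to verify the wildcard is really gone, using the bogus suffixed name from above:

dig +short node1.k8s.unigs.de.k8s.unigs.de   # should now print nothing (NXDOMAIN)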
This may also be the fix for these issues:
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
Description
I have installed openshift-origin v3.7, v3.8, v3.9, and v3.10, but every version hit the issue discussed above. Are there some prerequisites for the service catalog?
My inventory hosts: