4admin2root closed this issue 6 years ago.
I changed kube_network_plugin to flannel and it works.
Me too:
fatal: [node1]: FAILED! => {"changed": false, "elapsed": 180, "failed": true, "msg": "Timeout when waiting for 10.233.0.2:53"}
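The failing task is essentially a port wait: Ansible keeps trying to open a TCP connection to the dnsmasq service IP 10.233.0.2 on port 53 and gives up after 180 seconds. Below is a minimal Python sketch of the same kind of check, handy for re-running it by hand from a node. The function name wait_for_port is hypothetical, and this is not Ansible's actual wait_for code:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 180.0, interval: float = 1.0) -> bool:
    """Retry a TCP connect to host:port until it succeeds or `timeout` seconds pass.

    Hypothetical helper that roughly mimics a wait_for host/port check.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake is treated as "port is up".
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("10.233.0.2", 53, timeout=30) run on node1 would be expected to keep failing for as long as no dnsmasq pod is running behind that service IP.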
Same here, with k8s 1.6.0 and Ubuntu Xenial 16.04 as the host.
OK, my problem could be related to installing it in Azure, although I had the same problem yesterday in DigitalOcean. In the coming weeks I will install it on some bare-metal servers and will try Calico again then; for now I will stick with Azure and flannel for testing purposes.
For some reason this failed for me with flannel on CoreOS beta as the base OS.
I have the same issue with both 1.5.3 and 1.6.0 on Ubuntu 16.04, running inside OpenStack, using Ansible 2.2.1 (in a venv, because 2.2.2 is borked).
The only options I changed from the defaults are:
  ipip: true
  calico_mtu: 1340
It seems that the dependency in roles/dnsmasq/meta/main.yml:
---
dependencies:
  - role: download
    file: "{{ downloads.dnsmasq }}"
    when: dns_mode == 'dnsmasq_kubedns' and download_localhost|default(false)
    tags: [download, dnsmasq]
...never gets to run, as I don't see the andyshinn/dnsmasq:2.72 image anywhere on the nodes.
I'm not sure whether it's related to download_run_once and download_localhost in roles/download/defaults/main.yml, whose raison d'être I'm not sure about, but I don't need local downloads.
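For reference, the when: condition on that dependency can be evaluated by hand. The rough Python equivalent below is a hypothetical helper: None stands in for an undefined variable so that default(false) applies, and it assumes download_localhost is false in this setup (the actual default lives in roles/download/defaults/main.yml). It only shows whether the meta dependency's download tasks run; it says nothing about whether the image is pulled somewhere else in the playbook:

```python
def download_dependency_runs(dns_mode: str, download_localhost=None) -> bool:
    """Rough evaluation of the dependency's `when:` expression:

        dns_mode == 'dnsmasq_kubedns' and download_localhost|default(false)

    Hypothetical helper; None means "variable not defined".
    """
    localhost = False if download_localhost is None else bool(download_localhost)
    return dns_mode == "dnsmasq_kubedns" and localhost
```

With dns_mode set to dnsmasq_kubedns and download_localhost false or unset, this returns False, i.e. the dependency is skipped.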
I'm new to Kubernetes, but my guess is that deployments should be able to pull their own images through Docker when they need to...?
More debug / info:
# kubectl get deployment --all-namespaces
NAMESPACE     NAME                        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deploy/dnsmasq              1         0         0            0           1h
kube-system   deploy/dnsmasq-autoscaler   1         0         0            0           1h
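The telling part is DESIRED 1 vs. AVAILABLE 0: both deployments exist but have no running pods. A small sketch that flags such rows in kubectl get deployment output (hypothetical helper; it assumes whitespace-separated columns with no spaces inside values, as in the output above, and looks columns up by header name):

```python
def unavailable_deployments(table: str) -> list:
    """Return (namespace, name, desired, available) for rows where AVAILABLE < DESIRED."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}
    result = []
    for line in lines[1:]:
        cells = line.split()
        desired = int(cells[col["DESIRED"]])
        available = int(cells[col["AVAILABLE"]])
        if available < desired:
            result.append((cells[col["NAMESPACE"]], cells[col["NAME"]], desired, available))
    return result


OUTPUT = """
NAMESPACE     NAME                        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deploy/dnsmasq              1         0         0            0           1h
kube-system   deploy/dnsmasq-autoscaler   1         0         0            0           1h
"""

for row in unavailable_deployments(OUTPUT):
    print(row)
```

The NAMESPACE column only appears with --all-namespaces, so this sketch expects that flag.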
# kubectl describe -f /etc/kubernetes/dnsmasq-svc.yml
Name: dnsmasq
Namespace: kube-system
Labels: k8s-app=dnsmasq
        kubernetes.io/cluster-service=true
Selector: k8s-app=dnsmasq
Type: ClusterIP
IP: 10.233.0.2
Port: dns-tcp 53/TCP
Endpoints: <none>
Port: dns 53/UDP
Endpoints: <none>
Session Affinity: None
No events.
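Endpoints: <none> is consistent with the deployment output above: a Service with a selector only gets endpoints from pods that match k8s-app=dnsmasq and are Ready, and there are none. The same thing can be checked on the Endpoints object itself, e.g. the JSON from kubectl get endpoints dnsmasq -n kube-system -o json. A minimal sketch that lists only the ready addresses, assuming the standard v1 Endpoints layout (subsets with addresses and notReadyAddresses; subsets missing or empty when nothing is ready):

```python
def ready_endpoint_ips(endpoints: dict) -> list:
    """IPs the Service can route to: entries under `addresses`, not `notReadyAddresses`."""
    ips = []
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ips.append(address["ip"])
    return ips


# An Endpoints object with no ready pods has no usable subsets.
print(ready_endpoint_ips({"kind": "Endpoints", "metadata": {"name": "dnsmasq"}}))  # → []
```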
# curl http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/dnsmasq
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"dnsmasq\"",
  "reason": "ServiceUnavailable",
  "code": 503
}
Any luck if you add the Calico network to the allowed address ranges in OpenStack (mentioned at the bottom of the doc below)? https://github.com/kubernetes-incubator/kargo/blob/master/docs/openstack.md
I'm already using ipip: true and calico_mtu: 1340.
If I also set cloud_provider: "openstack" (with or without a neutron port-update for the 10.233.0.0/16 range), it gets stuck at:
RUNNING HANDLER [kubernetes/master : Master | wait for the apiserver to be running] ***
[...]
FAILED - RETRYING: HANDLER: kubernetes/master : Master | wait for the apiserver to be running (1 retries left).
FAILED - RETRYING: HANDLER: kubernetes/master : Master | wait for the apiserver to be running (1 retries left).
FAILED - RETRYING: HANDLER: kubernetes/master : Master | wait for the apiserver to be running (1 retries left).
fatal: [staging_test_002]: FAILED! => {"attempts": 20, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:8080/healthz"}
fatal: [staging_test_001]: FAILED! => {"attempts": 20, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:8080/healthz"}
fatal: [staging_test_003]: FAILED! => {"attempts": 20, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:8080/healthz"}
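This handler polls http://localhost:8080/healthz (20 attempts in this run) until it gets a 200. "Connection refused" means nothing is listening on that port at all, i.e. the apiserver never came up, rather than answering as unhealthy. A rough Python equivalent for re-running the check by hand; the function name and the delay between attempts are assumptions, not kargo's settings:

```python
import time
import urllib.request


def wait_for_healthz(url: str, attempts: int = 20, delay: float = 5.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `attempts` tries are used up."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=delay) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError (e.g. connection refused) and HTTPError (non-2xx status)
            # are both OSError subclasses; treat them as "not healthy yet".
            pass
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Usage on a master: wait_for_healthz("http://localhost:8080/healthz").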
EDIT³:
With flannel or calico, I now get stuck at this on the first run, and at the problem above on the second:
RUNNING HANDLER [kubernetes/master : Master | wait for kube-scheduler] *********
I'm not sure whether the latest git pull is responsible for this...
It looks like with the latest version (as of 2017-04-06) it still times out, but after about 15 minutes the service is running. The curl test still fails, though, and I can't resolve names inside a container.
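To test name resolution from inside a container, a minimal Python check like the one below can be run in any pod that has Python. It uses the system resolver, which inside a pod normally means the nameserver listed in the pod's /etc/resolv.conf (the cluster DNS). The helper name is hypothetical:

```python
import socket


def can_resolve(name: str) -> bool:
    """True if the system resolver returns at least one address for `name`."""
    try:
        return len(socket.getaddrinfo(name, None)) > 0
    except socket.gaierror:
        return False
```

For example, can_resolve("kubernetes.default.svc.cluster.local") assumes the cluster domain is cluster.local; adjust the suffix if your cluster uses a different one.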
So it seems that if you set cloud_provider, Kubernetes tries to reach OpenStack before the DNS services are set up, and this breaks the install flow. Hope this helps to figure out how to troubleshoot and resolve it!
@justicel: strangely, I get the same result with 1.6.1 (released today) and no cloud_provider set...
I had no problem installing 1.6.1 on bare metal using Calico; everything is working fine. I did upgrade some package versions in the config, though. The upgrade is probably not needed, but I wanted to test with the latest releases:
etcd_version: v3.1.4
calico_version: "v1.1.0"
calico_cni_version: "v1.6.1"
calico_policy_version: "v0.5.4"
flannel_version: v0.7.0
Calico had one problem with Kubernetes 1.6 that was fixed in calico-cni 1.6.1; you can find the details here: http://docs.projectcalico.org/v2.1/releases/
Anyway, my problem with Calico was clearly related to trying to use it in Azure, so maybe the update is not needed.
The problem I have seen is not specific to the network driver, as long as you follow the instructions for your platform. The issue is that the Kubernetes API/management pods need to be started after the DNS services are up, or else temporarily use nameservers other than the kubedns/dnsmasq ones. What happens is that the OpenStack API (or other endpoints) can't be reached until those pods are up, so it's a chicken-and-egg problem.
Similar failure with the weave network plugin. I will try again with flannel to see if that helps.
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
Environment:
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): centos 7.2
Ansible version (ansible --version): 2.2.1.0
Kargo version (commit) (git rev-parse --short HEAD): f6cd42e
Network plugin used: calico
Copy of your inventory file:
[kube-master]
kg1
kg2

[etcd]
kg1
kg2
kg3

[kube-node]
kg2
kg3
kg4

[k8s-cluster:children]
kube-node
kube-master
Command used to invoke ansible:
Output of ansible run:
TASK [dnsmasq : Start Resources] ***
task path: /usr/local/lvzj/github/kargo/roles/dnsmasq/tasks/main.yml:65
Tuesday 21 March 2017 13:14:24 +0800 (0:00:01.126) 0:05:52.038 *
Using module file /usr/local/lvzj/github/kargo/library/kube.py
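In the inventory in this report, k8s-cluster has no hosts of its own; the [k8s-cluster:children] section makes it the union of kube-node and kube-master. A small sketch of how that group resolves, using hypothetical helpers built on Python's configparser. It only handles bare host names (no inline host variables) and assumes there are no cyclic children:

```python
import configparser

# Inventory from the bug report, one host per line.
INVENTORY = """
[kube-master]
kg1
kg2

[etcd]
kg1
kg2
kg3

[kube-node]
kg2
kg3
kg4

[k8s-cluster:children]
kube-node
kube-master
"""


def load_groups(text: str) -> dict:
    """Map each INI section to its entries (host names or child group names)."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.optionxform = str  # keep names as written instead of lower-casing them
    parser.read_string(text)
    return {section: list(parser[section]) for section in parser.sections()}


def hosts_in(groups: dict, group: str) -> list:
    """Hosts of `group`, including hosts inherited through `<group>:children`."""
    hosts = list(groups.get(group, []))
    for child in groups.get(f"{group}:children", []):
        for host in hosts_in(groups, child):
            if host not in hosts:
                hosts.append(host)
    return hosts


groups = load_groups(INVENTORY)
print(hosts_in(groups, "k8s-cluster"))  # → ['kg2', 'kg3', 'kg4', 'kg1']
```

Note that kg2 is in both kube-master and kube-node, while kg3 is an etcd member and node but not a master.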