peter20170715 commented 7 years ago

[root@k8s-1 test]# kubectl exec nginx -i -t -- /bin/bash
root@nginx:/# cat /etc/resolv.conf
nameserver 10.254.0.2
search default.svc.cluster.local. svc.cluster.local. cluster.local.
options ndots:5

root@nginx:/# ping my-nginx
ping: unknown host

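Someone asked about this before and was told that the docker parameters had not been passed in. I checked the docker startup parameters and found nothing wrong:

[root@k8s-1 test]# ps -ef|grep docker
root 2176 1 0 22:21 ? 00:00:02 /root/local/bin/dockerd --bip=172.30.80.1/24 --ip-masq=true --mtu=1450 --log-level=error

How can I fix this?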

opsnull commented 7 years ago

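Did the "Verify cluster functionality" (验证集群功能) section in "07-部署Node节点.md" pass?

This looks like it is caused by broken pod networking. Check whether the iptables rules on each node and the flannel and docker configuration parameters are correct.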
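One of the checks suggested above, that docker and flannel agree, can be sketched as follows. This is a minimal sketch, assuming flanneld writes its lease to /run/flannel/subnet.env (its default subnet file) and that the FLANNEL_SUBNET value there is what dockerd should receive as --bip. The function names are made up for illustration.

```bash
# Assumption: /run/flannel/subnet.env is flanneld's subnet file on this node;
# adjust the path if your unit file overrides it.

# Print the value of --bip from a dockerd command line passed as $1.
docker_bip() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--bip=//p'
}

# Print FLANNEL_SUBNET from the flannel subnet file passed as $1.
flannel_subnet() {
  sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}

# Report whether the two values agree; returns non-zero on a mismatch.
check_bip() {
  bip=$(docker_bip "$1")
  subnet=$(flannel_subnet "$2")
  if [ -n "$bip" ] && [ "$bip" = "$subnet" ]; then
    echo "ok: docker bip $bip matches flannel subnet"
  else
    echo "mismatch: docker bip '$bip', flannel subnet '$subnet'"
    return 1
  fi
}

# Example, run on a node:
#   check_bip "$(ps -o args= -C dockerd)" /run/flannel/subnet.env
```

A mismatch would mean that dockerd started with a bridge address outside the subnet flannel leased for the node, which is one way pod traffic between nodes can break.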

peter20170715 commented 7 years ago

On the master, iptables -nL shows:

[root@k8s-1 ~]# iptables -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:4194
ACCEPT tcp -- 172.16.0.0/12 0.0.0.0/0 tcp dpt:4194
ACCEPT tcp -- 172.30.0.0/16 0.0.0.0/0 tcp dpt:4194
ACCEPT tcp -- 192.168.0.0/16 0.0.0.0/0 tcp dpt:4194
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:4194

Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER-ISOLATION all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */

Chain DOCKER (1 references)
target prot opt source destination

Chain DOCKER-ISOLATION (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0

Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (1 references)
target prot opt source destination
REJECT udp -- 0.0.0.0/0 10.254.0.2 /* kube-system/kube-dns:dns has no endpoints */ udp dpt:53 reject-with icmp-port-unreachable
REJECT tcp -- 0.0.0.0/0 10.254.0.2 /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:53 reject-with icmp-port-unreachable

On the node, iptables -nL shows:

[root@k8s-2 ~]# iptables -nL
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT tcp -- 10.0.0.0/8 0.0.0.0/0 tcp dpt:4194
ACCEPT tcp -- 172.16.0.0/12 0.0.0.0/0 tcp dpt:4194
ACCEPT tcp -- 172.16.0.0/12 0.0.0.0/0 tcp dpt:4194
ACCEPT tcp -- 192.168.0.0/16 0.0.0.0/0 tcp dpt:4194
DROP tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:4194

Chain FORWARD (policy DROP)
target prot opt source destination
DOCKER-ISOLATION all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
DOCKER all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */

Chain DOCKER (1 references)
target prot opt source destination

Chain DOCKER-ISOLATION (1 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0

Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (1 references)
target prot opt source destination
REJECT udp -- 0.0.0.0/0 10.254.0.2 /* kube-system/kube-dns:dns has no endpoints */ udp dpt:53 reject-with icmp-port-unreachable
REJECT tcp -- 0.0.0.0/0 10.254.0.2 /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:53 reject-with icmp-port-unreachable

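I found no problems with the docker configuration.

Could you check whether this is an iptables rule problem?

Cluster verification passed earlier, but after the machines were rebooted it no longer passes. I also installed this on other machines and saw that during verification the first run of the command returns the nginx page, while later runs time out and return nothing useful.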
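One detail in the KUBE-SERVICES chain of both listings is worth a closer look: the REJECT rules for 10.254.0.2 carry the comment "kube-system/kube-dns:dns has no endpoints". kube-proxy installs such rules when a Service has no ready backing pods, so DNS queries to that address are refused no matter what the FORWARD policy is. A minimal sketch for pulling these entries out of the listing (the function name is made up for illustration):

```bash
# Read `iptables -nL` output on stdin and print each Service that kube-proxy
# has marked as having no endpoints, one per line, without duplicates.
services_without_endpoints() {
  grep -o '[A-Za-z0-9_.-]*/[A-Za-z0-9_.:-]* has no endpoints' \
    | sed 's/ has no endpoints$//' \
    | sort -u
}

# Example, run on a node:
#   iptables -nL | services_without_endpoints
```

If kube-dns shows up here, a reasonable next step is `kubectl get endpoints kube-dns -n kube-system` and `kubectl get pods -n kube-system -o wide` to see whether the DNS pods are running and ready.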

xlyoung commented 7 years ago

Try running this on every node:

iptables -P FORWARD ACCEPT

Then test again.
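Both listings earlier in the thread show "Chain FORWARD (policy DROP)", which is what this suggestion targets. A small sketch for reading the current policy before and after the change (the function name is made up for illustration):

```bash
# Read `iptables -nL FORWARD` (or `iptables -nvL FORWARD`) output on stdin
# and print the default policy of the FORWARD chain.
forward_policy() {
  sed -n 's/^Chain FORWARD (policy \([A-Z]*\).*/\1/p'
}

# Example, run on each node:
#   iptables -nL FORWARD | forward_policy
```

Note that a policy set with `iptables -P` lives only in the running kernel, so it does not survive a reboot on its own. That would be consistent with verification passing before the reboot and failing after it, though the thread does not confirm this as the cause.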

JaeGerW2016 commented 7 years ago

If you are on a Red Hat family distribution, check whether SELinux has been disabled. I have run into network i/o timeouts because of it before.
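A quick way to check this, sketched under the assumption that the node keeps its SELinux settings in the usual /etc/selinux/config file (the function name is made up for illustration):

```bash
# Print the boot-time SELinux mode configured in the given file
# (default: /etc/selinux/config). Commented-out lines and the
# SELINUXTYPE= line are ignored.
selinux_config_mode() {
  sed -n 's/^[[:space:]]*SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}" | tail -n 1
}

# Example, run on each node:
#   selinux_config_mode     # mode applied at the next boot
#   getenforce              # mode in effect right now
```

`getenforce` reports the mode currently in effect, while the config file decides what applies after a reboot, so the two can differ.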

hoperuin commented 6 years ago

kubectl logs -f corednsXXXX -n kube-system reports the following error:

Failed to list *v1.Namespace: Get https://10.254.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: tls: failed to parse certificate from server: x509: cannot parse dnsName "kubernetes.default.svc.cluster.local."

ioiioo commented 6 years ago

@hoperuin I deployed the cluster add-ons yesterday and hit the same DNS problem as you. It is still unresolved. Have you found a solution?

hoperuin commented 6 years ago

@ioiioo kube-dns also needs certificate configuration added.

opsnull commented 6 years ago

@hoperuin No extra certificates need to be configured. By default kube-dns and coredns use the Token and CA from the ServiceAccount to access the apiserver. Your problem is probably a misconfigured ServiceAccount.

hoperuin commented 6 years ago

I followed your tutorial step by step. Without the certificate configured it reports x509; with the certificate configured it works.

opsnull commented 6 years ago

@ioiioo @hoperuin

The x509: cannot parse dnsName "kubernetes.default.svc.cluster.local." problem has been fixed. See:

https://github.com/opsnull/follow-me-install-kubernetes-cluster/issues/233
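The error message names a DNS subject alternative name that ends in a dot, so one way to confirm the diagnosis is to list the DNS names in the apiserver certificate and look for a trailing dot. A minimal sketch (the function name and the kubernetes.pem file name are assumptions, not taken from the thread):

```bash
# Read `openssl x509 -noout -text` output on stdin and print every DNS
# subject alternative name that ends with a dot. Returns non-zero if
# there is none.
trailing_dot_sans() {
  grep -o 'DNS:[^,[:space:]]*' | sed 's/^DNS://' | grep '\.$'
}

# Example (kubernetes.pem is a hypothetical path to the apiserver certificate):
#   openssl x509 -noout -text -in kubernetes.pem | trailing_dot_sans
```

Any name printed here is a candidate for the entry that the client fails to parse.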

ioiioo commented 6 years ago

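@opsnull I followed your solution and it fixed the problem, but your reply was too brief. At first I did not understand it and could not tell where to remove the trailing . from kubernetes.default.svc.cluster.local.

Later I saw that a code commit was referenced below your reply. From the code diff, the concrete fix is as follows:

Anyone who failed to install the DNS add-on before June 29 can retry with the steps below, which should resolve the problem. The author has already updated the latest version of the tutorial.

1. Regenerate the kubernetes certificate and private key by following the current version of the section 06-1.部署 kube-apiserver 组件

The main change is to turn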

source /opt/k8s/bin/environment.sh
cat > kubernetes-csr.json <<EOF
{
  "CN": "kubernetes",
  "hosts": [
  ...
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local",
    "kubernetes.default.svc.${CLUSTER_DNS_DOMAIN}"
  ],
  ....

into:

source /opt/k8s/bin/environment.sh
cat > kubernetes-csr.json <<EOF
{
  "CN": "kubernetes",
  "hosts": [
  ...
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  ....

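The key step is to drop the "kubernetes.default.svc.${CLUSTER_DNS_DOMAIN}" line when generating kubernetes-csr.json.

Then redo section 06-1 step by step. This resolves the failure to install the coreDNS add-on in 09-1.dns插件.md.

2. Restart the kube-apiserver that was already running on the master node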

source /opt/k8s/bin/environment.sh
ssh root@${MASTER_IP} "systemctl restart kube-apiserver"
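Before regenerating the certificate, it can help to confirm that the generated kubernetes-csr.json no longer contains a host entry with a trailing dot. The heredoc expands ${CLUSTER_DNS_DOMAIN} when the file is written, and that variable presumably carries a trailing dot (an assumption based on the error message and the resolv.conf search list, not confirmed in the thread). A minimal sketch, with a made-up function name:

```bash
# Print every quoted entry in the given JSON file that ends with a dot,
# for example "kubernetes.default.svc.cluster.local.".
trailing_dot_hosts() {
  grep -oE '"[A-Za-z0-9.-]+\."' "$1" | tr -d '"'
}

# Example, run in the directory where the CSR file was generated:
#   trailing_dot_hosts kubernetes-csr.json
```

Empty output means no host entry ends in a dot, so the regenerated certificate should not trigger the dnsName parse error.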

opsnull / follow-me-install-kubernetes-cluster

Tests fail after installing kube-dns #151
