tkestack / galaxy

Providing high-performance network for Kubernetes

Pod liveness and readiness gates failed with ipvlan l2 mode #122

Closed. chenchun closed this issue 3 years ago.

currycan commented 3 years ago

@chenchun I ran into this problem when using a floating IP: the pod health checks fail. The deployment is:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-floatingip
spec:
  strategy:
    type: Recreate
  replicas: 3
  selector:
    matchLabels:
      app: nginx-floatingip
  template:
    metadata:
      name: nginx-floatingip
      labels:
        app: nginx-floatingip
      annotations:
        k8s.v1.cni.cncf.io/networks: "galaxy-k8s-vlan"
        k8s.v1.cni.galaxy.io/release-policy: "immutable"
    spec:
      tolerations:
        - operator: "Exists"
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
          - name: http-80
            containerPort: 80
        resources:
          requests:
            cpu: "0.1"
            memory: "32Mi"
            tke.cloud.tencent.com/eni-ip: "1"
          limits:
            cpu: "0.1"
            memory: "32Mi"
            tke.cloud.tencent.com/eni-ip: "1"
        livenessProbe:
          # httpGet:
          #   path: /
          #   port: 80
          #   scheme: HTTP
          tcpSocket:
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          # httpGet:
          #   path: /
          #   port: 80
          #   scheme: HTTP
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 2
          failureThreshold: 3
          timeoutSeconds: 1

The pod keeps restarting because the health check probes fail. This is the pod description info:

  Warning  FailedScheduling  82s                default-scheduler  deployment nginx-floatingip has allocated 3 ips with replicas of 3, wait for releasing
  Warning  FailedScheduling  82s                default-scheduler  deployment nginx-floatingip has allocated 3 ips with replicas of 3, wait for releasing
  Normal   Scheduled         78s                default-scheduler  Successfully assigned default/nginx-floatingip-5cdcd7bcbd-6ql2x to 10.177.140.18
  Warning  Unhealthy         16s (x3 over 36s)  kubelet            Liveness probe failed: dial tcp 10.177.140.44:80: i/o timeout
chenchun commented 3 years ago

This issue is about galaxy-ipam liveness and readiness gates. Can you provide more information? Can you ping the pod IP from the host network? Can you curl the pod port from inside the pod?

currycan commented 3 years ago

@chenchun ping and curl both succeed from inside the pod:

[root@k8s-master-01 ~]# kubectl get po -o wide
NAME                                READY   STATUS    RESTARTS   AGE    IP              NODE            NOMINATED NODE   READINESS GATES
nginx-floatingip-c895bbb7f-hs9bk    1/1     Running   1          2d     10.177.140.46   10.177.140.16   <none>           <none>
nginx-floatingip-c895bbb7f-tkl8j    1/1     Running   0          2d     10.177.140.53   10.177.140.18   <none>           <none>
nginx-floatingip-c895bbb7f-tplc9    1/1     Running   1          2d     10.177.140.44   10.177.140.16   <none>           <none>
[root@k8s-master-01 ~]# kubectl exec -it nginx-floatingip-c895bbb7f-hs9bk -- sh
/ # ping 10.177.140.46
PING 10.177.140.46 (10.177.140.46): 56 data bytes
64 bytes from 10.177.140.46: seq=0 ttl=64 time=0.046 ms
64 bytes from 10.177.140.46: seq=1 ttl=64 time=0.070 ms
^C
--- 10.177.140.46 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.046/0.058/0.070 ms
/ # curl 10.177.140.46
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
/ #

But ping and curl to a pod both fail from the node the pod is scheduled on. For example, the pod nginx-floatingip-c895bbb7f-hs9bk is scheduled on node 10.177.140.16, and from that node both ping and curl fail.

chenchun commented 3 years ago

May I ask what the underlay network is? Is it a VPC or an IDC network? Many VPC networks drop packets from unknown MAC addresses. If you use the galaxy-k8s-vlan CNI, it connects pods to the host via a veth pair, so pods have their own MAC addresses.
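
For reference, that behaviour corresponds to a galaxy-k8s-vlan NetworkConf entry without the "switch":"ipvlan" option, along the lines of the sketch below (my assumption is that omitting switch falls back to the veth/bridge behaviour described above; verify the default against the galaxy docs and use your real device name):

    {"name":"galaxy-k8s-vlan","type":"galaxy-k8s-vlan","device":"ens192"}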

currycan commented 3 years ago

Yes, I use galaxy-k8s-vlan on an IDC underlay network. My galaxy.json is:

    {
      "NetworkConf":[
        {"name":"tke-route-eni","type":"tke-route-eni","eni":"eth1","routeTable":1},
        {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
        {"name":"galaxy-k8s-vlan","type":"galaxy-k8s-vlan", "device":"ens192", "switch":"ipvlan", "ipvlan_mode":"l2"},
        {"name":"galaxy-k8s-sriov","type": "galaxy-k8s-sriov", "device": "ens192", "vf_num": 10}
      ],
      "DefaultNetworks": ["galaxy-flannel"],
      "ENIIPNetwork": "galaxy-k8s-vlan"
    }

What troubles me is how to create a veth pair in the pod when using ipvlan mode. I have already turned on promiscuous mode on the host.

chenchun commented 3 years ago

Do you mean that pinging a pod on another node is unreachable, or that pinging the other pod on the same node is unreachable?

currycan commented 3 years ago

If a pod is running on a node, pinging that pod from that node is unreachable.

chenchun commented 3 years ago

https://github.com/moby/moby/issues/21735#issuecomment-205904902 @currycan

Note: In both Macvlan and Ipvlan you are not able to ping or communicate with the default namespace IP address. For example, if you create a container and try to ping the Docker host's eth0 it will not work. That traffic is explicitly filtered by the kernel modules themselves to offer additional provider isolation and security.

The default namespace is not reachable per ipvlan design in order to isolate container namespaces from the underlying host.

currycan commented 3 years ago

@chenchun If we use a floating IP, the pod's livenessProbe and readinessProbe become unusable, which is very bad. I found some related information at https://hansedong.github.io/2019/03/19/14/. But how do I create another veth pair in the pod like this:

{
    "name": "cni0",
    "cniVersion": "0.3.1",
    "plugins": [
        {
            "nodename": "k8s-node-2",
            "name": "myipvlan",
            "type": "ipvlan",
            "debug": true,
            "master": "eth0",
            "mode": "l2",
            "ipam": {
                "type": "host-local",
                "subnet": "172.18.12.0/24",
                "rangeStart": "172.18.12.211",
                "rangeEnd": "172.18.12.230",
                "gateway": "172.18.12.1",
                "routes": [
                    {
                        "dst": "0.0.0.0/0"
                    }
                ]
            }
        },
        {
            "name": "ptp",
            "type": "unnumbered-ptp",
            "hostInterface": "eth0",
            "containerInterface": "veth0",
            "ipMasq": true
        }
    ]
}
chenchun commented 3 years ago

@currycan I would rather suggest using galaxy-underlay-veth, which is based on proxy_arp, instead of ipvlan. It's the ideal solution: livenessProbe, readinessProbe, and Kubernetes services all work.
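
A sketch of how the galaxy.json above might be adjusted accordingly (the exact options galaxy-underlay-veth accepts, e.g. whether it needs a device field, should be checked against the galaxy docs; the pod annotation k8s.v1.cni.cncf.io/networks would also have to reference the new network name):

    {
      "NetworkConf":[
        {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
        {"name":"galaxy-underlay-veth","type":"galaxy-underlay-veth"}
      ],
      "DefaultNetworks": ["galaxy-flannel"],
      "ENIIPNetwork": "galaxy-underlay-veth"
    }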

currycan commented 3 years ago

@chenchun I changed the mode to galaxy-underlay-veth, and the probes work well. But something seems wrong with the network: domain names can't be resolved inside the pod:

/ # nslookup cloud.tencent.com
;; connection timed out; no servers could be reached

/ # cat /etc/resolv.conf
nameserver 172.31.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.177.143.254  0.0.0.0         UG    0      0        0 eth0
10.177.140.0    0.0.0.0         255.255.252.0   U     0      0        0 eth0
chenchun commented 3 years ago

Can you try pinging 172.31.0.10, and also pinging a coredns pod IP directly? Is your coredns pod using the flannel network? Does the flannel network still work between these two hosts?
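
Spelled out, those checks look roughly like this from inside the affected pod (the coredns pod IP below is a placeholder; on most clusters it can be found with kubectl -n kube-system get po -l k8s-app=kube-dns -o wide):

/ # ping 172.31.0.10              # the nameserver (kube-dns ClusterIP) from /etc/resolv.conf
/ # ping <coredns-pod-ip>         # a coredns pod IP, bypassing the cluster service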

chenchun commented 3 years ago

I also suggest trying to run coredns with host network, which is simpler and more reliable.
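
A minimal sketch of the relevant fields in the coredns Deployment pod template (not the full manifest; dnsPolicy: Default stops the pod from pointing at itself for DNS once it runs on the host network):

spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: Default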

currycan commented 3 years ago

Coredns and flannel are running, and coredns is using the flannel CNI. Pinging the coredns cluster IP and pod IP both succeed. If I run coredns with host network, do I still need to create a service for coredns?

currycan commented 3 years ago

@chenchun I tested for a long time and finally found that it was a problem with the dnsPolicy configuration of the coredns deployment. The value of dnsPolicy must be "Default".
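
For reference, one way to apply that setting, assuming the coredns Deployment is named coredns in the kube-system namespace:

kubectl -n kube-system patch deployment coredns --type merge \
  -p '{"spec":{"template":{"spec":{"dnsPolicy":"Default"}}}}'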

chenchun commented 3 years ago

So, everything is working now?

currycan commented 3 years ago

@chenchun Yes, thank you very much!