jhunt / k8s-boshrelease

A BOSH Release for deploying Kubernetes clusters
MIT License

core-dns stay in not ready state #57

Closed obeyler closed 4 years ago

obeyler commented 4 years ago

Inside my cluster, the coredns pods stay not ready:

kube-system   coredns-5d56ff6d95-d9l9p                          0/1     Running   0          22s
kube-system   coredns-5d56ff6d95-k7khg                          0/1     Running   0          22s

Log from one of the core-dns pods:

E0701 17:56:59.978720       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Get "https://10.245.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.245.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0701 17:57:57.774490       1 trace.go:116] Trace[2003272451]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125 (started: 2020-07-01 17:57:27.774001105 +0000 UTC m=+308.579428679) (total time: 30.00046629s):
Trace[2003272451]: [30.00046629s] [30.00046629s] END
E0701 17:57:57.774511       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Get "https://10.245.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.245.0.1:443: i/o timeout
I0701 17:57:58.209103       1 trace.go:116] Trace[1720252605]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125 (started: 2020-07-01 17:57:28.208673746 +0000 UTC m=+309.014101344) (total time: 30.00039679s):
Trace[1720252605]: [30.00039679s] [30.00039679s] END
E0701 17:57:58.209124       1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Get "https://10.245.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.245.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"

I don't have the same problem with the PR I proposed: https://github.com/jhunt/k8s-boshrelease/pull/47

jhunt commented 4 years ago

Can you provide the manifest you are using to deploy the affected K8s cluster?

jhunt commented 4 years ago

PR #47 only bumped CoreDNS to 1.6.9 and ignored a large chunk of changes introduced in the upstream Deployment manifests; commit 22dc256 took those into account, while also upgrading to the latest stable release, 1.7.0.

jhunt commented 4 years ago

On a 3-node labernetes running flannel, I see this:

$ kubectl -n kube-system get po
NAME                       READY   STATUS    RESTARTS   AGE
coredns-5d56ff6d95-dmn9w   1/1     Running   0          9m24s
coredns-5d56ff6d95-mbnll   1/1     Running   0          9m24s
kube-flannel-ds-d4t4w      1/1     Running   0          9m24s
kube-flannel-ds-n5rmv      1/1     Running   0          9m24s
kube-flannel-ds-ttmnv      1/1     Running   0          9m24s
kube-proxy-fj2b9           1/1     Running   0          9m25s
kube-proxy-mfkcz           1/1     Running   0          9m25s
kube-proxy-tmqfb           1/1     Running   0          9m25s
jhunt commented 4 years ago

From a kubectl describe pod on one of the coredns instances:

Containers:
  coredns:
    Container ID:  containerd://114470bac161cbcc76215372d32f23bdba7f1e8a4cde386aca4f8e2feef20d45
    Image:         docker.io/coredns/coredns:1.7.0
    Image ID:      docker.io/coredns/coredns@sha256:73ca82b4ce829766d4f1f10947c3a338888f876fbed0540dc849c89ff256e90c
obeyler commented 4 years ago
 kubectl get pods --all-namespaces
NAMESPACE              NAME                                              READY   STATUS             RESTARTS   AGE
cert-manager           cert-manager-7747db9d88-qpkwt                     1/1     Running            0          49m
cert-manager           cert-manager-cainjector-87c85c6ff-r42g8           0/1     CrashLoopBackOff   13         49m
cert-manager           cert-manager-webhook-64dc9fff44-nsbvb             0/1     Running            0          49m
fluxcd                 helm-operator-8489d9dcfd-4g45k                    0/1     CrashLoopBackOff   15         41m
kube-system            coredns-5d56ff6d95-qlbm6                          0/1     Running            0          49m
kube-system            coredns-5d56ff6d95-vmrzs                          0/1     Running            0          49m
kube-system            dashboard-kubernetes-dashboard-6cb5b448fd-7qlxb   0/1     CrashLoopBackOff   11         42m
kube-system            kube-proxy-68vwd                                  1/1     Running            0          50m
kube-system            kube-proxy-cmd47                                  1/1     Running            0          50m
kube-system            kube-proxy-cnfh9                                  1/1     Running            0          50m
kube-system            kube-proxy-dwrd9                                  1/1     Running            0          50m
kube-system            kube-proxy-fhlfg                                  1/1     Running            0          50m
kube-system            kube-proxy-k4dzw                                  1/1     Running            0          50m
kube-system            kube-proxy-l29qd                                  1/1     Running            0          50m
kube-system            kube-proxy-nzg4w                                  1/1     Running            0          50m
kube-system            kube-proxy-qvwlm                                  1/1     Running            0          50m
kube-system            kube-proxy-rpcbp                                  1/1     Running            0          50m
kube-system            kube-proxy-s52zz                                  1/1     Running            0          50m
kube-system            kube-proxy-s56x8                                  1/1     Running            0          50m
kube-system            kube-proxy-wgnxp                                  1/1     Running            0          50m
kube-system            kube-proxy-xkt6n                                  1/1     Running            0          50m
kube-system            weave-net-29m9m                                   2/2     Running            0          49m
kube-system            weave-net-2dbvr                                   2/2     Running            1          49m
kube-system            weave-net-6hzg6                                   2/2     Running            1          49m
kube-system            weave-net-7vzql                                   2/2     Running            0          49m
kube-system            weave-net-b2hbx                                   2/2     Running            1          49m
kube-system            weave-net-cfkgv                                   2/2     Running            1          49m
kube-system            weave-net-ckl5z                                   2/2     Running            0          49m
kube-system            weave-net-f2jgq                                   2/2     Running            1          49m
kube-system            weave-net-j7bqb                                   2/2     Running            0          49m
kube-system            weave-net-jvvr9                                   2/2     Running            0          49m
kube-system            weave-net-q7z5q                                   2/2     Running            0          49m
kube-system            weave-net-txbj6                                   2/2     Running            0          49m
kube-system            weave-net-w7n46                                   2/2     Running            1          49m
kube-system            weave-net-wd24r                                   2/2     Running            0          49m
kubernetes-dashboard   dashboard-metrics-scraper-6b4884c9d5-lfz2w        1/1     Running            0          49m
kubernetes-dashboard   kubernetes-dashboard-7f99b75bf4-269rl             1/1     Running            14         49m
obeyler commented 4 years ago
control/264ecb67-06c2-4dc5-a26b-502f217508ea:~# kubectl describe pod -n kube-system coredns-5d56ff6d95-qlbm6
Name:                 coredns-5d56ff6d95-qlbm6
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 4dc606f9-db54-403b-b1dc-827ea486d508.k8s/192.168.244.209
Start Time:           Wed, 01 Jul 2020 19:01:20 +0000
Labels:               k8s-app=kube-dns
                      pod-template-hash=5d56ff6d95
Annotations:          <none>
Status:               Running
IP:                   10.33.0.1
IPs:
  IP:           10.33.0.1
Controlled By:  ReplicaSet/coredns-5d56ff6d95
Containers:
  coredns:
    Container ID:  containerd://8e3404c08adbff7b4402eb1cc6e615778c82a14040b0872d479afeb744101f08
    Image:         docker.io/coredns/coredns:1.7.0
    Image ID:      docker.io/coredns/coredns@sha256:73ca82b4ce829766d4f1f10947c3a338888f876fbed0540dc849c89ff256e90c
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 01 Jul 2020 19:02:23 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-bb5zw (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-bb5zw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-bb5zw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                                               Message
  ----     ------                  ----                   ----                                               -------
  Normal   Scheduled               52m                    default-scheduler                                  Successfully assigned kube-system/coredns-5d56ff6d95-qlbm6 to 4dc606f9-db54-403b-b1dc-827ea486d508.k8s
  Warning  FailedCreatePodSandBox  52m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "53ba70eb790126367878e921656539602e1ce37a12ef7993d002c0d616d6415d": failed to find plugin "weave-net" in path [/var/vcap/packages/containerd/bin]
  Warning  FailedCreatePodSandBox  52m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "47da407755eae72aa32af4ecae359f631261dfc02e1751433ea64335c13e16bc": failed to find plugin "weave-net" in path [/var/vcap/packages/containerd/bin]
  Warning  FailedCreatePodSandBox  51m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dfdf53a24a4a908d53a5292903ef1058687020e64d01e4ce7d69b12179e30aef": failed to find plugin "weave-net" in path [/var/vcap/packages/containerd/bin]
  Warning  FailedCreatePodSandBox  51m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3a4dc89955020c5d98c4cd267648a5e72319d2950b6764b4e5f9ea08b09bcbad": failed to find plugin "weave-net" in path [/var/vcap/packages/containerd/bin]
  Normal   Pulling                 51m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Pulling image "docker.io/coredns/coredns:1.7.0"
  Normal   Pulled                  51m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Successfully pulled image "docker.io/coredns/coredns:1.7.0"
  Normal   Created                 51m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Created container coredns
  Normal   Started                 51m                    kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Started container coredns
  Warning  Unhealthy               2m18s (x295 over 51m)  kubelet, 4dc606f9-db54-403b-b1dc-827ea486d508.k8s  Readiness probe failed: HTTP probe failed with statuscode: 503
obeyler commented 4 years ago

@jhunt I think the problem is linked to the use of weave-net: when the pod starts, it installs the plugin under /host/opt/cni. Could you try with the weave CNI?

Check the closed PR on weave-net: I used a specific volume mount there to make weave install the plugin in the right place:

        volumeMounts:
        - name: cni-bin3
          mountPath: /host/opt/cni
      volumes:
      - name: cni-bin3
        hostPath:
          path: /var/vcap/packages/containerd/
jhunt commented 4 years ago

Yes, we map in /var/vcap/packages/containerd into the weave setup container, see https://github.com/jhunt/k8s-boshrelease/blob/master/jobs/net-weave/templates/k8s-init/weave.yml#L182-L184 for the volume definition and https://github.com/jhunt/k8s-boshrelease/blob/master/jobs/net-weave/templates/k8s-init/weave.yml#L139-L140 for the mount. I called it opt-cni to give it more context.
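
For readers without the repo handy, that volume/mount pair has roughly the following shape (a sketch only, mirroring the snippet above with the opt-cni name jhunt mentions; check the linked weave.yml for the exact definition):

        volumeMounts:
        - name: opt-cni
          mountPath: /host/opt/cni
      volumes:
      - name: opt-cni
        hostPath:
          path: /var/vcap/packages/containerd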

I will take weave for a spin; when I ran the weave-net version of labernetes, I had no issues deploying pods. I'll post back when I get that up and spinning, and I think I'll overhaul how manifests are managed to natively support ops files for each of the topologies.

obeyler commented 4 years ago

For my part, I created several ops files. The main deployment manifest is nearly empty; it just contains the variables, update, stemcells, and releases sections.

Then I have one ops file per type of instance_group.

This makes it easy to create new types of nodes with specific taints.

For example, I have 3 nodes dedicated to persistence and 2 others dedicated to exposition, so that's two more ops files.
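
For illustration, a minimal sketch of what one such per-instance-group ops file could look like; the instance group name, job list, network name, and especially the taints property are hypothetical and depend on how the release exposes kubelet taints:

# illustrative ops file adding a dedicated "persistent" node pool (names/properties are hypothetical)
- type: replace
  path: /instance_groups/-
  value:
    name: node-persistent
    instances: 3
    azs: [z1, z2, z3]
    vm_type: default
    stemcell: default
    networks:
      - name: default
    jobs:
      - name: kubelet              # mirror the jobs of the existing node instance group
        release: k8s
        properties:
          taints:                  # hypothetical property name
            - dedicated=persistent:NoSchedule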

jhunt commented 4 years ago

The weave CNI works for me. I did add the CHECKPOINT_DISABLE environment variable, since I had a transient failure related to network latency, and Weave doesn't need to know where I install Kubernetes clusters anyway. This might affect you, @obeyler, since I believe you are running in air-gapped mode, but I don't see that env var being set in the PR you referenced.
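
Concretely, setting that variable on the weave container of the weave-net DaemonSet looks roughly like this (a sketch; CHECKPOINT_DISABLE=1 simply disables Weave's checkpoint/update-check phone-home):

      containers:
        - name: weave
          env:
            - name: CHECKPOINT_DISABLE
              value: "1"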

For reference, here's the diff from the standard manifests/labernetes.yml test deployment:

--- manifests/labernetes.yml    2020-06-30 23:40:31.000000000 -0400
+++ manifests/weavernetes.yml   2020-07-02 09:05:37.000000000 -0400
@@ -54,8 +54,8 @@
       - name: runtime-runc
         release: k8s
         properties:
-          cni: flannel
-      - name: net-flannel
+          cni: weave
+      - name: net-weave
         release: k8s
       - name: kubelet
         release: k8s

If you can provide me with a deployment manifest that is exhibiting this behavior, I'd be happy to look into it further and try to reproduce on my end. I will go ahead and start adding ops-files for things like weave-net into https://github.com/jhunt/k8s-deployment, for documentation purposes if nothing else.

obeyler commented 4 years ago

Even with your last modification (the CHECKPOINT_DISABLE env var), the error persists and core-dns is running but not ready. I am still looking into why core-dns gets a timeout when it tries to reach the api-server:

E0703 00:50:03.472801 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Get "https://10.245.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.245.0.1:443: i/o timeout

In the logs of kube-proxy I see some errors, which may explain why core-dns doesn't manage to communicate with the api-server:

kubectl logs -n kube-system kube-proxy-89zv6
W0702 17:19:47.678898       1 server.go:439] using lenient decoding as strict decoding failed: strict decoder error for ---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: "0.0.0.0"
clientConnection:
  acceptContentTypes: ""
  burst: 10
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /etc/kubeconfig.yml
  qps: 5
clusterCIDR: "10.244.0.0/16"
configSyncPeriod: 15m0s
conntrack:
  max: 0
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
iptables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: "iptables"
nodePortAddresses: []
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy
udpIdleTimeout: 250ms: v1alpha1.KubeProxyConfiguration.Conntrack: v1alpha1.KubeProxyConntrackConfiguration.ReadObject: found unknown field: max, error found in #10 byte of ...|ck":{"max":0,"maxPer|..., bigger context ...|/16","configSyncPeriod":"15m0s","conntrack":{"max":0,"maxPerCore":32768,"min":131072,"tcpCloseWaitTi|...
I0702 17:19:47.938182       1 node.go:136] Successfully retrieved node IP: 192.168.244.209
I0702 17:19:47.938219       1 server_others.go:186] Using iptables Proxier.
I0702 17:19:47.938513       1 server.go:583] Version: v1.18.5
I0702 17:19:47.938949       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0702 17:19:47.939074       1 conntrack.go:52] Setting nf_conntrack_max to 131072
obeyler commented 4 years ago

I removed max: 0 and resourceContainer: /kube-proxy from kube-proxy's config map; these two items are badly parsed by kube-proxy.
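
In other words, the conntrack block of the kube-proxy ConfigMap ends up roughly as follows (a sketch; only the two fields reported above are dropped, the rest is unchanged):

conntrack:
  # "max" removed: kube-proxy v1.18's strict decoder rejects it as an unknown field
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
# the top-level "resourceContainer: /kube-proxy" line is removed as well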

obeyler commented 4 years ago

I moved this kube-proxy config part into another issue, as it doesn't solve my coredns problem. I'm still looking into why coredns can't reach the kube-apiserver.

jhunt commented 4 years ago

A couple of things to double-check:

  1. Is the apiserver process running on the control nodes?
  2. Does the kubernetes service in kube-system have anything behind it? (kubectl get ep kubernetes -n kube-system should show one or more ip:port entries)
  3. Is your CNI happy?

Can you provide a manifest that I can use to reproduce your setup?

obeyler commented 4 years ago

There is no endpoint for kubernetes inside the kube-system namespace:

kubectl get ep -n kube-system
NAME                      ENDPOINTS   AGE
kube-controller-manager   <none>      14h
kube-dns                              14h
kube-scheduler            <none>      14h

It is located in the default namespace:

kubectl get ep 
NAME         ENDPOINTS                                                        AGE
kubernetes   192.168.244.206:6443,192.168.244.207:6443,192.168.244.208:6443   14h

These 3 addresses correspond to the IPs of the 3 control nodes:

control/10ce9815-b14b-4806-b686-5dcfd5c34d00           running        z3  192.168.244.208  k8s  
control/1f54e813-1746-42c6-a5b1-0e42899be7e0           running        z1  192.168.244.206  k8s  
control/99b695e5-39e7-492f-b39e-a34ef8615869           running        z2  192.168.244.207  k8s 
obeyler commented 4 years ago

If I launch a container to test the api server, it responds with HTTP 403 (which is correct, as I don't provide a certificate), and this response is different from the core-dns log, which shows a timeout:

kubectl run tmp-shell --generator=run-pod/v1 --rm -i --tty --overrides='{"spec": {"hostNetwork": true}}' --image nicolaka/netshoot -- /bin/bash
 curl -k https://10.245.0.1:443/api/v1/services?limit=500&resourceVersion=0 
[1] 9
bash-5.0# {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "services is forbidden: User \"system:anonymous\" cannot list resource \"services\" in API group \"\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "kind": "services"
  },
  "code": 403
}
jhunt commented 4 years ago

What happens if you run that tmp-shell in its own network namespace?

obeyler commented 4 years ago
kubectl run tmp-shell2 --generator=run-pod/v1 --rm -i --tty --overrides='{"spec": {"hostNetwork": false}}' --image nicolaka/netshoot -- /bin/bash
curl -k -v  https://10.245.0.1:443/api/v1/services?limit=500&resourceVersion=0 
[4] 10
bash-5.0# *   Trying 10.245.0.1:443...

I think this is the problem: it doesn't manage to reach it when it doesn't use hostNetwork.

obeyler commented 4 years ago

It's as if it doesn't manage to reach the service network.

jhunt commented 4 years ago

That makes me suspect the CNI.

Can you provide a manifest that I can use to reproduce your setup?

obeyler commented 4 years ago

I sent it via Slack. For my part, I suspect kube-proxy.

jhunt commented 4 years ago

Okay, I'm going to try to build an equivalent version of your deployment in my lab today or tomorrow to see if I can repro this issue.

obeyler commented 4 years ago

Thanks in advance.

obeyler commented 4 years ago

What is the difference between kube-proxy and bosh-kube-proxy?

jhunt commented 4 years ago

We run our own image; bosh-kube-proxy is just the name we give it. You can see the Dockerfile here: https://github.com/jhunt/k8s-boshrelease/blob/master/images/kube-proxy/Dockerfile

I was able to reproduce your issue, and it's more than just accessibility of the kubernetes API via service. I am unable to get any network traffic out of a Pod that isn't hostNetwork: true. I'm going to try this with net-flannel and see if that's any better.

jhunt commented 4 years ago

For posterity, and because my memory these days is shot, here are the commands I'm running in a huntprod/jumpbox pod to approximate curling Google without involving CoreDNS (which is still borked):

root@jumpbox:/# curl -v -H Host:google.com https://172.217.15.110/ --connect-timeout 10
*   Trying 172.217.15.110...
* TCP_NODELAY set
* Connection timed out after 10000 milliseconds
* stopped the pause stream!
* Closing connection 0
curl: (28) Connection timed out after 10000 milliseconds

(having used dig to find the IP for google.com, on a system with a working resolver)

jhunt commented 4 years ago

I downsized the cluster to take some load off of my lab, and even with 1 LB / 1 control / 1 node, I still see the issue with the CNI and reachability of the default kubernetes service.

When I switch to flannel, on the other hand, and drop / recreate the coredns pods, they come up without any issue. I think this is a CNI problem, specifically related to weave.

jhunt commented 4 years ago

The above curl works under flannel:

root@jumpbox:/# curl -v -H Host:google.com https://172.217.15.110/ --connect-timeout 10
*   Trying 172.217.15.110...
* TCP_NODELAY set
* Connected to 172.217.15.110 (172.217.15.110) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, Server hello (2):
* SSL certificate problem: self signed certificate
* stopped the pause stream!
* Closing connection 0
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

(the "failed to validate" is correct; we connected to google and while we sent a Host: header, we did NOT send an SNI so we get back an invalid self-signed cert telling us to "fix our client" -- go ahead; try it out in your browser, it's fun!)

obeyler commented 4 years ago

Good news: I checked out my old wip-merge branch (https://github.com/orange-cloudfoundry/k8s-boshrelease/tree/wip-merge), where I put together all the PRs I've made, and coredns works on it:

NAME                              READY   STATUS    RESTARTS   AGE
coredns-c89858866-lzhmb           1/1     Running   0          103s
coredns-c89858866-z5fgk           1/1     Running   0          103s
kube-proxy-46c8g                  1/1     Running   0          103s
kube-proxy-4v2c5                  1/1     Running   0          103s
kube-proxy-8vdzd                  1/1     Running   0          103s
kube-proxy-9s5qz                  1/1     Running   0          103s
kube-proxy-gbvrt                  1/1     Running   0          103s
kube-proxy-pzszf                  1/1     Running   0          103s
kube-proxy-v7kk5                  1/1     Running   0          103s
kube-proxy-w9k2d                  1/1     Running   0          103s
metrics-server-64b57fd654-g77pk   1/1     Running   1          102s
weave-net-4g4rd                   2/2     Running   0          102s
weave-net-9dvhm                   2/2     Running   0          101s
weave-net-bpfjv                   2/2     Running   0          102s
weave-net-dkzz6                   2/2     Running   1          101s
weave-net-m4knb                   2/2     Running   0          101s
weave-net-pgq7k                   2/2     Running   1          102s
weave-net-rhjcj                   2/2     Running   0          101s
weave-net-rjkgl                   2/2     Running   1          101s

I'll try to find the delta against the current develop branch to see what makes the difference.

Ouch, maybe not: if I try to reach the api server from a deployed pod, it also doesn't work:

kubectl run tmp-shell2 --generator=run-pod/v1 --rm -i --tty --overrides='{"spec": {"hostNetwork": false}}' --image nicolaka/netshoot -- /bin/bash

If you don't see a command prompt, try pressing enter.
bash-5.0#
bash-5.0# curl -k -v https://10.245.0.1:443/api/v1/services?limit=500&resourceVersion=0
[1] 6
bash-5.0# Trying 10.245.0.1:443...
bash-5.0# connect to 10.245.0.1 port 443 failed: Operation timed out

obeyler commented 4 years ago

I found the problem: it's due to the fact that the service and pod networks are not the same, so "sysctl net.ipv4.ip_forward=1" needs to be added.
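
One way to apply that on every node is a privileged initContainer in the CNI DaemonSet, sketched below; on this BOSH release it could just as well be set from a job's pre-start script on each VM:

      initContainers:
        - name: enable-ip-forward
          image: busybox:1.32              # illustrative image
          command: ["sysctl", "-w", "net.ipv4.ip_forward=1"]
          securityContext:
            privileged: true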