nre-learning / antidote-selfmedicate

Configs and scripts for spinning up a local instance of Antidote on your laptop for testing and lesson development
Apache License 2.0

aweb pod fails to start? #71

Open olberger opened 4 years ago

olberger commented 4 years ago

I'm testing the latest 0.6.0 version of selfmedicate (with libvirt), and I can't seem to connect to the web app.

The NGINX ingress responds with 503, and the aweb pod fails to deploy/start:


$ kubectl describe pod/aweb-7977f6bf4-wp5vz
Name:           aweb-7977f6bf4-wp5vz
Namespace:      default
Priority:       0
Node:           antidote-060/192.168.121.55
Start Time:     Mon, 13 Apr 2020 20:51:21 +0000
Labels:         antidote_role=infra
                app=aweb
                pod-template-hash=7977f6bf4
Annotations:    k8s.v1.cni.cncf.io/networks-status:
                  [{
                      "name": "",
                      "ips": [
                          "10.32.0.15"
                      ],
                      "default": true,
                      "dns": {}
                  }]
Status:         Running
IP:             10.32.0.15
IPs:            <none>
Controlled By:  ReplicaSet/aweb-7977f6bf4
Containers:
  aweb:
    Container ID:   docker://6da093a0a8d3509891a36297e12ec6ecbcfaa3756c10044c84e17cf2b1e9599d
    Image:          antidotelabs/antidote-web:latest
    Image ID:       docker-pullable://antidotelabs/antidote-web@sha256:3711bfa6e8d5af58402aa11bd732bacb76f6aac65d193c625d211a5ad04b3f54
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Mon, 13 Apr 2020 21:13:20 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 13 Apr 2020 20:58:48 +0000
      Finished:     Mon, 13 Apr 2020 21:11:32 +0000
    Ready:          False
    Restart Count:  1
    Readiness:      http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WEBSSH2_LOCATION:  http://antidote-local:30010
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bb2jw (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-bb2jw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bb2jw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                   From                   Message
  ----     ------                  ----                  ----                   -------
  Normal   Scheduled               23m                   default-scheduler      Successfully assigned default/aweb-7977f6bf4-wp5vz to antidote-060
  Warning  FailedCreatePodSandBox  23m                   kubelet, antidote-060  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "2a471cfe95cb813f028ddcfc1595100f9fad013d9963396ea2466225161b48f8" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to set up pod "aweb-7977f6bf4-wp5vz_default" network: failed to find plugin "multus" in path [/opt/cni/bin], failed to clean up sandbox container "2a471cfe95cb813f028ddcfc1595100f9fad013d9963396ea2466225161b48f8" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to teardown pod "aweb-7977f6bf4-wp5vz_default" network: failed to find plugin "multus" in path [/opt/cni/bin]]
  Normal   SandboxChanged          22m (x8 over 23m)     kubelet, antidote-060  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling                 22m                   kubelet, antidote-060  Pulling image "antidotelabs/antidote-web:latest"
  Normal   Pulled                  16m                   kubelet, antidote-060  Successfully pulled image "antidotelabs/antidote-web:latest"
  Normal   Started                 16m                   kubelet, antidote-060  Started container aweb
  Normal   Created                 16m                   kubelet, antidote-060  Created container aweb
  Warning  Unhealthy               8m17s (x47 over 15m)  kubelet, antidote-060  Readiness probe failed: HTTP probe failed with statuscode: 403
  Warning  FailedCreatePodSandBox  2m5s                  kubelet, antidote-060  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "43978315ea6a80da61008bcb9aa279cd4f78398b1686211111430c3b63f7ae6d" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to set up pod "aweb-7977f6bf4-wp5vz_default" network: Multus: Err in loading K8s Delegates k8s args: Multus: Err in getting k8s network from pod: getPodNetworkAnnotation: failed to query the pod aweb-7977f6bf4-wp5vz in out of cluster comm: Get https://10.96.0.1:443/api/v1/namespaces/default/pods/aweb-7977f6bf4-wp5vz: dial tcp 10.96.0.1:443: i/o timeout
  Warning  FailedCreatePodSandBox  2m3s                  kubelet, antidote-060  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "a75a59a157a11cb66441ac5c137f1ef642246763162e875c0b5d87400ba45bf0" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to set up pod "aweb-7977f6bf4-wp5vz_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "weave-net": unable to allocate IP address: Post http://127.0.0.1:6784/ip/a75a59a157a11cb66441ac5c137f1ef642246763162e875c0b5d87400ba45bf0: dial tcp 127.0.0.1:6784: connect: connection refused, failed to clean up sandbox container "a75a59a157a11cb66441ac5c137f1ef642246763162e875c0b5d87400ba45bf0" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to teardown pod "aweb-7977f6bf4-wp5vz_default" network: Multus: error in invoke Delegate del - "weave-net": Delete http://127.0.0.1:6784/ip/a75a59a157a11cb66441ac5c137f1ef642246763162e875c0b5d87400ba45bf0: dial tcp 127.0.0.1:6784: connect: connection refused]
  Warning  FailedCreatePodSandBox  2m                    kubelet, antidote-060  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "70145f85a2ee037987f91a1125678d4467e5bdcc17e4fa0c175ca25419a3ea2b" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to set up pod "aweb-7977f6bf4-wp5vz_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "weave-net": unable to allocate IP address: Post http://127.0.0.1:6784/ip/70145f85a2ee037987f91a1125678d4467e5bdcc17e4fa0c175ca25419a3ea2b: dial tcp 127.0.0.1:6784: connect: connection refused, failed to clean up sandbox container "70145f85a2ee037987f91a1125678d4467e5bdcc17e4fa0c175ca25419a3ea2b" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to teardown pod "aweb-7977f6bf4-wp5vz_default" network: Multus: error in invoke Delegate del - "weave-net": Delete http://127.0.0.1:6784/ip/70145f85a2ee037987f91a1125678d4467e5bdcc17e4fa0c175ca25419a3ea2b: dial tcp 127.0.0.1:6784: connect: connection refused]
  Warning  FailedCreatePodSandBox  117s                  kubelet, antidote-060  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "56a877677bf95c067f25e81c860b4e6fb247c8b067bb7a9bc9a500700442b4e3" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to set up pod "aweb-7977f6bf4-wp5vz_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "weave-net": unable to allocate IP address: Post http://127.0.0.1:6784/ip/56a877677bf95c067f25e81c860b4e6fb247c8b067bb7a9bc9a500700442b4e3: dial tcp 127.0.0.1:6784: connect: connection refused, failed to clean up sandbox container "56a877677bf95c067f25e81c860b4e6fb247c8b067bb7a9bc9a500700442b4e3" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to teardown pod "aweb-7977f6bf4-wp5vz_default" network: Multus: error in invoke Delegate del - "weave-net": Delete http://127.0.0.1:6784/ip/56a877677bf95c067f25e81c860b4e6fb247c8b067bb7a9bc9a500700442b4e3: dial tcp 127.0.0.1:6784: connect: connection refused]
  Warning  FailedCreatePodSandBox  115s                  kubelet, antidote-060  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "d5f753fbb720f611fc9c94bc451c0192ff1fb990872dc5e2418b0bcaf3025c5c" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to set up pod "aweb-7977f6bf4-wp5vz_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "weave-net": unable to allocate IP address: Post http://127.0.0.1:6784/ip/d5f753fbb720f611fc9c94bc451c0192ff1fb990872dc5e2418b0bcaf3025c5c: dial tcp 127.0.0.1:6784: connect: connection refused, failed to clean up sandbox container "d5f753fbb720f611fc9c94bc451c0192ff1fb990872dc5e2418b0bcaf3025c5c" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to teardown pod "aweb-7977f6bf4-wp5vz_default" network: Multus: error in invoke Delegate del - "weave-net": Delete http://127.0.0.1:6784/ip/d5f753fbb720f611fc9c94bc451c0192ff1fb990872dc5e2418b0bcaf3025c5c: dial tcp 127.0.0.1:6784: connect: connection refused]
  Normal   SandboxChanged          110s (x7 over 2m38s)  kubelet, antidote-060  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  110s                  kubelet, antidote-060  Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "438593ef97dca23d5a1d4882cec5d8992e218d8c5a4b10c6c071caa041e3db3f" network for pod "aweb-7977f6bf4-wp5vz": NetworkPlugin cni failed to set up pod "aweb-7977f6bf4-wp5vz_default" network: Multus: Err in tearing down failed plugins: Multus: error in invoke Delegate add - "weave-net": unable to allocate IP address: Post http://127.0.0.1:6784/ip/438593ef97dca23d5a1d4882cec5d8992e218d8c5a4b10c6c071caa041e3db3f: dial tcp 127.0.0.1:6784: connect: connection refused
  Warning  Failed                  108s                  kubelet, antidote-060  Failed to pull image "antidotelabs/antidote-web:latest": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on [::1]:53: dial udp [::1]:53: connect: cannot assign requested address
  Warning  Failed                  108s                  kubelet, antidote-060  Error: ErrImagePull
  Normal   BackOff                 106s (x2 over 107s)   kubelet, antidote-060  Back-off pulling image "antidotelabs/antidote-web:latest"
  Warning  Failed                  106s (x2 over 107s)   kubelet, antidote-060  Error: ImagePullBackOff
  Normal   Pulling                 94s (x2 over 108s)    kubelet, antidote-060  Pulling image "antidotelabs/antidote-web:latest"
  Normal   Pulled                  88s                   kubelet, antidote-060  Successfully pulled image "antidotelabs/antidote-web:latest"
  Normal   Created                 88s                   kubelet, antidote-060  Created container aweb
  Normal   Started                 88s                   kubelet, antidote-060  Started container aweb
  Warning  Unhealthy               77s                   kubelet, antidote-060  Readiness probe failed: HTTP probe failed with statuscode: 403
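The first FailedCreatePodSandBox event says kubelet could not find the "multus" CNI plugin in /opt/cni/bin. The later ones show Multus delegating to "weave-net", whose IP allocation request to 127.0.0.1:6784 got "connection refused", i.e. nothing was listening on that Weave Net port at that moment. A minimal sketch for checking the first condition on the node (e.g. after `vagrant ssh`), assuming plugin binaries are named after the plugin as kubelet's error implies; `missing_cni_plugins` is a hypothetical helper, not part of selfmedicate:

```sh
# Hypothetical helper (not part of selfmedicate): print every requested CNI
# plugin that is missing or not executable in DIR; fail if any is missing.
# Usage: missing_cni_plugins DIR PLUGIN...
missing_cni_plugins() {
  dir=$1
  shift
  rc=0
  for plugin in "$@"; do
    # kubelet searches its CNI bin dir for an executable named after the plugin.
    if [ ! -f "$dir/$plugin" ] || [ ! -x "$dir/$plugin" ]; then
      echo "missing: $dir/$plugin"
      rc=1
    fi
  done
  return "$rc"
}
```

On antidote-060, `missing_cni_plugins /opt/cni/bin multus` prints nothing and succeeds once the binary is in place; `kubectl get pods -A` then shows whether the Multus and Weave Net pods themselves are Running.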

Also, the pod logs show:

$ kubectl logs pod/aweb-7977f6bf4-wp5vz
2020/04/13 21:15:21 [error] 10#10: *12 open() "/usr/share/nginx/html/index.html" failed (13: Permission denied), client: 10.32.0.1, server: localhost, request: "GET / HTTP/1.1", host: "10.32.0.15:80"

olberger commented 4 years ago

This looks similar to https://github.com/docker-library/docs/issues/883, though that doesn't help much.

olberger commented 4 years ago

I couldn't reproduce it in a test today... maybe the container image changed in the meantime.

Closing, then.

olberger commented 4 years ago

Well, it seems I spoke too soon... the problem is back.

I stopped the VM that had been running fine, did a vagrant reload afterwards, and now it fails again :-/

Really strange.

meroupatate commented 4 years ago

I also have an issue with the aweb pod when starting Antidote with selfmedicate. The aweb pod is running but unhealthy:

vagrant@antidote-060:~$ kubectl get all -A
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
default       pod/acore-5dc84fb45-lksfs                       3/3     Running   1          91s
default       pod/aweb-59999d4847-zv44s                       0/1     Running   0          91s
default       pod/jaeger-68fbc85f44-qsrqr                     1/1     Running   0          90s
default       pod/nginx-ingress-controller-7cb5547dff-nvg2x   1/1     Running   0          91s
[...]

When I look into the logs of the aweb pod, there are "Permission denied" errors:

vagrant@antidote-060:~$ kubectl logs pod/aweb-59999d4847-zv44s
2020/04/27 19:23:00 [error] 9#9: *1 open() "/usr/share/nginx/html/index.html" failed (13: Permission denied), client: 10.32.0.1, server: localhost, request: "GET / HTTP/1.1", host: "10.32.0.8:80"
10.32.0.1 - - [27/Apr/2020:19:23:00 +0000] "GET / HTTP/1.1" 403 153 "-" "kube-probe/1.14" "-"
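The `403` in the access log line above is what kube-probe receives. Kubernetes counts an HTTP probe as successful only for status codes from 200 up to (but not including) 400, so the Permission denied on index.html alone is enough to keep the pod at 0/1 READY. A minimal sketch of that rule, plus a hypothetical `http_status` helper for checking by hand from inside the VM (it assumes `curl` is installed and mimics the probe's 1-second timeout):

```sh
# Kubernetes treats an HTTP probe as successful when 200 <= code < 400;
# anything else (such as the 403 above) counts as a failure.
probe_ok() {
  code=$1
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Hypothetical helper: print only the HTTP status code of URL, giving up
# after 1 second like the probe's timeout=1s. curl prints 000 when no HTTP
# response arrives; "|| true" keeps that output instead of failing.
http_status() {
  curl -s -o /dev/null -m 1 -w '%{http_code}' "$1" || true
}
```

For example, `code=$(http_status http://10.32.0.8:80/); probe_ok "$code" && echo ready || echo "not ready: $code"` reproduces the probe's verdict for the pod IP from the log.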

It seems kube-probe can't confirm the pod is ready because nginx can't open index.html (hence the 403), so I checked the aweb container:

root@aweb-59999d4847-zv44s:/# ls -la /usr/share/nginx/html/
total 160
drwxr-xr-x 1 root root  4096 Apr 27 19:22 .
drwxr-xr-x 1 root root  4096 Jan  9 22:20 ..
-rw-r--r-- 1 root root   494 Apr 27 19:22 50x.html
drwxr-xr-x 1 root root  4096 Apr 27 19:22 advisor
-rw-r--r-- 1 root root   399 Feb  9 05:55 antidote-config.js
drwxr-xr-x 1 root root  4096 Apr 27 19:22 catalog
drwxr-xr-x 1 root root  4096 Apr 27 19:22 collections
drwxr-xr-x 2 root root  4096 Feb  9 05:55 icons
drwxr-xr-x 2 root root  4096 Feb  9 05:55 images
-rw-r--r-- 1 root root  5266 Apr 27 19:22 index.html
drwxr-xr-x 3 root root  4096 Mar 14 06:18 js
drwxr-xr-x 1 root root  4096 Apr 27 19:22 labs
drwxr-xr-x 1 root root  4096 Apr 22 16:52 node_modules
-rw-r--r-- 1 root root 60874 Mar 14 06:14 npm-debug.log
-rw-r--r-- 1 root root 31955 Apr 22 16:52 package-lock.json
-rw-r--r-- 1 root root  1143 Apr 22 16:42 package.json
-rw-r--r-- 1 root root  1780 Apr 22 16:34 rollup.config.js
drwxr-xr-x 2 root root  4096 Apr 22 16:52 stats
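Assuming antidote-web behaves like the official nginx image, where the master process runs as root but the worker processes that open files run as the unprivileged `nginx` user (`user nginx;` in nginx.conf), a root-owned file is readable by the workers only if it has the "other" read bit and every directory on its path has the "other" execute bit. By those mode bits the listing above looks fine (`-rw-r--r--` files, `drwxr-xr-x` directories), which is part of what makes this error puzzling. A minimal sketch that checks this systematically; `unreadable_for_others` is hypothetical and inspects mode bits only, nothing else that can cause EACCES:

```sh
# Hypothetical helper: list entries under DIR that an unrelated non-root user
# could not use, judging by the "other" mode bits only:
#   - regular files without o+r (0004)
#   - directories without o+x (0001)
unreadable_for_others() {
  find "$1" \( \( -type f ! -perm -0004 \) -o \( -type d ! -perm -0001 \) \) -print
}
```

Inside the container (e.g. via `kubectl exec -it aweb-59999d4847-zv44s -- sh`), an empty result from `unreadable_for_others /usr/share/nginx/html` suggests plain mode bits are not the culprit.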

I'm not sure what permissions index.html needs, but if I run chmod u+x /usr/share/nginx/html/index.html, kube-probe seems to consider the aweb pod ready:

vagrant@antidote-060:~$ kubectl get all -A
NAMESPACE     NAME                                            READY   STATUS    RESTARTS   AGE
default       pod/acore-5dc84fb45-lksfs                       3/3     Running   1          22m
default       pod/aweb-59999d4847-zv44s                       1/1     Running   0          22m
default       pod/jaeger-68fbc85f44-qsrqr                     1/1     Running   0          22m
default       deployment.apps/nginx-ingress-controller   1/1     1            1           22m
[...]

However, I still can't access the web application: curl localhost:30001 inside the VM gets no response :/

vagrant@antidote-060:~$ curl localhost:30001 -v
* Rebuilt URL to: localhost:30001/
*   Trying ::1...
* Connected to localhost (::1) port 30001 (#0)
> GET / HTTP/1.1
> Host: localhost:30001
> User-Agent: curl/7.47.0
> Accept: */*
> 
olberger commented 4 years ago

Port 30001 is for access from outside the VM, i.e. from the Vagrant host. Inside the VM, it should be port 80, IIRC, FWIW.

$ kubectl get service/nginx-ingress
NAME            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx-ingress   NodePort   10.111.76.233   <none>        80:30001/TCP   94s
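In the `PORT(S)` column, `80:30001/TCP` means the service listens on port 80 on its cluster IP (10.111.76.233 here), and kube-proxy also exposes it as node port 30001 on the node. A small sketch that splits such an entry; `split_port_mapping` is a hypothetical helper that handles only the single-port `port:nodePort/protocol` form shown above:

```sh
# Hypothetical helper: split one PORT(S) entry of a NodePort service, as
# printed by `kubectl get service`, into "<service port> <node port>".
split_port_mapping() {
  entry=${1%%/*}          # drop the protocol suffix: "80:30001/TCP" -> "80:30001"
  svc_port=${entry%%:*}   # text before the colon: service (cluster IP) port
  node_port=${entry#*:}   # text after the colon: node port
  echo "$svc_port $node_port"
}
```

`split_port_mapping 80:30001/TCP` prints `80 30001`. Without any parsing, `kubectl get service nginx-ingress -o jsonpath='{.spec.ports[0].nodePort}'` prints the node port directly; from inside the VM, `curl -m 5 http://10.111.76.233/` targets the service port on the cluster IP, and `-m 5` turns a hang like the one above into a timeout error.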

olberger commented 4 years ago

> I'm not sure what permissions index.html needs, but if I run chmod u+x /usr/share/nginx/html/index.html, kube-probe seems to consider the aweb pod ready:

chmod +x on an HTML file?? Whoa... weird... but if it works...
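It is indeed odd from a mode-bit point of view. On a root-owned `-rw-r--r--` (644) file, `chmod u+x` only adds the owner's execute bit, giving `-rwxr--r--` (744); the group and "other" bits, which are what a non-root nginx worker would be checked against, stay the same. A small sketch demonstrating the mode change on a temporary file (it assumes GNU `stat -c` and falls back to the BSD/macOS `stat -f` form):

```sh
# Apply the thread's `chmod u+x` to a scratch 644 file and print the
# resulting octal mode, showing that only the owner's bits change.
mode_after_u_plus_x() {
  f=$(mktemp)
  chmod 644 "$f"     # rw-r--r--, like index.html in the listing
  chmod u+x "$f"     # the workaround: add execute for the owner only
  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
  mode=$(stat -c '%a' "$f" 2>/dev/null || stat -f '%Lp' "$f")
  rm -f "$f"
  echo "$mode"
}
```

So if the workaround helps, it presumably does so through some side effect of modifying the files rather than through the permissions it sets; the thread does not establish what that side effect is.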

meroupatate commented 4 years ago

> Port 30001 is for access from outside the VM, i.e. from the Vagrant host. Inside the VM, it should be port 80, IIRC, FWIW.
>
> $ kubectl get service/nginx-ingress
> NAME            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
> nginx-ingress   NodePort   10.111.76.233   <none>        80:30001/TCP   94s

Oh, I thought it would also work on port 30001 inside the VM, since the Vagrantfile forwards guest port 30001 to host port 30001 :(

Anyway, I tried antidote-local:30001 from outside the VM (it didn't work earlier because I didn't have the vagrant-hostsupdater plugin installed). I can now load the platform's index.html, but I still get 403 Forbidden errors when I try to access the lesson catalog :/
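vagrant-hostsupdater works by adding entries for the VM's hostnames to the host's /etc/hosts, which is why antidote-local only resolved once the plugin was installed. A minimal sketch for checking that on a Linux or macOS Vagrant host; `hosts_has_entry` is hypothetical and only reads the hosts file, not DNS or other resolver sources:

```sh
# Hypothetical helper: succeed if NAME appears as a hostname in a hosts file
# (default /etc/hosts). Comments are ignored, and the first field (the IP
# address) is not treated as a hostname.
hosts_has_entry() {
  name=$1
  file=${2:-/etc/hosts}
  awk -v n="$name" '
    { sub(/#.*/, "") }                                       # strip comments
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }     # match hostnames
    END { exit !found }                                      # 0 if found, else 1
  ' "$file"
}
```

For example, `hosts_has_entry antidote-local && curl -m 5 -o /dev/null -w '%{http_code}\n' http://antidote-local:30001/` confirms the name is mapped before testing the web app itself.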

At this point I just ran chmod -R u+x /usr/share/nginx on the whole directory in the antidote-web container. I'm not sure why it works, but it does the trick...
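The workaround from the thread can be wrapped so it is easy to rerun. This is a minimal sketch, assuming the aweb pod carries the `app=aweb` label shown in the `kubectl describe` output at the top of the issue and that only one aweb pod is running; `apply_aweb_chmod_workaround` and the `KUBECTL` override are hypothetical. Because the change lives only in the running container's filesystem, it is lost whenever the container restarts or the pod is recreated.

```sh
# Hypothetical wrapper around the thread's workaround. KUBECTL may name a
# different command (e.g. a stub for testing); it defaults to kubectl.
apply_aweb_chmod_workaround() {
  kubectl_cmd=${KUBECTL:-kubectl}
  # Look up the aweb pod by the app=aweb label from `kubectl describe`.
  pod=$("$kubectl_cmd" get pods -l app=aweb -o jsonpath='{.items[0].metadata.name}') || return 1
  if [ -z "$pod" ]; then
    echo "no aweb pod found" >&2
    return 1
  fi
  # Same command as in the thread; it must be reapplied after every restart.
  "$kubectl_cmd" exec "$pod" -- chmod -R u+x /usr/share/nginx
}
```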