linuxkit / kubernetes

minimal and immutable Kubernetes images built with LinuxKit
Apache License 2.0
400 stars 75 forks source link

Flakey rtf tests #64

Open ijc opened 6 years ago

ijc commented 6 years ago

I ran 100 iterations of rtf test on commit cc58ae93ccbfe8f4acdbb209394fe8af3d06bede. Results were:

Total Iterations: 100

Failures:
      3 FAIL await kube-dns ready (timeout)
      7 FAIL intra-pod networking (timeout)

kubernetes.smoke.cri-bridge.log
    [STDOUT  ] 2018-02-14T10:57:40.601415418Z: FAIL await kube-dns ready (timeout)$

kubernetes.smoke.cri-weave.log
    [STDOUT  ] 2018-02-13T19:21:07.417410968Z: FAIL await kube-dns ready (timeout)$
    [STDOUT  ] 2018-02-14T15:30:28.524790977Z: linuxkit-3e7dec682e01:/# ^[[6nFAIL await kube-dns ready (timeout)$

kubernetes.smoke.docker-bridge.log
    [STDOUT  ] 2018-02-13T16:39:28.684607810Z: FAIL intra-pod networking (timeout)$
    [STDOUT  ] 2018-02-14T00:37:21.189979194Z: FAIL intra-pod networking (timeout)$
    [STDOUT  ] 2018-02-14T02:53:00.333052128Z: FAIL intra-pod networking (timeout)$
    [STDOUT  ] 2018-02-14T04:13:15.442319709Z: FAIL intra-pod networking (timeout)$
    [STDOUT  ] 2018-02-14T06:18:38.001665601Z: FAIL intra-pod networking (timeout)$
    [STDOUT  ] 2018-02-14T06:33:31.060937576Z: FAIL intra-pod networking (timeout)$
    [STDOUT  ] 2018-02-14T06:48:24.444592701Z: FAIL intra-pod networking (timeout)$

The await kube-dns ready failures (which appear cri specific, although that might just be a timing thing) all follow the pattern:

[STDOUT  ] 2018-02-13T19:16:07.972630880Z: kubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:07.972653156Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:08.124612578Z: Pending
[STDOUT  ] 2018-02-13T19:16:09.131521270Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:09.131669920Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:09.287245054Z: Pending
[STDOUT  ] 2018-02-13T19:16:10.288787244Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:10.289749277Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:10.450560763Z: Pending
[STDOUT  ] 2018-02-13T19:16:11.451770429Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:11.451812803Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:11.595824096Z: Pending
[STDOUT  ] 2018-02-13T19:16:12.599325210Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:12.599444260Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:12.767129133Z: Pending
[STDOUT  ] 2018-02-13T19:16:13.773401297Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:13.773562433Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:13.944594902Z: Pending
[STDOUT  ] 2018-02-13T19:16:14.950696402Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:14.955823955Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:15.120279903Z: Pending
[STDOUT  ] 2018-02-13T19:16:16.125945435Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:16.126079752Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:16.329709531Z: Pending
[STDOUT  ] 2018-02-13T19:16:17.331563723Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:17.331616545Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:17.520332757Z: Pending
[STDOUT  ] 2018-02-13T19:16:18.521722092Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:18.521760301Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:18.678583094Z: Pending
[STDOUT  ] 2018-02-13T19:16:19.680123494Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:19.680166132Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:19.835904177Z: Pending
[STDOUT  ] 2018-02-13T19:16:20.839888363Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:20.839992693Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:20.995085155Z: Pending
[STDOUT  ] 2018-02-13T19:16:22.000421241Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:22.000592879Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:22.195642693Z: Pending
[STDOUT  ] 2018-02-13T19:16:23.216196491Z: linuxkit-925ccd792946:/# ESC[6nkubectl --namespace=kube-system get --selector='k8s-app
[STDOUT  ] 2018-02-13T19:16:23.222987654Z: '='kube-dns' -o jsonpath='{.items[*].status.phase}' pods ; echo
[STDOUT  ] 2018-02-13T19:16:23.416875245Z: Pending
[STDOUT  ] 2018-02-13T19:21:07.417410968Z: FAIL await kube-dns ready (timeout)

That is around a dozen iterations over 40-80s and then silence until the overall timeout after 300s. I suspect this is a test case issue.

The intra-pod networking failure which appears docker-bridge specific is the same in every case too, that is it is actually failing to install curl:

[STDOUT  ] 2018-02-14T06:17:27.383724129Z: SUCCESS nginx responded well
[STDOUT  ] 2018-02-14T06:17:27.387994374Z: kubectl exec $(kubectl get pods -l name==alpine -o=json
[STDOUT  ] 2018-02-14T06:17:27.390734315Z: path='{.items[*].metadata.name}') -- apk add --update curl
[STDOUT  ] 2018-02-14T06:17:27.959741604Z: fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
[STDOUT  ] 2018-02-14T06:17:32.967752636Z: fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
[STDOUT  ] 2018-02-14T06:17:32.967932428Z: ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.7/main: temporary error (try again later)
[STDOUT  ] 2018-02-14T06:17:32.969122634Z: WARNING: Ignoring APKINDEX.70c88391.tar.gz: No such file or directory
[STDOUT  ] 2018-02-14T06:17:37.975192705Z: ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.7/community: temporary error (try again later)
[STDOUT  ] 2018-02-14T06:17:37.975362186Z: WARNING: Ignoring APKINDEX.5022a8a2.tar.gz: No such file or directory
[STDOUT  ] 2018-02-14T06:17:37.978083230Z: ERROR: unsatisfiable constraints:
[STDOUT  ] 2018-02-14T06:17:37.984197706Z:   curl (missing):
[STDOUT  ] 2018-02-14T06:17:37.984328853Z:     required by: world[curl]
[STDOUT  ] 2018-02-14T06:17:38.020854558Z: command terminated with exit code 1

This isn't caught so it then does lots of

[STDOUT  ] 2018-02-14T06:17:38.412079312Z: OCI runtime exec failed: exec failed: container_linux.go:296: starting container process caused "exec: \"curl\": executable file not found in $PATH": unknown
[STDOUT  ] 2018-02-14T06:17:38.413279545Z: command terminated with exit code 126

before timing out.

results-cc58ae93ccbfe8f4acdbb209394fe8af3d06bede.zip

ijc commented 6 years ago

For the kube-dns issue the distribution of iterations for success is potentially interesting:

[go:LinuxKit] ijc@bokrug:test$ sed -n -E -e 's/.*(SUCCESS await kube-dns ready..after)/\1/g'p results-cc58ae93ccbfe8f4acdbb209394fe8af3d06bede.0/*/k*cri-bridge*.log | sort -k6 -n | uniq -c
      2 SUCCESS await kube-dns ready (after 19 attempts)
      9 SUCCESS await kube-dns ready (after 20 attempts)
     37 SUCCESS await kube-dns ready (after 21 attempts)
     35 SUCCESS await kube-dns ready (after 22 attempts)
     13 SUCCESS await kube-dns ready (after 23 attempts)
      1 SUCCESS await kube-dns ready (after 24 attempts)
      1 SUCCESS await kube-dns ready (after 50 attempts)
      1 SUCCESS await kube-dns ready (after 92 attempts)
[go:LinuxKit] ijc@bokrug:test$ sed -n -E -e 's/.*(SUCCESS await kube-dns ready..after)/\1/g'p results-cc58ae93ccbfe8f4acdbb209394fe8af3d06bede.0/*/k*cri-weave*.log | sort -k6 -n | uniq -c
      2 SUCCESS await kube-dns ready (after 9 attempts)
      5 SUCCESS await kube-dns ready (after 10 attempts)
      7 SUCCESS await kube-dns ready (after 11 attempts)
      9 SUCCESS await kube-dns ready (after 12 attempts)
      5 SUCCESS await kube-dns ready (after 13 attempts)
      8 SUCCESS await kube-dns ready (after 14 attempts)
      3 SUCCESS await kube-dns ready (after 15 attempts)
      5 SUCCESS await kube-dns ready (after 16 attempts)
      6 SUCCESS await kube-dns ready (after 17 attempts)
      5 SUCCESS await kube-dns ready (after 18 attempts)
      8 SUCCESS await kube-dns ready (after 19 attempts)
      5 SUCCESS await kube-dns ready (after 20 attempts)
      9 SUCCESS await kube-dns ready (after 21 attempts)
      7 SUCCESS await kube-dns ready (after 22 attempts)
      9 SUCCESS await kube-dns ready (after 23 attempts)
      4 SUCCESS await kube-dns ready (after 24 attempts)
      1 SUCCESS await kube-dns ready (after 36 attempts)
[go:LinuxKit] ijc@bokrug:test$ sed -n -E -e 's/.*(SUCCESS await kube-dns ready..after)/\1/g'p results-cc58ae93ccbfe8f4acdbb209394fe8af3d06bede.0/*/k*docker-bridge*.log | sort -k6 -n | uniq -c
      3 SUCCESS await kube-dns ready (after 6 attempts)
     48 SUCCESS await kube-dns ready (after 7 attempts)
     38 SUCCESS await kube-dns ready (after 8 attempts)
     10 SUCCESS await kube-dns ready (after 9 attempts)
      1 SUCCESS await kube-dns ready (after 10 attempts)
[go:LinuxKit] ijc@bokrug:test$ sed -n -E -e 's/.*(SUCCESS await kube-dns ready..after)/\1/g'p results-cc58ae93ccbfe8f4acdbb209394fe8af3d06bede.0/*/k*docker-weave*.log | sort -k6 -n | uniq -c
      1 SUCCESS await kube-dns ready (after 2 attempts)
      2 SUCCESS await kube-dns ready (after 5 attempts)
      1 SUCCESS await kube-dns ready (after 6 attempts)
      4 SUCCESS await kube-dns ready (after 7 attempts)
      1 SUCCESS await kube-dns ready (after 8 attempts)
      3 SUCCESS await kube-dns ready (after 11 attempts)
      7 SUCCESS await kube-dns ready (after 12 attempts)
     14 SUCCESS await kube-dns ready (after 13 attempts)
     12 SUCCESS await kube-dns ready (after 14 attempts)
     18 SUCCESS await kube-dns ready (after 15 attempts)
     18 SUCCESS await kube-dns ready (after 16 attempts)
     15 SUCCESS await kube-dns ready (after 17 attempts)
      3 SUCCESS await kube-dns ready (after 18 attempts)
      1 SUCCESS await kube-dns ready (after 20 attempts)