vmware-tanzu / sonobuoy

Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests and other plugins in an accessible and non-destructive manner.
https://sonobuoy.io
Apache License 2.0

e2e failed - what can I do ? #1133

Closed Hokwang closed 4 years ago

Hokwang commented 4 years ago

Hello.

Today I ran Sonobuoy for the first time. The e2e plugin failed, but I don't know how to fix it. Can you help me?

$ tar xvfz sonobuoy_0.18.3_linux_amd64.tar.gz
LICENSE
sonobuoy
$ ./sonobuoy run --wait
INFO[0000] created object                                name=sonobuoy namespace= resource=namespaces
INFO[0000] created object                                name=sonobuoy-serviceaccount namespace=sonobuoy resource=serviceaccounts
INFO[0000] created object                                name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterrolebindings
INFO[0000] created object                                name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterroles
INFO[0000] created object                                name=sonobuoy-config-cm namespace=sonobuoy resource=configmaps
INFO[0000] created object                                name=sonobuoy-plugins-cm namespace=sonobuoy resource=configmaps
INFO[0001] created object                                name=sonobuoy namespace=sonobuoy resource=pods
INFO[0001] created object                                name=sonobuoy-master namespace=sonobuoy resource=services

$ results=$(./sonobuoy retrieve)
$ ./sonobuoy results $results
Plugin: e2e
Status: failed
Total: 1
Passed: 0
Failed: 1
Skipped: 0

Failed tests:
BeforeSuite

Plugin: systemd-logs
Status: passed
Total: 29
Passed: 29
Failed: 0
Skipped: 0
$ ./sonobuoy results $results --plugin e2e --mode detailed
{"name":"BeforeSuite","status":"failed","meta":{"path":"e2e|junit_01.xml|Kubernetes e2e suite"},"details":{"failure":"_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:62\nJun 25 05:05:05.676: Unexpected error:\n    \u003c*errors.errorString | 0xc0000cb960\u003e: {\n        s: \"timed out waiting for the condition\",\n    }\n    timed out waiting for the condition\noccurred\n_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:242"}}
$ ./sonobuoy results $results --plugin e2e --mode dump
name: e2e
status: failed
meta:
  type: summary
items:
- name: junit_01.xml
  status: failed
  meta:
    file: results/global/junit_01.xml
    type: file
  items:
  - name: Kubernetes e2e suite
    status: failed
    items:
    - name: BeforeSuite
      status: failed
      details:
        failure: |-
          _output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:62
          Jun 25 05:05:05.676: Unexpected error:
              <*errors.errorString | 0xc0000cb960>: {
                  s: "timed out waiting for the condition",
              }
              timed out waiting for the condition
          occurred
          _output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:242
$ ./sonobuoy results 202006250432_sonobuoy_801d7537-e685-4cb6-82b6-d0a5adb85679.tar.gz --mode=detailed | jq
{
  "name": "BeforeSuite",
  "status": "failed",
  "meta": {
    "path": "e2e|junit_01.xml|Kubernetes e2e suite"
  },
  "details": {
    "failure": "_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:62\nJun 25 05:05:05.676: Unexpected error:\n    <*errors.errorString | 0xc0000cb960>: {\n        s: \"timed out waiting for the condition\",\n    }\n    timed out waiting for the condition\noccurred\n_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/e2e.go:242"
  }
}
parse error: Invalid numeric literal at line 3, column 37


zubron commented 4 years ago

The output from the tests isn't particularly helpful here, but I looked at the line where the error was triggered, using the v1.17.4 tag of Kubernetes: https://github.com/kubernetes/kubernetes/blob/v1.17.4/test/e2e/e2e.go#L242

The condition that timed out is the check that all nodes are ready to schedule workloads. It fails when any node has a Ready condition that is not true, or when there are taints on the nodes. We have an answer in the FAQ about this problem and how you can work around it to start your tests: https://sonobuoy.io/docs/v0.18.3/faq/#we-have-some-nodes-with-custom-taints-in-our-cluster-and-the-tests-wont-start-how-can-i-run-the-tests

Hopefully this helps you get the tests running! I am going to close this issue, but if you still encounter problems or need additional help, feel free to reopen.
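
One quick way to see which nodes might be holding up the suite is to filter the node list for anything whose status is not plain Ready. The sketch below is only an illustration: the helper name not_ready_nodes is made up for this example, and it assumes the default column layout of `kubectl get nodes --no-headers` (name in the first column, status in the second).

```shell
#!/bin/sh
# Hypothetical helper (not part of Sonobuoy or kubectl).
# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# name of every node whose STATUS column is not exactly "Ready".
# Assumed columns: NAME STATUS ROLES AGE VERSION
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | not_ready_nodes
#
# Taint keys per node can be listed with:
#   kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
```

Statuses such as Ready,SchedulingDisabled are reported as well, since they are worth a look even if they turn out not to be the cause.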

Hokwang commented 4 years ago

@zubron Running ./sonobuoy run --wait --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/master,<my_taint>" resolved my problem, thanks.

I have one more question: I now get a different set of failures. I think this may be an HTTP proxy problem, because I am behind a corporate proxy.

$ ./sonobuoy results $results
Plugin: e2e
Status: failed
Total: 4842
Passed: 258
Failed: 20
Skipped: 4564

Failed tests:
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should mutate custom resource with pruning [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should mutate configmap [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should deny crd creation [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should be able to deny pod and configmap creation [Conformance]
[sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] should be able to convert a non homogeneous list of CRs [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should not be able to mutate or prevent deletion of webhook configuration objects [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should mutate pod and apply defaults after mutation [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should unconditionally reject operations on fail closed webhook [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should be able to deny custom resource creation, update and deletion [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should mutate custom resource with different stored version [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] listing mutating webhooks should work [Conformance]
[sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] should be able to convert from CR v1 to CR v2 [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should mutate custom resource [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should honor timeout [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should be able to deny attaching pod [Conformance]
[sig-network] Networking Granular Checks: Pods should function for intra-pod communication: udp [LinuxOnly] [NodeConformance] [Conformance]
[sig-apps] Daemon set [Serial] should run and stop simple daemon [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] listing validating webhooks should work [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] patching/updating a validating webhook should work [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] patching/updating a mutating webhook should work [Conformance]

Plugin: systemd-logs
Status: passed
Total: 29
Passed: 29
Failed: 0
Skipped: 0

I am using Cilium as the CNI. Is this related?

zubron commented 4 years ago

Hi @Hokwang. To determine the cause of the test failures, you will need to look at the logs from the tests, as sonobuoy results only outputs the names of the failed tests with no further details.

Please see our guide in the FAQ on how to debug test failures: https://sonobuoy.io/docs/v0.18.3/faq/#how-do-i-determine-why-my-tests-failed

Hokwang commented 4 years ago

@zubron I found https://github.com/vmware-tanzu/sonobuoy/issues/1084, which is why I mentioned Cilium.

I don't know what the problem is.

$ ./sonobuoy results $results --mode detailed | jq '. | select(.status == "failed") | .details'
{
  "failure": "/workspace/anago-v1.17.4-beta.0.54+12bf0cb73007af/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:721\nJul 22 07:08:41.407: waiting for webhook configuration to be ready\nUnexpected error:\n    <*errors.errorString | 0xc0000cb960>: {\n        s: \"timed out waiting for the condition\",\n    }\n    timed out waiting for the condition\noccurred\n/workspace/anago-v1.17.4-beta.0.54+12bf0cb73007af/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/webhook.go:1865",
  "system-out": "[BeforeEach] [sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin]\n  /workspace/anago-v1.17.4-beta.0.54+12bf0cb73007af/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:151\nSTEP: Creating a kubernetes client\nJul 22 07:08:05.700: INFO: >>> kubeConfig: /tmp/kubeconfig-616749498\nSTEP: Building a namespace api object, basename webhook\nSTEP: Binding the e2e-test-privileged-psp PodSecurityPolicy to the default service account in webhook-2186\nSTEP: Waiting for a default service account to be provisioned in namespace\n[BeforeEach] [sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin]\n  /workspace/anago-v1.17.4-beta.0.54+12bf0cb73007af/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/apimachinery/webhook.go:87\nSTEP: Setting up server cert\nSTEP: Create role binding to let webhook read extension-apiserver-authentication\nSTEP: Deploying the webhook pod\nSTEP: Wait for the deployment to be ready\nJul 22 07:08:06.077: INFO: deployment \"sample-webhook-deployment\" doesn't have the required revision set\nJul 22 07:08:08.087: INFO: deployment status: v1.DeploymentStatus{ObservedGeneration:1, Replicas:1, UpdatedReplicas:1, ReadyReplicas:0, AvailableReplicas:0, UnavailableReplicas:1, Conditions:[]v1.DeploymentCondition{v1.DeploymentCondition{Type:\"Available\", Status:\"False\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63730998486, loc:(*time.Location)(0x791d1c0)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63730998486, loc:(*time.Location)(0x791d1c0)}}, Reason:\"MinimumReplicasUnavailable\", Message:\"Deployment does not have minimum availability.\"}, v1.DeploymentCondition{Type:\"Progressing\", Status:\"True\", LastUpdateTime:v1.Time{Time:time.Time{wall:0x0, ext:63730998486, loc:(*time.Location)(0x791d1c0)}}, LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63730998486, loc:(*time.Location)(0x791d1c0)}}, 
Reason:\"ReplicaSetUpdated\", Message:\"ReplicaSet \\\"sample-webhook-deployment-5f65f8c764\\\" is progressing.\"}}, CollisionCount:(*int32)(nil)}\nSTEP: Deploying the webhook service\nSTEP: Verifying the service has paired with the endpoint\nJul 22 07:08:11.105: INFO: Waiting for amount of service:e2e-test-webhook endpoints to be 1\n[It] should mutate custom resource with pruning [Conformance]\n  /workspace/anago-v1.17.4-beta.0.54+12bf0cb73007af/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:721\nJul 22 07:08:11.108: INFO: >>> kubeConfig: /tmp/kubeconfig-616749498\nSTEP: Registering the mutating webhook for custom resource e2e-test-webhook-7051-crds.webhook.example.com via the AdmissionRegistration API\nJul 22 07:08:11.176: INFO: Waiting for webhook configuration to be ready...\nJul 22 07:08:11.462: INFO: Waiting for webhook configuration to be ready...\nJul 22 07:08:11.513: INFO: Waiting for webhook configuration to be ready...\nJul 22 07:08:11.637: INFO: Waiting for webhook configuration to be ready...\nJul 22 07:08:11.727: INFO: Waiting for webhook configuration to be ready...\nJul 22 07:08:11.820: INFO: Waiting for webhook configuration to be ready...
<snip>

zubron commented 4 years ago

Hi @Hokwang. It's possible that the CNI is related, but I can't say for sure, as the Sonobuoy team are not the authors of these tests.

I would try to run just one of the failing tests again (e.g. using --e2e-focus="should mutate custom resource with pruning") and carefully examine the logs. Looking at the snippet you've sent, it is this test that is failing when registering the webhook.

While the test is running, you could inspect some of the resources it creates to try to determine why they don't become ready. Most of the tests have a fairly generous timeout value, so you should have time to inspect the resources that they attempt to create.
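
Because the focus value is treated as a regular expression, a full test name copied from the results (including tags such as [Conformance]) needs its brackets escaped before it can be used literally. The following is a minimal sketch of that step. The helper name escape_focus is hypothetical and not part of Sonobuoy.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sonobuoy): backslash-escape regex
# metacharacters so a literal test name can be passed as a focus regex.
escape_focus() {
    printf '%s' "$1" | sed 's/[][\.*^$()+?{}|]/\\&/g'
}

# Usage (requires a cluster and the sonobuoy binary):
#   ./sonobuoy run --wait \
#     --e2e-focus="$(escape_focus 'should mutate custom resource with pruning [Conformance]')"
#
# In a second terminal, watch what the test creates while it runs:
#   kubectl get pods --all-namespaces --watch
```

A shorter, unescaped substring of the test name, as in the example above, also works as long as it contains no regex metacharacters.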

BadigerAnu commented 3 years ago

@Hokwang Can you please share a sample of the command? I am passing the taint, but I still get the same "BeforeSuite" error. How exactly should the taint be passed? For example, is it ./sonobuoy run --wait --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/master, xyz:NoSchedule" or ./sonobuoy run --wait --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=node-role.kubernetes.io/master, workloadType=xyz:NoSchedule"?

kumiDa commented 1 year ago

@BadigerAnu, any luck finding the right format to pass the taints to the --non-blocking-taints option?

@zubron & @johnSchnake, the format for passing --non-blocking-taints should be elaborated in the FAQ, because the excerpt there doesn't explain how to pass the taints.
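
For what it's worth, here is a sketch of how the value could be built. It rests on an assumption that is not confirmed anywhere in this thread: that the Kubernetes e2e framework splits the --non-blocking-taints value on commas and compares each entry against taint keys only, so the =value and :effect parts are dropped. That would be consistent with the command that worked earlier in this issue, which used the bare key node-role.kubernetes.io/master. The helper name non_blocking_taints is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sonobuoy).
# ASSUMPTION: --non-blocking-taints expects a comma-separated list of
# taint KEYS, without "=value" or ":effect". This is not confirmed in
# the thread; verify against your Kubernetes version.
#
# Takes taints as printed by kubectl (key, key:effect, or key=value:effect)
# and prints the keys joined by commas with no spaces.
non_blocking_taints() {
    list=""
    for spec in "$@"; do
        key=${spec%%[=:]*}            # keep only the part before the first '=' or ':'
        list="${list:+$list,}$key"    # append, adding a comma between entries
    done
    printf '%s' "$list"
}

# Usage (requires a cluster and the sonobuoy binary):
#   taints=$(non_blocking_taints node-role.kubernetes.io/master workloadType=xyz:NoSchedule)
#   ./sonobuoy run --wait \
#     --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=$taints"
```

With the taint from the question above, the resulting value would be node-role.kubernetes.io/master,workloadType.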