vmware-tanzu / sonobuoy

Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests and other plugins in an accessible and non-destructive manner.
https://sonobuoy.io
Apache License 2.0

Scanner doesn't detect any results #159

Closed: seanknox closed this issue 6 years ago

seanknox commented 6 years ago

I provisioned a v1.7.9 cluster with RBAC enabled with acs-engine running on Azure. There are no cluster-egress or network policy rules applied. After running Sonobuoy a number of times the tests appear to complete, but scanner.heptio.com never receives the results:

$ stern -n heptio-sonobuoy "sonobuoy$"
+ sonobuoy › forwarder
+ sonobuoy › cleanup
+ sonobuoy › kube-sonobuoy
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=warning msg="Unsupported plugin file detected ..119811_22_11_20_06_53.150305025"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=warning msg="Unsupported plugin file detected ..data"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=info msg="Scanning plugins in ~/sonobuoy/plugins.d (pwd: /)"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=info msg="Directory (~/sonobuoy/plugins.d) does not exist"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:55Z" level=info msg="Loading plugin driver Job"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:56Z" level=info msg="Namespace default Matched=true"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:56Z" level=info msg="Namespace heptio-sonobuoy Matched=true"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:56Z" level=info msg="Namespace kube-public Matched=true"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:56Z" level=info msg="Namespace kube-system Matched=true"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:56Z" level=info msg="Starting server Expected Results: [{ e2e}]"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:56Z" level=info msg="Running (e2e) plugin"
sonobuoy kube-sonobuoy time="2017-11-22T20:06:56Z" level=info msg="Listening for incoming results on 0.0.0.0:8080\n"
sonobuoy forwarder 2017/11/22 20:06:56 Cloud: https://scanner.heptio.com
sonobuoy kube-sonobuoy time="2017-11-22T21:36:56Z" level=error msg="error running plugins: timed out waiting for plugins, shutting down HTTP server"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:56Z" level=info msg="Running non-ns query"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:57Z" level=info msg="Collecting Node Configuration and Health..."
sonobuoy kube-sonobuoy time="2017-11-22T21:36:57Z" level=info msg="Creating host results for k8s-agent-41021881-0 under /tmp/sonobuoy/1ca8fc2d-6f33-423f-8422-fd1e3e4b00fe/hosts/k8s-agent-41021881-0\n"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:57Z" level=info msg="Creating host results for k8s-agent-41021881-1 under /tmp/sonobuoy/1ca8fc2d-6f33-423f-8422-fd1e3e4b00fe/hosts/k8s-agent-41021881-1\n"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:58Z" level=info msg="Creating host results for k8s-agent-41021881-2 under /tmp/sonobuoy/1ca8fc2d-6f33-423f-8422-fd1e3e4b00fe/hosts/k8s-agent-41021881-2\n"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:58Z" level=info msg="Creating host results for k8s-master-41021881-0 under /tmp/sonobuoy/1ca8fc2d-6f33-423f-8422-fd1e3e4b00fe/hosts/k8s-master-41021881-0\n"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:58Z" level=info msg="Creating host results for k8s-master-41021881-1 under /tmp/sonobuoy/1ca8fc2d-6f33-423f-8422-fd1e3e4b00fe/hosts/k8s-master-41021881-1\n"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:59Z" level=info msg="Creating host results for k8s-master-41021881-2 under /tmp/sonobuoy/1ca8fc2d-6f33-423f-8422-fd1e3e4b00fe/hosts/k8s-master-41021881-2\n"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:59Z" level=info msg="Running ns query (default)"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:59Z" level=info msg="Running ns query (heptio-sonobuoy)"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:59Z" level=info msg="Running ns query (kube-public)"
sonobuoy kube-sonobuoy time="2017-11-22T21:36:59Z" level=info msg="Running ns query (kube-system)"
sonobuoy kube-sonobuoy time="2017-11-22T21:37:00Z" level=info msg="Results available at /tmp/sonobuoy/201711222006_sonobuoy_1ca8fc2d-6f33-423f-8422-fd1e3e4b00fe.tar.gz"

Where can I find more information about why outbound results aren't reaching Scanner?

timothysc commented 6 years ago

@seanknox take a look at the pod logs for the forwarder pod.
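For reference, a minimal sketch of pulling just that container's logs; the namespace, pod, and container names are taken from the stern output above, not from Sonobuoy documentation:

```shell
ns=heptio-sonobuoy   # namespace used by the run above
pod=sonobuoy         # aggregator pod name from the stern output
# Print only the forwarder container's logs; fall back to a message
# when no cluster is reachable (e.g. when trying this locally):
kubectl logs -n "$ns" "$pod" -c forwarder 2>/dev/null || echo "no cluster reachable"
```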

seanknox commented 6 years ago

@timothysc are you referring to the forwarder container in the heptio-sonobuoy pod? The logs above cover (nearly) all of the heptio-sonobuoy containers.

timothysc commented 6 years ago

The forwarder should be dialing back to the instance id, and there should be a message that it detected a 'done' sentinel file.
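That mechanism can be simulated locally (this is an illustrative sketch, not Sonobuoy's actual code): one process writes results and then touches a sentinel file, while another polls until the file appears before proceeding.

```shell
results_dir=$(mktemp -d)

# Simulate the aggregator finishing its run and touching the sentinel file:
( sleep 1; touch "$results_dir/done" ) &

# Simulate the forwarder's wait loop, bounded so it cannot hang forever:
for _ in $(seq 1 50); do
  [ -f "$results_dir/done" ] && break
  sleep 0.2
done

[ -f "$results_dir/done" ] && echo "done file detected; results would be uploaded now"
```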

timothysc commented 6 years ago

/cc @chuckha

chuckha commented 6 years ago

@seanknox do you have the scanner ID handy?

seanknox commented 6 years ago

@chuckha either of these scans:

chuckha commented 6 years ago

@seanknox The problem here is that the e2es timed out. You can see it from this log line buried in the middle:

sonobuoy kube-sonobuoy time="2017-11-22T21:36:56Z" level=error msg="error running plugins: timed out waiting for plugins, shutting down HTTP server"

Perhaps something got wedged? If you look at all namespaces after a failed test run, you'll sometimes see leftover e2e namespaces that weren't cleaned up properly, which can cause tests to behave in strange ways and possibly hang.

I'm not sure what else to suggest with regard to the timeout; the tests really should finish within our 90-minute timeout period. @timothysc any other suggestions for diagnosing timeouts?
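One quick way to look for those leftovers: list namespaces and filter for the e2e test prefix. The `e2e-tests-` prefix and the namespace names below are assumptions for illustration; the snippet filters a sample listing so it is self-contained, but on a live cluster you would pipe the real `kubectl get namespaces` output into the same grep.

```shell
# Sample `kubectl get namespaces` output with one leftover e2e namespace
# (names are illustrative):
sample='NAME                        STATUS    AGE
default                     Active    2d
e2e-tests-sched-pred-x7k2   Active    3h
heptio-sonobuoy             Active    1h
kube-system                 Active    2d'

# Leftover e2e namespaces show up with the e2e-tests- prefix:
leftover=$(printf '%s\n' "$sample" | grep '^e2e-tests-' || true)
echo "$leftover"
```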

timothysc commented 6 years ago

We just added default log gathering in v0.10.0 so if it wedges you should see it in the e2e pod logs.

seanknox commented 6 years ago

Thanks @chuckha @timothysc. Should I try another run with Scanner, or should I try running Sonobuoy v0.10.0 in my cluster directly?

timothysc commented 6 years ago

@seanknox I would search the e2e-pod logs for the details. It's likely wedged on a test.

seanknox commented 6 years ago

Hey folks, I ran Scanner again on a new cluster built to the same specs, and it worked this time. ¯\_(ツ)_/¯

bitfield commented 6 years ago

Same thing here with a 'Kubernetes The Hard Way' cluster on GCE (scan id 14c085c8703ea68bb0d5a768da992726).

This is the last output from the e2e pod:

SSSFeb  9 14:38:46.074: INFO: Running AfterSuite actions on all node
Feb  9 14:38:46.074: INFO: Running AfterSuite actions on node 1
Feb  9 14:38:46.074: INFO: Dumping logs locally to: /tmp/results
Checking for custom logdump instances, if any
Sourcing kube-util.sh
/kubernetes/cluster/log-dump/../../cluster/../cluster/gce/../../cluster/gce/config-test.sh: line 94: USER: unbound variable
Feb  9 14:38:46.106: INFO: Error running cluster/log-dump/log-dump.sh: exit status 1

Ran 125 of 710 Specs in 3037.869 seconds
SUCCESS! -- 125 Passed | 0 Failed | 0 Pending | 585 Skipped PASS

The web page never updates, presumably because the forwarder is still waiting:

$ k logs sonobuoy forwarder
time="2018-02-09T13:47:52Z" level=info msg="forwarder information" Scanner ID=14c085c8703ea68bb0d5a768da992726 Scanner URL="https://scanner.heptio.com"
time="2018-02-09T13:47:52Z" level=info msg="waiting for a done file to appear..." looking for=/tmp/sonobuoy/done