Closed: @seanknox closed this issue 6 years ago.
@seanknox take a look at the pod logs for the forwarder pod.
@timothysc are you referring to the forwarder container in the heptio-sonobuoy pod? The logs above are (nearly) all of the logs from the heptio-sonobuoy containers.
The forwarder should be dialing back to the instance id, and there should be a message that it detected a 'done' sentinel file.
/cc @chuckha
@seanknox do you have the scanner ID handy?
@chuckha either of these scans:
@seanknox The problem here is that the e2es timed out. You can see it from this log line buried in the middle:
sonobuoy kube-sonobuoy time="2017-11-22T21:36:56Z" level=error msg="error running plugins: timed out waiting for plugins, shutting down HTTP server"
Perhaps something got wedged? If you look at all namespaces after a failed test run, you'll sometimes see leftover e2e namespaces that weren't cleaned up properly, which can cause tests to behave in strange ways and possibly hang.
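The leftover-namespace check can be sketched like this (the namespace name in the sample is made up; upstream e2e suites create namespaces named `e2e-tests-<suite>-<hash>`):

```shell
# In-cluster you'd run:  kubectl get namespaces -o name
# Sample output from a cluster with a leftover e2e namespace:
cat <<'EOF' > namespaces.txt
namespace/default
namespace/kube-system
namespace/heptio-sonobuoy
namespace/e2e-tests-services-x7k2q
EOF
# Any e2e-tests-* namespace still present after a run is a leftover:
grep 'e2e-tests-' namespaces.txt
```

If that grep matches anything once the run has finished, deleting those namespaces before retrying is worth a shot.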
I'm not sure what else to suggest with regard to the timeout; the tests really should finish within our 90-minute timeout period. @timothysc any other suggestions for diagnosing timeouts?
We just added default log gathering in v0.10.0 so if it wedges you should see it in the e2e pod logs.
Thanks @chuckha @timothysc. Should I try another run with Scanner, or should I try running Sonobuoy v0.10.0 in my cluster directly?
@seanknox I would search the e2e-pod logs for the details. It's likely wedged on a test.
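A quick way to spot that failure mode in the dumped logs is to grep for the timeout marker. A sketch using the error line quoted earlier (the kubectl invocation in the comment is an assumption about pod/container names):

```shell
# In-cluster, first dump the aggregator logs, e.g.:
#   kubectl logs -n heptio-sonobuoy sonobuoy kube-sonobuoy > sonobuoy.log
# Sample of the line we're looking for:
cat <<'EOF' > sonobuoy.log
time="2017-11-22T21:36:56Z" level=error msg="error running plugins: timed out waiting for plugins, shutting down HTTP server"
EOF
# A nonzero count means the run hit the plugin timeout:
grep -c 'timed out waiting for plugins' sonobuoy.log   # prints 1
```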
Hey folks, I ran Scanner again on a new cluster, built to the same specs and it worked this time. ¯\_(ツ)_/¯
Same thing here with a 'Kubernetes The Hard Way' cluster on GCE (scan id 14c085c8703ea68bb0d5a768da992726).
This is the last output from the e2e pod:
SSSFeb 9 14:38:46.074: INFO: Running AfterSuite actions on all node
Feb 9 14:38:46.074: INFO: Running AfterSuite actions on node 1
Feb 9 14:38:46.074: INFO: Dumping logs locally to: /tmp/results
Checking for custom logdump instances, if any
Sourcing kube-util.sh
/kubernetes/cluster/log-dump/../../cluster/../cluster/gce/../../cluster/gce/config-test.sh: line 94: USER: unbound variable
Feb 9 14:38:46.106: INFO: Error running cluster/log-dump/log-dump.sh: exit status 1
Ran 125 of 710 Specs in 3037.869 seconds
SUCCESS! -- 125 Passed | 0 Failed | 0 Pending | 585 Skipped PASS
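The `USER: unbound variable` error above is a `set -u` (nounset) failure: config-test.sh expands `$USER`, which is unset inside the container. A minimal reproduction of the mechanism, plus the usual defaulting workaround (the workaround is a general shell pattern, not a documented Sonobuoy fix):

```shell
# Reproduce: under 'set -u', expanding an unset variable aborts the script.
unset USER
( set -u; echo "user is $USER" ) 2>/dev/null || echo "script aborted: USER unset"
# Common workaround in scripts: supply a default at expansion time.
echo "user is ${USER:-root}"
# prints "script aborted: USER unset", then "user is root"
```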
The web page never updates, presumably because the forwarder is still waiting:
$ k logs sonobuoy forwarder
time="2018-02-09T13:47:52Z" level=info msg="forwarder information" Scanner ID=14c085c8703ea68bb0d5a768da992726 Scanner URL="https://scanner.heptio.com"
time="2018-02-09T13:47:52Z" level=info msg="waiting for a done file to appear..." looking for=/tmp/sonobuoy/done
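The forwarder's wait is effectively a poll for that sentinel path. This sketch mimics the same logic locally (the path and timing here are stand-ins, not the forwarder's actual implementation):

```shell
DONE=/tmp/sonobuoy-demo/done        # stand-in for /tmp/sonobuoy/done
mkdir -p "$(dirname "$DONE")" && rm -f "$DONE"
# Simulate the plugins finishing a moment later:
( sleep 1; touch "$DONE" ) &
# Poll until the sentinel appears, as the forwarder log says it is doing:
until [ -f "$DONE" ]; do sleep 1; done
echo "done file detected: $DONE"
```

If the e2e plugin never writes the sentinel (for example, because it timed out), this loop never exits, which matches the forwarder sitting on "waiting for a done file to appear...".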
I provisioned a v1.7.9 cluster with RBAC enabled using acs-engine on Azure. There are no cluster-egress or network-policy rules applied. After running Sonobuoy a number of times, the tests appear to complete, but scanner.heptio.com never receives the results:
Where can I find more information about why outbound results aren't reaching Scanner?
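One place to start is confirming the cluster can actually reach the Scanner URL that the forwarder logs at startup. This sketch extracts the URL from that log line; the kubectl debug-pod command in the comment is an assumption, not a documented procedure:

```shell
# Sample forwarder startup line (from the logs above):
cat <<'EOF' > forwarder.log
time="2018-02-09T13:47:52Z" level=info msg="forwarder information" Scanner ID=14c085c8703ea68bb0d5a768da992726 Scanner URL="https://scanner.heptio.com"
EOF
URL=$(grep -o 'Scanner URL="[^"]*"' forwarder.log | cut -d'"' -f2)
echo "$URL"   # prints https://scanner.heptio.com
# Then test egress to it from inside the cluster, e.g.:
#   kubectl run egress-test --rm -it --image=busybox --restart=Never -- \
#     wget -S --spider https://scanner.heptio.com
```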