jetstack / navigator

Managed Database-as-a-Service (DBaaS) on Kubernetes
Apache License 2.0
271 stars, 31 forks

Intermittent TEST FAILURE: Elasticsearch pilot did not update the document count #185

Closed wallrj closed 6 years ago

wallrj commented 6 years ago

In https://github.com/jetstack/navigator/pull/153, the E2E test for the Elasticsearch document count status update sometimes fails.

@munnerz I think you looked into this last year and concluded that the test just needs a little longer.

https://jetstack-build-infra.appspot.com/build/jetstack-logs/pr-logs/pull/jetstack_navigator/153/navigator-e2e-v1-8/446/

...
W0109 11:05:13.644] + stdout_gt 0 kubectl --namespace test-elasticsearchcluster-1515495611-2217 get pilot es-test-mixed-0 '-o=go-template={{.status.elasticsearch.documents}}'
W0109 11:05:13.644] + local expected=0
W0109 11:05:13.645] + shift
W0109 11:05:13.646] ++ kubectl --namespace test-elasticsearchcluster-1515495611-2217 get pilot es-test-mixed-0 '-o=go-template={{.status.elasticsearch.documents}}'
I0109 11:05:16.929] <no value> is not a number
I0109 11:05:16.933] TEST FAILURE: Elasticsearch pilot did not update the document count
W0109 11:05:17.034] + local 'actual=<no value>'
W0109 11:05:17.034] + re='^[0-9]+$'
W0109 11:05:17.035] + [[ <no value> =~ ^[0-9]+$ ]]
W0109 11:05:17.035] + echo '<no value> is not a number'
W0109 11:05:17.035] + return 1
W0109 11:05:17.035] ++ date +%s
W0109 11:05:17.035] + local current_time=1515495916
W0109 11:05:17.035] + local remaining_time=-3
W0109 11:05:17.035] + [[ -3 -lt 0 ]]
W0109 11:05:17.035] + return 1
W0109 11:05:17.035] + fail_test 'Elasticsearch pilot did not update the document count'
W0109 11:05:17.035] + FAILURE_COUNT=1
W0109 11:05:17.035] + echo 'TEST FAILURE: Elasticsearch pilot did not update the document count'
W0109 11:05:17.035] + '[' 1 -gt 0 ']'
W0109 11:05:17.035] + fail_and_exit test-elasticsearchcluster-1515495611-2217
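
For anyone reading the trace, here is a rough sketch of the logic it appears to exercise: a numeric check on the command's stdout, wrapped in a retry loop that gives up once the deadline has passed. It is reconstructed from the xtrace output only and is an assumption, not the actual navigator test helpers. `stdout_gt` appears in the trace, but its body here is guessed; `retry_with_timeout` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch reconstructed from the xtrace output; NOT the real navigator helpers.

# Succeed only if the command's stdout is an integer greater than $1.
stdout_gt() {
    local expected="$1"
    shift
    local actual
    actual="$("$@")"
    local re='^[0-9]+$'
    if ! [[ "${actual}" =~ ${re} ]]; then
        # This is the branch hit in the log: the go-template printed "<no value>".
        echo "${actual} is not a number"
        return 1
    fi
    [[ "${actual}" -gt "${expected}" ]]
}

# Hypothetical helper: re-run a command until it succeeds or the
# deadline (now + $1 seconds) has passed.
retry_with_timeout() {
    local timeout="$1"
    shift
    local deadline=$(( $(date +%s) + timeout ))
    while ! "$@"; do
        local remaining_time=$(( deadline - $(date +%s) ))
        if [[ "${remaining_time}" -lt 0 ]]; then
            return 1
        fi
        sleep 1
    done
}
```

With helpers shaped like this, the failing step would be something like `retry_with_timeout <seconds> stdout_gt 0 kubectl ... get pilot es-test-mixed-0 ...`. The `remaining_time=-3` in the log then means the pilot status still had no `documents` field when the deadline passed, which fits the test needing longer on a CPU-starved cluster.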

/kind bug

munnerz commented 6 years ago

Yep, I'm fairly sure this is due to CPU contention on our testing infra. I've more than doubled the CPU request for our navigator jobs: https://github.com/jetstack/test-infra/pull/100

Hopefully that should help resolve these errors 😄

munnerz commented 6 years ago

Going to close this for now to keep things tidy. I don't think there is any issue with this test in particular, aside from the fact that it's one of the first tests to be executed in our suite. Feel free to re-open if this recurs or if I'm wrong!