Closed: nhudson closed this issue 2 years ago
After a ton of debugging it looks like kind sometimes doesn't allow scrapes of the following pods once everything is installed:

- kube-proxy
- kube-controller-manager
- kube-scheduler
- etcd

This causes the test to fail because the `up` status is 0 for these 4 pods. Checking the pods, they look healthy, and as far as I can tell the cluster is running with no issues.
A possible short-term solution might be to just disable the default rules and scrape configs for these endpoints:
kube-prometheus-stack:
  defaultRules:
    rules:
      etcd: false
      kubeControllerManager: false
      kubeProxy: false
      kubeScheduler: false
  kubeControllerManager:
    enabled: false
  kubeProxy:
    enabled: false
  kubeScheduler:
    enabled: false
  kubeEtcd:
    enabled: false
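Since everything above sits under the `kube-prometheus-stack:` key, these should be passable as a plain values override (e.g. `helm upgrade --install ... -f values.yaml`) without changing the chart itself, assuming tobs forwards that key to the subchart as usual.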
Also, since it takes some time for Promscale to come online, the `up` metric does not get updated by the time the test runs. This will cause the test to fail as well.
~Let's disable scrape configs and rules for the endpoints listed in https://github.com/timescale/tobs/pull/564#issuecomment-1231725609.~

~As for Promscale, I was under the impression that helm tests are run after the release is deployed. In our case we are waiting until Promscale is in a Ready state. This means the `up` metric should have a value of 1 in the next scrape after Promscale is Ready. For now let's just put a `sleep 30` before running the query. In the future I think we should move all our tests into some golang application and run those queries with a backoff mechanism to reduce test flakiness.~
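For illustration, here is a minimal sketch of what such a golang check with a backoff could look like. The in-cluster Prometheus URL and the `job="promscale"` label are placeholders, not the actual release values:

```go
// Sketch: instead of a fixed sleep before the helm test query, poll the
// Prometheus HTTP API with a backoff until the `up` metric for Promscale
// reports 1, or give up after a deadline.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// queryResult mirrors the relevant parts of Prometheus' instant-query response.
type queryResult struct {
	Data struct {
		Result []struct {
			Value []interface{} `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

// promscaleUp runs an instant query and reports whether any series has value "1".
func promscaleUp(promURL, query string) (bool, error) {
	resp, err := http.Get(promURL + "/api/v1/query?query=" + url.QueryEscape(query))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var res queryResult
	if err := json.NewDecoder(resp.Body).Decode(&res); err != nil {
		return false, err
	}
	for _, r := range res.Data.Result {
		if len(r.Value) == 2 && r.Value[1] == "1" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Placeholder in-cluster service URL; adjust to the actual release name/namespace.
	promURL := "http://tobs-kube-prometheus-stack-prometheus.default.svc:9090"
	query := `up{job="promscale"}` // assumed job label

	backoff := 2 * time.Second
	deadline := time.Now().Add(2 * time.Minute)
	for time.Now().Before(deadline) {
		ok, err := promscaleUp(promURL, query)
		if err == nil && ok {
			fmt.Println("promscale is up")
			return
		}
		time.Sleep(backoff)
		if backoff < 30*time.Second {
			backoff *= 2 // simple exponential backoff
		}
	}
	fmt.Fprintln(os.Stderr, "timed out waiting for promscale to report up=1")
	os.Exit(1)
}
```

A binary like this could run from a pod annotated with `helm.sh/hook: test`, so `helm test` would only pass once the target actually reports `up == 1` instead of relying on a fixed sleep.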
nvm, I see you already did all of that :smile:
What this PR does / why we need it
Helm test to check if components of the stack are actually up and running
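For context, `helm test <release>` runs the chart resources annotated with `helm.sh/hook: test` after the release is installed, so this check only executes against an already-deployed stack.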
Which issue this PR fixes
(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)

Checklist