openshift / origin-metrics


Make hawkular metrics liveness script more robust #391

Closed joelsmith closed 3 years ago

joelsmith commented 6 years ago

We saw the liveness probe failing on the starter-ca-central-1 cluster.

Liveness probe for "hawkular-metrics-sf0c9_openshift-infra(8110e54a-b4d1-11e7-982f-02ac3a1f9d61):hawkular-metrics" failed (failure): Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>.
Traceback (most recent call last):
File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module>
if int(uptime) < int(timeout):
ValueError: invalid literal for int() with base 10: ''

@mwringe or @jsanda PTAL cc @eparis
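The traceback shows the probe reaching `int(uptime)` with an empty string right after the status request was refused. `int('')` raises `ValueError`, so the script died with a traceback instead of reporting a clean result. Below is a minimal sketch of a more defensive check, assuming Python 3 and assuming the line-48 comparison implements a startup grace period (the probe passes while the container's uptime is below a timeout even if the endpoint is not up yet). `STATUS_URL`, `STARTUP_TIMEOUT`, and all helper names are hypothetical. The real script's URL and the way it obtains `uptime` are not shown in this thread. A Kubernetes exec probe treats exit code 0 as healthy and any other exit code as a failure, so every path returns an exit code instead of raising.

```python
import urllib.error
import urllib.request

# Hypothetical values for illustration only: the real script's status URL
# and timeout source are not shown in this thread.
STATUS_URL = "http://127.0.0.1:8080/hawkular/metrics/status"
STARTUP_TIMEOUT = "300"


def parse_seconds(value):
    """Return value as an int, or None if it is blank or not a number."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None


def status_endpoint_up(url, timeout=5.0):
    """Return True if the status endpoint answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode() == 200
    except OSError as err:  # URLError, HTTPError and socket timeouts are OSError subclasses
        print("Failed to access the status endpoint: %s" % err)
        return False


def probe(endpoint_up, uptime, timeout):
    """Decide liveness and return an exec-probe exit code (0 = healthy)."""
    if endpoint_up:
        return 0
    up = parse_seconds(uptime)
    limit = parse_seconds(timeout)
    if up is None or limit is None:
        # Report a clear failure instead of crashing on int('').
        print("Cannot determine uptime (%r) or timeout (%r)" % (uptime, timeout))
        return 1
    if up < limit:
        # Assumed grace period: still starting, so do not restart yet.
        return 0
    return 1


def main(argv):
    """Exec-probe entry point; returns the process exit code."""
    # Hypothetical: in this sketch uptime arrives as the first argument.
    uptime = argv[0] if argv else ""
    return probe(status_endpoint_up(STATUS_URL), uptime, STARTUP_TIMEOUT)
```

A script built this way would end with `sys.exit(main(sys.argv[1:]))`. Treating an unreadable uptime as a failure avoids masking a container that is genuinely down. The trade-off is that a broken uptime source could trigger restarts during startup, so returning 0 in that branch is an equally defensible choice.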

coolpalani commented 6 years ago

File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48: How do we fix this issue? I am currently facing this problem in one of our clusters.

mwringe commented 6 years ago

@jsanda Can you take a look?

openshift-merge-robot commented 5 years ago

/retest

openshift-merge-robot commented 5 years ago

/retest

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented 3 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 3 years ago

@openshift-bot: Closed this PR.

In response to [this](https://github.com/openshift/origin-metrics/pull/391#issuecomment-720048252):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`.
> Mark the issue as fresh by commenting `/remove-lifecycle rotten`.
> Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.