kpn / arq-prometheus

Prometheus metrics for arq job queues
Apache License 2.0

Seemingly spurious health check logs #6

Open blakehawkins opened 1 year ago

blakehawkins commented 1 year ago

Arq's default health check interval is 3600 seconds (https://github.com/samuelcolvin/arq/blob/9109c2e59d2b13fa59d246da03d19d7844a6fa19/arq/worker.py#LL208C30-L208C30)

For me, this causes spurious warning logs from time to time, which say:

WARNING::arq.prometheus:159    [arq_prometheus] Health key could not be read, value is `None`.
Possible causes:
- health key has not been initialized by the worker yet
- `health_check_key` or `queue_name` settings may be wrong

This might be caused by an issue with Redis persistence in my cluster. In any case, it led me to realise that arq-prometheus relies on this health check frequency to produce up-to-date metrics.

I recommend adding health_check_interval to the arq worker settings shown in the README.

Here is the patch that I used to resolve the warning spam:

    # Run arq health checks at double the frequency of our arq-prometheus monitoring:
    health_check_interval=datetime.timedelta(seconds=7.5),

woile commented 1 year ago

Thanks for reporting this issue. What would you add to the README? A recommendation to review the health_check_interval on arq?

blakehawkins commented 1 year ago

Thanks for looking, @woile. At the moment you have an example of class WorkerSettings with a few parameters specified. I think I'd just add health_check_interval there, along with a comment explaining why you set the value.
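
A minimal sketch of what that README addition could look like. It assumes arq-prometheus collects metrics every 15 seconds, which is implied by the 7.5 second value in the patch above but not stated outright. METRICS_DELAY is a hypothetical name used only for illustration, and the other worker settings are omitted.

```python
import datetime

# Assumption: arq-prometheus collects metrics every 15 seconds in this setup.
# METRICS_DELAY is a hypothetical name used only for this sketch.
METRICS_DELAY = datetime.timedelta(seconds=15)


class WorkerSettings:
    functions = []  # your arq jobs go here

    # arq's default health_check_interval is 3600 seconds, which is far
    # longer than the metrics interval. Run the health check twice per
    # metrics collection so arq-prometheus should find a recent value.
    health_check_interval = METRICS_DELAY / 2
```

Deriving the interval from the metrics delay keeps the two values in step if the delay changes later; a literal datetime.timedelta(seconds=7.5) would work just as well.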