sfudeus / homematic_exporter

Prometheus exporter for homematic ccu3
Apache License 2.0

Add a diagnostic `refresh_age` metric and corresponding container healthcheck #50

Closed michaelosthege closed 1 month ago

michaelosthege commented 2 months ago

Hi @sfudeus, thanks for this great exporter!

Every now and then my container appears to hang, producing such gaps until I restart it:

image

I assume that some internal loop crashes, but the logs don't indicate anything.

To notice when things go out of order, I introduced a new metric:

# HELP homematic_refresh_age Seconds since the last successful refresh.
# TYPE homematic_refresh_age gauge
homematic_refresh_age{ccu="192.168.178.30"} 4.156044244766235
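One way such a gauge could be computed is sketched below. This is not the code from the PR: `RefreshTracker` and its methods are hypothetical names, and the sketch renders the exposition text by hand with the standard library only, whereas a real exporter would more likely use a Prometheus client library.

```python
import time


class RefreshTracker:
    """Hypothetical helper: remembers when the last successful refresh finished."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the behaviour can be tested without sleeping.
        self._clock = clock
        self._last_success = clock()

    def mark_success(self):
        """Call this at the end of every successful refresh cycle."""
        self._last_success = self._clock()

    def age_seconds(self):
        """Seconds elapsed since the last successful refresh."""
        return self._clock() - self._last_success

    def render(self, ccu):
        """Render the gauge in Prometheus text exposition format."""
        return (
            "# HELP homematic_refresh_age Seconds since the last successful refresh.\n"
            "# TYPE homematic_refresh_age gauge\n"
            f'homematic_refresh_age{{ccu="{ccu}"}} {self.age_seconds()}\n'
        )
```

If the refresh loop hangs, `mark_success` is no longer called and the reported age keeps growing, which is what makes the stall visible from outside.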

In combination with a healthcheck script and docker-autoheal this makes it easy to automatically restart the container.
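A healthcheck along these lines might look like the following sketch. It is not the script from the PR: the function names, the metrics URL (including the port) and the 120-second threshold are all assumptions chosen for illustration.

```python
import re
import sys
import urllib.request

# Matches sample lines such as: homematic_refresh_age{ccu="..."} 4.15
METRIC_LINE = re.compile(
    r"^homematic_refresh_age(?:\{[^}]*\})?\s+(\S+)\s*$", re.MULTILINE
)


def max_refresh_age(metrics_text):
    """Return the largest refresh age found in the metrics text, or None."""
    values = [float(v) for v in METRIC_LINE.findall(metrics_text)]
    return max(values) if values else None


def is_healthy(metrics_text, max_age_seconds):
    """Healthy only if the metric is present and not older than the threshold."""
    age = max_refresh_age(metrics_text)
    return age is not None and age <= max_age_seconds


def main(url, max_age_seconds):
    """Fetch the metrics page and return a process exit code (0 = healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            text = response.read().decode("utf-8")
    except OSError:
        # Connection refused, timeout, HTTP error: treat all as unhealthy.
        return 1
    return 0 if is_healthy(text, max_age_seconds) else 1


# Assumed invocation from a container healthcheck (URL and threshold are guesses):
#   sys.exit(main("http://localhost:8010/metrics", 120.0))
```

A non-zero exit code marks the container as unhealthy, and a tool that watches container health status can then restart it.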

sfudeus commented 1 month ago

Hi @michaelosthege, thanks for the contribution. This looks good besides the one comment I had. Ultimately, instead of needing such a healthcheck long-term, I'd prefer finding and preventing an issue which would block the loop infinitely. But at least this helps finding such occasions.

michaelosthege commented 1 month ago

I'd prefer finding and preventing an issue which would block the loop infinitely. But at least this helps finding such occasions.

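Totally agree. I tried to debug, but found it quite hard. Mainly two things prevented me from proper debugging:

  • The DEBUG mode is too verbose to be useful for debugging rare events
  • The generate_metrics function is quite complex, with lots of branches and a lot of indentation.

As a start I would recommend applying ruff and refactoring some things into functions. Let me know if I shall open a PR!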

sfudeus commented 1 month ago

Mainly two things prevented me from proper debugging:

  • The DEBUG mode is too verbose to be useful for debugging rare events
  • The generate_metrics function is quite complex, with lots of branches and a lot of indentation.

As a start I would recommend applying ruff and refactoring some things into functions. Let me know if I shall open a PR!

Feel free if you find the time for it. The exporter has grown over time and could use some rework. Please try to break the refactoring down into reasonable chunks, so that we don't end up with one big, complex refactoring that is hard to review.

sfudeus commented 1 month ago

@michaelosthege Can you rebase on the latest state of master? I recently refactored the workflow definitions and the image build job fails because of that mismatch.

michaelosthege commented 1 month ago

Done!

sfudeus commented 1 month ago

Workflow security settings prevent running from a remote repo, so I cloned it into a local branch in PR #58. An image preview is available at docker.io/sfudeus/homematic_exporter:preview-58