thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0

tedge-watchdog: Using low values for `WatchdogSec` leads to tedge-watchdog sending health check messages too fast #2355

Closed Bravo555 closed 5 months ago

Bravo555 commented 1 year ago

Describe the bug

When WatchdogSec is set to a low value, such as 2 or 3, tedge-watchdog sends health check messages at an interval of roughly 0.001s:

Oct 19 15:35:26 e17ab0835ebd tedge-watchdog[318793]: 2023-10-19T15:35:26.39360965Z  WARN tedge_watchdog::systemd_watchdog: No health check response received from device/main/service/tedge-mapper-c8y in time
Oct 19 15:35:26 e17ab0835ebd tedge-watchdog[318793]: 2023-10-19T15:35:26.39486311Z  WARN tedge_watchdog::systemd_watchdog: No health check response received from device/main/service/tedge-mapper-c8y in time
Oct 19 15:35:26 e17ab0835ebd tedge-watchdog[318793]: 2023-10-19T15:35:26.39611997Z  WARN tedge_watchdog::systemd_watchdog: No health check response received from device/main/service/tedge-mapper-c8y in time
Oct 19 15:35:26 e17ab0835ebd tedge-watchdog[318793]: 2023-10-19T15:35:26.39736915Z  WARN tedge_watchdog::systemd_watchdog: No health check response received from device/main/service/tedge-mapper-c8y in time
Oct 19 15:35:26 e17ab0835ebd tedge-watchdog[318793]: 2023-10-19T15:35:26.39863253Z  WARN tedge_watchdog::systemd_watchdog: No health check response received from device/main/service/tedge-mapper-c8y in time
Oct 19 15:35:26 e17ab0835ebd tedge-watchdog[318793]: 2023-10-19T15:35:26.399971749Z  WARN tedge_watchdog::systemd_watchdog: No health check response received from device/main/service/tedge-mapper-c8y in time

To Reproduce

  1. Connect to c8y
  2. Stop tedge-watchdog and tedge-mapper-c8y services
  3. Add WatchdogSec=3 to /lib/systemd/system/tedge-mapper-c8y.service under the [Service] section
  4. Start tedge-watchdog
  5. Run journalctl -u tedge-watchdog -f; it should show tedge-watchdog sending messages very fast, as in the log snippet above

Expected behavior

tedge-watchdog should either reject an interval that is too low or follow the configured interval correctly. More fundamentally, tedge-watchdog using a smaller interval than the one configured in systemd is unexpected and should be better documented.

Additional context

https://github.com/thin-edge/thin-edge.io/blob/f811fad6e406e6b876059073b2da50fa689dd3c8/crates/core/tedge_watchdog/src/systemd_watchdog.rs#L60

The bug is caused by this line. When the interval is less than 4, a delay of 0 is used between sending messages. This should be an easy fix.
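To illustrate the suspected mechanism, here is a minimal sketch. It assumes the delay is derived by integer-dividing the WatchdogSec value by 4, which is an inference from the "less than 4 yields 0" behaviour described above and not a copy of the linked code. The names `DIVISOR`, `naive_delay`, `safe_delay` and the 500 ms floor are hypothetical, chosen only for this example.

```rust
use std::time::Duration;

/// Hypothetical divisor, inferred from the report that values below 4 give a zero delay.
const DIVISOR: u32 = 4;

/// Suspected buggy behaviour: whole-second integer division truncates toward zero,
/// so 1, 2 and 3 all produce a zero delay.
fn naive_delay(watchdog_sec: u64) -> Duration {
    Duration::from_secs(watchdog_sec / u64::from(DIVISOR))
}

/// One possible fix: divide the Duration itself (keeping sub-second precision)
/// and clamp the result to an arbitrary lower bound so it can never reach zero.
fn safe_delay(watchdog_sec: u64) -> Duration {
    let floor = Duration::from_millis(500); // assumed minimum, not from the project
    (Duration::from_secs(watchdog_sec) / DIVISOR).max(floor)
}
```

With these definitions, `naive_delay(3)` is a zero duration, while `safe_delay(3)` is 750 ms. Rejecting values that are too low at startup, as suggested under "Expected behavior", would be an alternative to clamping.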

One could argue that a timeout this low makes no sense, and I agree, but I wanted to set it to 2s or 3s only for a test. Still, this behaviour is unexpected and it would be better to fix it.

gligorisaev commented 6 months ago

QA has thoroughly checked the bug and here are the results: