thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
219 stars 54 forks source link

tedge-watchdog: does not handle systemd service file overrides #2354

Open Bravo555 opened 1 year ago

Bravo555 commented 1 year ago

Describe the bug

In systemd, instead of editing service files located in /lib/systemd/system/, it is advisable to use systemctl edit --full SERCVICE_NAME.service which creates a new unit file in /etc/systemd/system/SERVICE_NAME.service which includes modifications the user performed, or systemctl edit SERVICE_NAME.service which creates a directory /etc/systemd/system/SERVICE_NAME.service.d in which a file override.conf containing user's modifications, will be saved.

For that reason, it isn't enough to parse /lib/systemd/system/SERVICE_NAME.service file for WatchdogSec attribute, because if the user used systemctl edit command, it won't affect the original file, but it will be picked up by systemd, resulting in an undesirable behaviour where the service will be repeatedly killed by systemd, while the watchdog is not able to detect that the service has the WatchdogSec attribute.

To Reproduce

  1. Connect to c8y cloud
  2. Stop tedge-mapper-c8y and tedge-watchdog services
  3. Use systemctl edit --full tedge-mapper-c8y and add WatchdogSec=30 in [Service] section, as described in the documentation
  4. Start tedge-mapper-c8y and tedge-watchdog services
  5. systemctl status tedge-watchdog should contain log line: WARN tedge_watchdog::systemd_watchdog: Watchdog is not enabled for device/main/service/tedge-mapper-c8y
  6. journalctl -u tedge-mapper-c8y -f should show the service being killed with signal SIGABRT due to not notifying systemd in time.

Expected behavior

tedge-watchdog should pick up WatchdogSec attribute as present, even if it was added using systemctl edit command.

Additional context

Instead of parsing a service file from a hardcoded location, it would be better to use systemd's D-Bus interface or a crate if such exists.

Lastly, this is not a bug insofar as our code doesn't work despite the user precisely following the instructions found in the documentation - the user has to do things differently from how they're described in the documentation, but seeing how "using systemctl edit command to edit systemd unit files" is a recommended good practice, and how following this practice leads to the unit being inoperable due to constant timeouts, I've decided to classify it as such.

reubenmiller commented 8 months ago

Yes the current implementation of editing the /lib/systemd/system/SERVICE_NAME.service is very problematic as this file will be overwritten when thin-edge.io is updated, thus silently disabling the watchdog functionality (exactly what you don't want in a watchdog!).

Below might be a way forward which supports reading the current WatchdogUSec via the systemd interface (rather than parsing files manually):

  1. Edit the systemd definition of a service that should have a watchdog enable on it

    For example, edit the tedge-agent service

    sudo systemctl edit tedge-agent.service

    Add the following contents (on newer systemd installations you can use a value of 30s)

    [Service]
    WatchdogSec=30
  2. Edit the tedge-watchdog service definition

    sudo systemctl edit tedge-watchdog.service

    Add the service dependencies to ensure the watchdog service starts before the other services.

    [Unit]
    Before=tedge-agent.service tedge-mapper-c8y.service

    Note

    Using the Before property on the tedge-watchdog service is easier/cleaner to managed as you only have to edit the service dependencies in one place.

  3. Reload the systemd daemon

    sudo systemctl daemon-reload

Now the watchdog service can check which services have the Watchdog enabled by running (though using the DBUS interface might be better?):

systemctl show tedge-agent -p WatchdogUSec

Output

WatchdogUSec=30