keygen-sh / keygen-api

Keygen is a fair source software licensing and distribution API built with Ruby on Rails. For developers, by developers.
https://keygen.sh

Add periodic job to clean up orphaned dead machines and processes #739

Closed by ezekg 1 year ago

ezekg commented 1 year ago

When a policy's heartbeat duration is changed while heartbeat monitors are in progress, some machine (and process) heartbeat monitors may become out of date. If no further pings are sent, the affected machines and processes will eventually die with no monitor left to cull them, leaving them orphaned.

For example:

Given a policy with a heartbeat duration of 600, suppose a machine is activated and a heartbeat monitor is started. If the policy is then changed to a heartbeat duration of 86000, the machine's monitor still runs 600 seconds after activation, but it now sees that the machine is not dead, so it sends a pong event and exits. If no further pings are sent, the machine eventually dies. Because no ping arrived to start a new monitor, there is no heartbeat monitor to handle its death, and the machine remains unculled.
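The timeline above can be replayed in a few lines of plain Ruby. This is only a sketch of the scenario, not Keygen's actual code: `Policy`, `Machine`, `run_monitor`, and the `monitor_at` field are hypothetical names, and time is modelled as integer seconds since activation.

```ruby
# Hypothetical model of the scenario; these are not Keygen's real classes.
Policy  = Struct.new(:heartbeat_duration)
Machine = Struct.new(:policy, :last_heartbeat_at, :monitor_at) do
  # A machine is dead once a full heartbeat duration has passed since its last ping.
  def dead?(now)
    now - last_heartbeat_at >= policy.heartbeat_duration
  end
end

# Assumed behavior of a one-shot monitor: cull if dead, otherwise pong and exit.
# It never reschedules itself; only a new ping would start another monitor.
def run_monitor(machine, now)
  machine.monitor_at = nil
  machine.dead?(now) ? :culled : :pong
end

policy  = Policy.new(600)
machine = Machine.new(policy, 0, 600) # activated at t=0, monitor due at t=600

policy.heartbeat_duration = 86_000    # policy changed while the monitor is pending

result_at_600 = run_monitor(machine, 600)

# Later, with no further pings: the machine is dead and nothing is watching it.
orphaned = machine.dead?(86_000) && machine.monitor_at.nil?

puts result_at_600
puts orphaned
```

The monitor reads the policy's current duration when it fires, which is why the change made after scheduling turns the check at 600 seconds into a no-op.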

The same can happen for processes, and there are likely other scenarios like this. We need a periodic job that deactivates orphaned dead machines (and processes), i.e. those that no longer have a heartbeat monitor attached.
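As a rough illustration of what such a job would select, here is a minimal plain-Ruby sketch. It assumes each record exposes its policy's heartbeat duration, its last heartbeat time, and whether a monitor is still pending; `HeartbeatRecord`, `orphaned_and_dead?`, and `cull_orphans` are hypothetical names, and how a real job would detect an attached monitor is not specified in this issue.

```ruby
# Hypothetical record shape; times are integer seconds for simplicity.
HeartbeatRecord = Struct.new(:id, :heartbeat_duration, :last_heartbeat_at, :monitor_pending)

# Dead (heartbeat window elapsed) and orphaned (no monitor left to cull it).
def orphaned_and_dead?(record, now)
  expired = now - record.last_heartbeat_at >= record.heartbeat_duration
  expired && !record.monitor_pending
end

# One pass of the periodic job: split records into those to keep and those to deactivate.
def cull_orphans(records, now)
  orphans, kept = records.partition { |r| orphaned_and_dead?(r, now) }
  [kept, orphans]
end

records = [
  HeartbeatRecord.new(1, 600,    89_900, true),  # alive, monitor pending
  HeartbeatRecord.new(2, 600,    0,      true),  # dead, but its monitor will handle it
  HeartbeatRecord.new(3, 86_000, 0,      false)  # dead and orphaned
]

kept, orphans = cull_orphans(records, 90_000)

puts kept.map(&:id).inspect
puts orphans.map(&:id).inspect
```

Records that are dead but still have a pending monitor are left alone here, so the periodic job only picks up what the regular monitors would miss.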