skupperproject / skupper-router

An application-layer router for Skupper networks
https://skupper.io
Apache License 2.0

Debug feature: check for long running timer callbacks #1462

Open kgiusti opened 6 months ago

kgiusti commented 6 months ago

The qd_timer subsystem is the router's global timer callback module, and all router timer events depend on it. Timer callbacks are serialized to avoid race conditions between timed events. If a timer callback runs for too long (e.g. it blocks), it delays execution of every timer event queued behind it. Those timers then miss their deadlines, which can result in undefined behavior and possible destabilization of the router network.

We should instrument the timer code to track the time spent in each timer callback and assert if a callback takes "too much" time (threshold value TBD). This code should be enabled only for Debug router builds due to the high runtime cost of monitoring.
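A minimal sketch of what such instrumentation could look like, assuming a POSIX monotonic clock is available. None of the names below are taken from the router source: `example_timer_cb_t`, `example_run_timed`, `example_invoke_timer_cb`, and `example_sample_cb` are hypothetical, and the 100 ms limit is a placeholder, since the real threshold is still TBD.

```c
#define _POSIX_C_SOURCE 200809L   // for clock_gettime() under strict -std= modes

#include <assert.h>
#include <stdint.h>
#include <time.h>

// Hypothetical callback signature, used only for this illustration.
typedef void (*example_timer_cb_t)(void *context);

// Placeholder threshold (100 ms); the issue leaves the real value TBD.
#define EXAMPLE_CB_LIMIT_NS (100 * 1000 * 1000LL)

// Read the monotonic clock in nanoseconds. A monotonic clock is used so
// that wall-clock adjustments cannot distort the measured duration.
static int64_t example_monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Run the callback and return how long it took, in nanoseconds.
static int64_t example_run_timed(example_timer_cb_t cb, void *context)
{
    int64_t start = example_monotonic_ns();
    cb(context);
    return example_monotonic_ns() - start;
}

// Wrapper the timer dispatch loop would call instead of invoking cb directly.
// Debug builds (NDEBUG not defined) measure the callback and assert on
// overrun; release builds call the callback with no extra clock reads.
static void example_invoke_timer_cb(example_timer_cb_t cb, void *context)
{
#ifndef NDEBUG
    int64_t elapsed = example_run_timed(cb, context);
    assert(elapsed <= EXAMPLE_CB_LIMIT_NS && "timer callback ran too long");
#else
    cb(context);
#endif
}

// Trivial callback for demonstration: increments the int that context points to.
static void example_sample_cb(void *context)
{
    (*(int *) context)++;
}
```

Calling `example_invoke_timer_cb(example_sample_cb, &counter)` runs the callback once. In a Debug build the process aborts if the callback exceeds the limit. Gating on `NDEBUG` is just one way to keep the two clock reads per callback out of release builds; a dedicated build flag would work equally well.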

The impetus for this feature comes from issue #1451.