Open haerter-tss opened 1 month ago
This is imo a perfect use case for a watchdog implementation.
A watchdog generally refers to a mechanism that monitors the health and performance of components, services or systems (such as the CPU) and takes action if certain thresholds are exceeded.
isSystemHealthy or isJobProcessable)
X seconds "System cpu is still overloaded". X should be a meaningful delay. Ideally @haerter-tss or @sven-dmlr decide the value.
false and print a debug log like "Job processing is skipped"
This will also further decouple the actual job processing logic from the CPU monitoring, making everything more modular and easier to test.
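As a rough illustration of the watchdog idea described above, here is a minimal Python sketch. It is only one possible reading of the comment: the class name `CpuWatchdog`, the helper `process_next_job`, the `read_cpu_load` callback, the threshold and the reminder interval (the "X" from the comment) are all hypothetical, and the project's actual language and APIs may differ.

```python
import time
from typing import Callable, Optional


class CpuWatchdog:
    """Hypothetical watchdog: decides whether jobs may run and throttles overload logs."""

    def __init__(
        self,
        read_cpu_load: Callable[[], float],   # assumed callback returning load in [0, 1]
        threshold: float,                     # assumed overload threshold
        reminder_interval: float,             # the "X" seconds between reminder logs
        log: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._read_cpu_load = read_cpu_load
        self._threshold = threshold
        self._reminder_interval = reminder_interval
        self._log = log
        self._clock = clock
        # Time of the last overload log; None while the system is healthy.
        self._last_logged: Optional[float] = None

    def is_job_processable(self) -> bool:
        if self._read_cpu_load() <= self._threshold:
            self._last_logged = None  # healthy again, so the next overload logs immediately
            return True
        now = self._clock()
        if self._last_logged is None:
            self._log("CPU OVERLOAD")  # first entry is logged right away
            self._last_logged = now
        elif now - self._last_logged >= self._reminder_interval:
            self._log("System cpu is still overloaded")
            self._last_logged = now
        return False


def process_next_job(
    watchdog: CpuWatchdog,
    run_job: Callable[[], None],
    debug: Callable[[str], None],
) -> bool:
    """Job loop step: it only asks the watchdog and knows nothing about CPU monitoring."""
    if not watchdog.is_job_processable():
        debug("Job processing is skipped")
        return False
    run_job()
    return True
```

Because the clock and the load reader are injected, the overload and recovery paths can be exercised in a unit test without waiting or stressing a real CPU.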
Is there any mechanism to handle a system that is under load for a very long time (e.g. infinite loops, busy waiting, etc.)?
Situation
The current monitoring alert logging is too verbose. Example: if the system logs "CPU OVERLOAD", the message is repeated every second for a minute, which results in a bloated and hard-to-read log.
Wanted
Implement a mechanism to ensure that logs are generated with a time delay, reducing the frequency of log entries. The first log entry must always be generated immediately; subsequent entries should be spaced out over time.
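The requested behaviour (first entry immediately, later entries spaced out) could be sketched as a small throttling wrapper. This is a minimal Python sketch under assumptions: the name `ThrottledLogger`, the per-key bookkeeping and the `interval` parameter are hypothetical and not taken from the existing code base.

```python
import time
from typing import Callable, Dict


class ThrottledLogger:
    """Emit a message at most once per `interval` seconds for each key."""

    def __init__(
        self,
        emit: Callable[[str], None],          # the real log sink, e.g. logger.warning
        interval: float,                      # minimum spacing between entries, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._emit = emit
        self._interval = interval
        self._clock = clock
        self._last_emitted: Dict[str, float] = {}

    def log(self, key: str, message: str) -> bool:
        """Return True if the message was emitted, False if it was suppressed."""
        now = self._clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self._interval:
            return False  # still inside the quiet period for this key
        self._last_emitted[key] = now
        self._emit(message)
        return True

    def reset(self, key: str) -> None:
        """Forget a key so that its next message is emitted immediately."""
        self._last_emitted.pop(key, None)
```

A caller would route the alert through `log("cpu", "CPU OVERLOAD")` on every monitoring tick; only the first call and calls at least `interval` seconds after the last emitted entry reach the sink. Calling `reset("cpu")` once the system recovers makes the next overload show up immediately again.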
Solution
This change aims to make the logs more manageable and less overwhelming, while still providing necessary information.
Action Items: