This PR expands the state tracking functionality of the application by adding a `State` class that is accessible via a `multiprocessing.SyncManager` utility function named `get_manager()` in the new `state` module.

The state class implements `ok` like the existing state dict, but adds new error-related attributes that can be used to gather more granular metrics and enable easier diagnostics:

https://github.com/unioslo/zabbix-auto-config/blob/4e5987f80d529bdd405c686dc1dd12fae784aae4/zabbix_auto_config/state.py#L9-L43
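The linked lines are not reproduced in this description. As a rough illustration of the shape of such a class, here is a minimal sketch under stated assumptions: only `ok`, `error_count`, `set_ok()`, `set_error()` and `asdict()` are named in this PR, while the `error` and `error_type` fields are hypothetical stand-ins for the "error-related attributes". The sketch uses a standard-library dataclass to stay self-contained, whereas the PR itself describes a Pydantic-backed dataclass.

```python
import dataclasses
from typing import Any, Dict, Optional


@dataclasses.dataclass
class State:
    """Illustrative stand-in for the State class described in this PR."""

    ok: bool = True                    # named in the PR
    error: Optional[str] = None        # assumption: message of the last error
    error_type: Optional[str] = None   # assumption: exception class name
    error_count: int = 0               # named in the PR

    def set_ok(self) -> None:
        """Mark the state as OK and clear error details, keeping the count."""
        self.ok = True
        self.error = None
        self.error_type = None

    def set_error(self, exc: Exception) -> None:
        """Record an error and increment the running error counter."""
        self.ok = False
        self.error = str(exc)
        self.error_type = type(exc).__name__
        self.error_count += 1

    def asdict(self) -> Dict[str, Any]:
        """Return every field as a dict key, whether or not it is set."""
        return dataclasses.asdict(self)


state = State()
state.set_error(ValueError("bad host"))
state.set_ok()
print(state.asdict())
# {'ok': True, 'error': None, 'error_type': None, 'error_count': 1}
```

Because every field has a default, all keys are present in the dict even for a process that has never failed, and the error count survives the return to an OK state.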
The `set_error()` method processes the error and sets the appropriate state attributes based on the contents of the error. Each error increments a new error counter (`error_count`).

The `set_ok()` method sets the current state to OK and clears any error messages from the state, while keeping the error count intact. This maintains the total number of errors per process in the health file even when the process has returned to an OK state.

Expanded health file process entries
Each process in the health file has received new keys corresponding to the fields of the new `State` class.

A healthy process looks like the following:

While an unhealthy process looks like this:

Motivation
This PR enables more granular health information for each process in the health file. The choice to move away from a dict to a concrete class was made so that we could add the utility methods `set_ok()` and `set_error()`. Adding such functionality to a dict would have required helper functions that manipulated dict keys and performed numerous `dict.setdefault()` and `isinstance()` calls to even come close to guaranteeing any sort of safety with regard to types and arbitrary key access. Thus, it was simpler to rewrite the state as a Pydantic-backed dataclass.

With this new class, we can simply call `State.asdict()` to get a dict representation of the State instance, with all attributes guaranteed to be present as dict keys when we dump the state to the health file. This makes parsing the file easier, as we don't have to check for the existence of keys before we access them.

Testing
Since the `State` class is synced between processes via the manager, we are able to test subprocesses more accurately thanks to increased introspection of each process's state. Information such as the number of errors and the exception types is very difficult to test with the existing state tracking, but with the help of the new state attributes in this PR, we can easily access that information.
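To illustrate how a test might read a subprocess's state through the manager, here is a minimal sketch, not the PR's actual implementation. `StateManager`, `worker`, `run_demo` and the trimmed-down `State` are hypothetical names introduced for this example; only `get_manager()`, `set_error()`, `asdict()`, `ok` and `error_count` come from the description above. It relies on the default proxy type, which exposes public methods but not attributes, so the parent reads the state via `asdict()`.

```python
import dataclasses
import multiprocessing
from multiprocessing.managers import SyncManager
from typing import Any, Dict, Optional


@dataclasses.dataclass
class State:
    """Trimmed-down, hypothetical stand-in for the PR's State class."""

    ok: bool = True
    error_type: Optional[str] = None  # assumption: exception class name
    error_count: int = 0

    def set_error(self, exc: Exception) -> None:
        self.ok = False
        self.error_type = type(exc).__name__
        self.error_count += 1

    def asdict(self) -> Dict[str, Any]:
        return dataclasses.asdict(self)


class StateManager(SyncManager):
    """Hypothetical manager subclass that can hand out shared State objects."""


# Registering the class lets manager.State() create an instance that lives in
# the manager's server process and return a proxy to it.
StateManager.register("State", State)


def get_manager() -> StateManager:
    manager = StateManager()
    manager.start()
    return manager


def worker(state: Any) -> None:
    """Simulate a process that hits an error and reports it via the proxy."""
    try:
        raise ValueError("simulated failure")
    except Exception as exc:
        state.set_error(exc)


def run_demo() -> Dict[str, Any]:
    manager = get_manager()
    try:
        state = manager.State()
        proc = multiprocessing.Process(target=worker, args=(state,))
        proc.start()
        proc.join()
        # The parent sees what the child recorded on the shared object.
        return state.asdict()
    finally:
        manager.shutdown()


if __name__ == "__main__":
    print(run_demo())
```

A test can then assert on the returned dict, for example that `error_count` is 1 and `error_type` is `"ValueError"` after the worker has run.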