We already have error states: they are emitted as events in Kafka and already stored in a database. It should be straightforward to expose a Prometheus metric for the number of new errors in the last X amount of time, probably via an HTTP endpoint that Prometheus scrapes and that queries the states table in the database.
Once the metric is in Prometheus, we can define alerting rules on it and route the resulting alerts through Alertmanager, etc.
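A minimal sketch of such a scrape endpoint follows, using only the Python standard library. Everything concrete in it is a hypothetical assumption, not the real setup: SQLite stands in for the actual database, the table is assumed to be called `states` with a `state` column and a `created_at` column in Unix seconds, and the metric name `error_states_recent`, the 300-second window and port 8000 are placeholders. It writes the Prometheus text exposition format by hand rather than using the official client library.

```python
import sqlite3
import time
from contextlib import closing
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical defaults; adjust to the real database, table and window.
WINDOW_SECONDS = 300  # the "last X amount of time"
DB_PATH = "states.db"  # SQLite file standing in for the real states DB


def count_recent_errors(conn, window_seconds, now=None):
    """Count rows in the assumed `states` table with state = 'error'
    whose created_at (Unix seconds) falls inside the window."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM states WHERE state = 'error' AND created_at >= ?",
        (cutoff,),
    ).fetchone()
    return count


def render_metrics(count, window_seconds):
    """Render one gauge in the Prometheus text exposition format."""
    return (
        "# HELP error_states_recent Error states created in the last window.\n"
        "# TYPE error_states_recent gauge\n"
        f'error_states_recent{{window_seconds="{window_seconds}"}} {count}\n'
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Query the database on every scrape so the value is always current.
        with closing(sqlite3.connect(DB_PATH)) as conn:
            count = count_recent_errors(conn, WINDOW_SECONDS)
        body = render_metrics(count, WINDOW_SECONDS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics for Prometheus to scrape."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a Prometheus scrape job at `host:8000` would make the gauge available, and an alerting rule such as `error_states_recent > 0` could then fire into Alertmanager. One design alternative worth weighing: exposing a monotonically increasing counter of total errors instead, and letting PromQL's `increase()` compute the window, keeps the time window in Prometheus rather than hard-coded in the endpoint.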