viniarck opened 1 year ago
I agree, @viniarck. This feature could be part of a watchdog NApp or something similar, which consolidates all validations (not only of_core's) and translates them into an operational status (which could indicate success, failure, or partial failure, including failures in non-critical components, and so on).
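A minimal sketch of how such a watchdog could fold individual validations into one operational status, assuming each validation reports a simple pass/fail result and is tagged as critical or non-critical. `CheckResult`, `OperationalStatus`, and `consolidate` are hypothetical names used only for illustration; they are not an existing Kytos API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class OperationalStatus(Enum):
    """Hypothetical overall status levels a watchdog NApp could report."""
    SUCCESS = "success"
    PARTIAL_FAILURE = "partial_failure"
    FAILURE = "failure"


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one validation, e.g. from of_core or another NApp (hypothetical)."""
    name: str
    ok: bool
    critical: bool = True


def consolidate(results: Iterable[CheckResult]) -> OperationalStatus:
    """Fold individual check results into a single operational status.

    Any failed critical check means FAILURE; failures limited to
    non-critical components mean PARTIAL_FAILURE; otherwise SUCCESS.
    """
    failed = [r for r in results if not r.ok]
    if any(r.critical for r in failed):
        return OperationalStatus.FAILURE
    if failed:
        return OperationalStatus.PARTIAL_FAILURE
    return OperationalStatus.SUCCESS


# Usage: a failing non-critical component degrades, but does not fail, the status.
checks = [
    CheckResult("of_core.connections", ok=True),
    CheckResult("log_export", ok=False, critical=False),
]
print(consolidate(checks).value)  # partial_failure
```

Treating "no checks at all" as SUCCESS is a design choice in this sketch; a real watchdog might prefer to report that case as unknown.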
Problem:
Network operators who are deploying Kytos-ng in production and using of_core need to be able to identify (and hook into external healthcheck mechanisms) when OpenFlow connections aren't stable, whether because of packet/handshake issues or generalized crashes. Our Python runtime shouldn't struggle to handle connections as long as their number is reasonable; if it does, then of_core should expose that this is happening (maybe through an endpoint) so that this can be used externally to spin up and switch over to a different kytosd instance. This can help with recoverable errors.

Other than that, outside of the code-related implementation, network operators should also have alerts for how many errors or tracebacks have happened over time. We can have this readily available on ES with Kibana; although alerts are a premium ES feature, the data is there, so a script could also poll or query it.
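As a rough sketch of what of_core could track and expose through such an endpoint, assume it counts failed handshakes and dropped connections in a sliding time window. `ConnectionHealth`, its default thresholds, and the 200/503 convention are illustrative assumptions, not existing of_core behavior. An external healthcheck could poll an endpoint returning this payload and trigger the switchover to another kytosd instance on a non-2xx response.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class ConnectionHealth:
    """Track recent OpenFlow connection failures in a sliding time window.

    Hypothetical helper: of_core would call record_failure() whenever a
    handshake fails or a connection drops, and a REST handler would
    return status() to external healthchecks. Callers must use one clock
    consistently (either always pass `now` or never pass it).
    """

    def __init__(self, window_s: float = 60.0, max_failures: int = 10) -> None:
        self.window_s = window_s
        self.max_failures = max_failures
        self._failures: Deque[float] = deque()

    def record_failure(self, now: Optional[float] = None) -> None:
        """Register one failed handshake or dropped connection."""
        self._failures.append(time.monotonic() if now is None else now)

    def _prune(self, now: float) -> None:
        # Drop failures that fell out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()

    def status(self, now: Optional[float] = None) -> Tuple[int, dict]:
        """Return (HTTP status code, JSON-serializable body) for a healthcheck."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        count = len(self._failures)
        healthy = count < self.max_failures
        body = {
            "healthy": healthy,
            "failures_in_window": count,
            "window_seconds": self.window_s,
        }
        # 503 lets generic healthcheckers fail over without parsing the body.
        return (200 if healthy else 503), body


# Usage: three failures within the window trip the check; once they age out, it recovers.
health = ConnectionHealth(window_s=60, max_failures=3)
for t in (0.0, 1.0, 2.0):
    health.record_failure(now=t)
print(health.status(now=2.5)[0])   # 503
print(health.status(now=61.5)[0])  # 200
```

A time window (rather than a lifetime counter) keeps a single burst of errors from marking the instance unhealthy forever, which matches the goal of reacting to recoverable errors.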
cc'ing @italovalcy for his info
This issue still needs further discussion, but overall that's the problem we need to solve.