netenglabs / suzieq

Using network observability to operate and design healthier networks
https://www.stardustsystems.net/
Apache License 2.0
787 stars, 104 forks

Poller: handle multiple exceptions #851

Open LucaNicosia opened 1 year ago

LucaNicosia commented 1 year ago

Description

The controller and worker code is built entirely on asyncio tasks.

In the current version of the code, if one of the tasks raises an exception, we fail right away and re-raise it, letting the core method of the SuzieQ component handle it. More than one task may raise an exception, however: since we raise as soon as we see the first one, the others are never handled. For these, asyncio complains with the message "Task exception was never retrieved" (see the Python docs); since the main code of the SuzieQ component has already finished, nothing handles these exceptions and their tracebacks are printed to the log, producing confusing output.

To fix this, this PR introduces a new exception class, SqRuntimeError, which contains a list of exceptions. In the controller and worker code, instead of raising as soon as we see a task fail, we collect all the exceptions in a list. If the list is not empty, we use it to initialize a SqRuntimeError and raise that. The main code of the controller (sq_poller.py) and the worker (sq_worker.py) checks whether the exception is a SqRuntimeError and, if so, logs each exception in its list.

ryanmerolle commented 2 months ago

Can this be merged?

ddutt commented 2 months ago

> Can this be merged?

Yes, we'll work on getting this done for a possible end-of-month release.