telefonicaid / fiware-orion

Context Broker and CEF building block for context data management, providing NGSI interfaces.
https://github.com/telefonicaid/fiware-orion/blob/master/doc/manuals/orion-api.md
GNU Affero General Public License v3.0

Alarm for notification queue overpassing a given threshold #4113

Open fgalan opened 2 years ago

fgalan commented 2 years ago

Is your feature request related to a problem / use case? Please describe.

Orion logs an error when a notification queue is full (in the threadpool notification mode), either for the general queue:

Runtime Error (default notification queue is full)

or for per-service queues, if that functionality is in use:

Runtime Error (serv1 notification queue is full)

Thus, operations teams only learn that the queue is saturated when it is already too late and notifications are being dropped.

Describe the solution you'd like

Implement a new alarm, this way:

| Alarm ID | Severity | Detection strategy | Stop condition | Description | Action |
|---|---|---|---|---|---|
| 8 | WARNING | The following WARN text appears in the 'msg' field: "Raising alarm NotificationQueue <service>: <detail>". | The following WARN text appears in the 'msg' field: "Releasing alarm NotificationQueue <service>", where <service> is the same one that triggered the alarm. Orion prints this trace when the notification queue goes back below the threshold. | The notification queue associated with the service (or <service> "default" for the default queue) has exceeded the alarm threshold. The <detail> text describes the particular threshold. | No specific action has to be performed on the Orion Context Broker service, but the update flow causing the notifications on that service (or default queue) should be reduced in order to relieve pressure on the queue. Another possible cause is malfunctioning notification receivers, if they are slow at processing notifications and responding to Orion. |
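The raise/release behaviour proposed for this alarm can be sketched as a small state machine. This is only an illustrative sketch in Python, not Orion code (Orion itself is written in C++): the `QueueAlarm` class, its `check` method, the 80% threshold, and the exact wording of the detail text are all assumptions made for the example.

```python
class QueueAlarm:
    """Tracks one raise/release alarm per notification queue (hypothetical helper)."""

    def __init__(self, threshold_pct=80.0):
        # Assumed single threshold, expressed as a percentage of queue capacity.
        self.threshold_pct = threshold_pct
        # Services (or "default") whose alarm is currently raised.
        self.raised = set()

    def check(self, service, size, capacity):
        """Return the WARN message to log, or None if the alarm state did not change."""
        usage_pct = 100.0 * size / capacity
        if usage_pct > self.threshold_pct and service not in self.raised:
            # First time above the threshold: raise the alarm once.
            self.raised.add(service)
            return (f"Raising alarm NotificationQueue {service}: "
                    f"queue usage {usage_pct:.0f}% is above {self.threshold_pct:.0f}%")
        if usage_pct <= self.threshold_pct and service in self.raised:
            # Back at or below the threshold: release the alarm once.
            self.raised.discard(service)
            return f"Releasing alarm NotificationQueue {service}"
        return None
```

With a threshold of 80, `check("serv1", 90, 100)` returns a raising message, a second call at the same level returns `None` (the alarm is already raised), and `check("serv1", 50, 100)` returns the releasing message.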

Things to decide:

Describe alternatives you've considered

None so far

Describe why you need this feature

It would be useful for the operations teams using Orion, so they can define alarms based on Orion logs.
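As a rough illustration of how an operations team might consume such an alarm from the logs, the sketch below scans log lines and keeps track of which queues currently have the alarm raised. The message layout is taken from the raise/release texts proposed in this issue; the alarm name spelling, the regular expression, and the `active_queue_alarms` function are assumptions, and should be adjusted to whatever Orion actually prints.

```python
import re

# Assumed message layout: "Raising alarm NotificationQueue <service>: <detail>"
# and "Releasing alarm NotificationQueue <service>".
ALARM_RE = re.compile(r"(Raising|Releasing) alarm NotificationQueue ([^\s:]+)")


def active_queue_alarms(log_lines):
    """Return the set of services whose queue alarm is raised and not yet released."""
    active = set()
    for line in log_lines:
        match = ALARM_RE.search(line)
        if match is None:
            continue  # unrelated log line
        action, service = match.groups()
        if action == "Raising":
            active.add(service)
        else:
            active.discard(service)
    return active
```

A monitoring job could call this function on the most recent log window and page someone while the returned set is non-empty.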

Anjali-NEC commented 1 year ago

Hi @fgalan sir,

I would like to work on this issue. As per my understanding, we need to add a "NotificationQueue" alarm that is raised when the notification queue exceeds the threshold.

> How many thresholds? For instance, >80% is critical, 50-80% is moderate. However, the current raise-release mechanism gets complicated if more than one level is defined, so probably the simpler approach is just one threshold.

We need to specify only one threshold value. For that, we can either hardwire the threshold value in the Orion code or add a CLI option for it.

Please confirm my understanding.

fgalan commented 1 year ago

I think your understanding is correct. Thanks!
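The two options mentioned in this thread (a hardwired threshold or a CLI setting) are not mutually exclusive: a hardwired value can act as the default for an optional CLI parameter. The Python sketch below only illustrates that idea. Orion has its own C++ argument parsing, and the option name `--notif-queue-alarm-threshold`, the default of 80, and the `parse_threshold` function are hypothetical.

```python
import argparse

# Hypothetical hardwired default, as a percentage of queue capacity.
DEFAULT_THRESHOLD_PCT = 80


def parse_threshold(argv):
    """Return the alarm threshold (percent), taken from the CLI or the default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--notif-queue-alarm-threshold",  # hypothetical option name
        type=int,
        default=DEFAULT_THRESHOLD_PCT,
        metavar="PCT",
        help="queue usage percentage above which the alarm is raised",
    )
    args = parser.parse_args(argv)
    pct = args.notif_queue_alarm_threshold
    if not 0 < pct <= 100:
        # parser.error() prints a usage message and exits.
        parser.error("threshold must be in the range 1..100")
    return pct
```

Calling `parse_threshold([])` yields the hardwired default, while `parse_threshold(["--notif-queue-alarm-threshold", "60"])` overrides it.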