telefonicaid / fiware-orion

Context Broker and CEF building block for context data management, providing NGSI interfaces.
https://github.com/telefonicaid/fiware-orion/blob/master/doc/manuals/orion-api.md
GNU Affero General Public License v3.0

[Question] Isolating subscriptions - subscription in error #3539

Open tomsluyts opened 5 years ago

tomsluyts commented 5 years ago

We've been having an issue with one faulty subscription (notification endpoint down) impacting the other active subscriptions, because of retries flooding the processor. Is there a way to set up Orion so subscriptions are more isolated?

I think it might also make sense for Orion to have an option to stop trying to deliver on a subscription once a certain number of tries have failed.

fgalan commented 5 years ago

> We've been having an issue with one faulty subscription (notification endpoint down) impacting the other active subscriptions, because of retries flooding the processor. Is there a way to set up Orion so subscriptions are more isolated?

What impact do you see on the other subscriptions? Could it be connection attempts accumulating while the timeout on the failing endpoint expires?

> I think it might also make sense for Orion to have an option to stop trying to deliver on a subscription once a certain number of tries have failed.

It would involve introducing a piece of state in the CB (a per-subscription failure counter), but it's a valuable idea anyway. I have created an issue for it here: https://github.com/telefonicaid/fiware-orion/issues/3541. Please feel free to add feedback as a comment on that issue.
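To make the idea concrete, here is a minimal Python sketch of a per-subscription failure counter. It is not Orion's implementation: the class `FailureTracker`, its `max_failures` parameter, and the choice to count consecutive failures (resetting on a successful delivery) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailureTracker:
    """Hypothetical per-subscription counter of consecutive delivery failures."""

    max_failures: int = 3  # assumed threshold; not an Orion setting
    _counts: dict = field(default_factory=dict)

    def record(self, sub_id: str, delivered: bool) -> bool:
        """Record one delivery attempt and return whether the subscription stays active."""
        if delivered:
            # A successful delivery resets the counter (design assumption).
            self._counts[sub_id] = 0
        else:
            self._counts[sub_id] = self._counts.get(sub_id, 0) + 1
        return self.is_active(sub_id)

    def is_active(self, sub_id: str) -> bool:
        """A subscription is active while it is below the failure threshold."""
        return self._counts.get(sub_id, 0) < self.max_failures


tracker = FailureTracker(max_failures=3)
results = [tracker.record("sub-1", delivered=False) for _ in range(3)]
```

After three failed attempts `results` ends with `False`, so a dispatcher built on this sketch would skip `sub-1` until some recovery step (also left open here) re-enables it.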

mathi123 commented 5 years ago

It was indeed an accumulation of connection attempts that time out, leaving no room for the other notifications.
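A rough back-of-the-envelope model shows why this starves other notifications. The sketch below assumes a fixed pool of notification workers in which each attempt to reach the dead endpoint blocks one worker for the full timeout; the function name and the numbers are illustrative, not measurements from Orion.

```python
def saturating_failure_rate(workers: int, timeout_s: float) -> float:
    """Failing notifications per second that keep every worker blocked.

    Each failing attempt holds a worker for timeout_s seconds, so a pool of
    `workers` can absorb at most workers / timeout_s such attempts per second.
    """
    if workers <= 0 or timeout_s <= 0:
        raise ValueError("workers and timeout_s must be positive")
    return workers / timeout_s


# Assumed figures: 10 workers, 5 s timeout versus 1 s timeout.
slow = saturating_failure_rate(workers=10, timeout_s=5.0)
fast = saturating_failure_rate(workers=10, timeout_s=1.0)
```

Under these assumptions a 5 s timeout lets just 2 failing notifications per second occupy the whole pool, while a 1 s timeout raises that threshold to 10 per second.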

I believe it is documented here: https://fiware-orion.readthedocs.io/en/master/admin/perf_tuning/index.html#outgoing-http-connection-timeout

Can you confirm that there is a possibility that other subscriptions can be affected?

fgalan commented 5 years ago

Yes, it may impact the Context Broker globally. I think the explanation provided in the documentation section you cite is quite precise.

Have you tried setting -httpTimeout to a short value (e.g. 1.5*N, where N is the maximum time your endpoints take to establish a connection when they are working)?
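The 1.5*N rule of thumb can be sketched as a small Python helper. The function name and the sample connection times are hypothetical, and the result is expressed in milliseconds on the assumption that -httpTimeout takes milliseconds; check the Orion CLI documentation for your version before using the value.

```python
import math


def suggested_http_timeout_ms(connect_times_ms, factor=1.5):
    """Return factor * N, where N is the slowest observed healthy connection time."""
    if not connect_times_ms:
        raise ValueError("need at least one measured connection time")
    # Round up so the timeout is never shorter than factor * N.
    return math.ceil(factor * max(connect_times_ms))


# Hypothetical connection times (ms) measured while the endpoints were healthy.
timeout_ms = suggested_http_timeout_ms([120, 180, 200])
```

With these sample values N is 200 ms, giving a suggested timeout of 300 ms.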