telefonicaid / fiware-orion

Context Broker and CEF building block for context data management, providing NGSI interfaces.
https://fiware-orion.rtfd.io/
GNU Affero General Public License v3.0
212 stars 265 forks source link

BadInput traces shown when alarm is already raised ("flapping" effect) #3860

Open fgalan opened 3 years ago

fgalan commented 3 years ago

Running test case

./testHarness.sh cases/3533_null_in_datetime_and_geo/null_in_geo_point.test

if we look to Context Broker traces looking for WARN level:

$ cat /tmp/contextBroker.log | grep WARN
time=2021-05-24T11:53:33.360Z | lvl=WARN | corr=aa4a7a4c-bc86-11eb-a2f4-000c29df7912 | trans=1621857212-299-00000000009 | from=0.0.0.0 | srv=<none> | subsrv=<none> | comp=Orion | op=AlarmManager.cpp[405]:badInput | msg=Raising alarm BadInput 0.0.0.0: error parsing location attribute for existing attribute: geo coordinates format error [see Orion user manual]: foo

time=2021-05-24T11:53:33.397Z | lvl=WARN | corr=aa504558-bc86-11eb-aff6-000c29df7912 | trans=1621857212-299-00000000010 | from=0.0.0.0 | srv=<none> | subsrv=<none> | comp=Orion | op=AlarmManager.cpp[432]:badInputReset | msg=Releasing alarm BadInput 0.0.0.0
time=2021-05-24T11:53:33.398Z | lvl=WARN | corr=aa504558-bc86-11eb-aff6-000c29df7912 | trans=1621857212-299-00000000010 | from=0.0.0.0 | srv=<none> | subsrv=<none> | comp=Orion | op=AlarmManager.cpp[405]:badInput | msg=Raising alarm BadInput 0.0.0.0: error parsing location attribute: geo coordinates format error [see Orion user manual]: bar

The first one (trans=...0009) is OK. An alarm is raised due to:

payload='{
  "loc": {
    "value": "foo",
    "type": "geo:point"
  }
}'
orionCurl --url /v2/entities/E1/attrs --payload "$payload"

However, second one (trans=...0010) is NOK. It is caused by

payload='{
  "loc": {
    "value": "bar",
    "type": "geo:point"
  }
}'
orionCurl --url /v2/entities/E2/attrs --payload "$payload"

However, given that the BadInput alarm for 0.0.0.0 is already raised we shouldn't see any new alarm-related trace. Instead of that we are getting a buggy release+raise "flapping" effect.

fgalan commented 3 years ago

Looking this in detail:


The alarm is raised at this point:

imagen

it is relevant level 8 in the call stack:

imagen

The alarm is released at this point:

imagen

it is relevant level 2 in the call stack:

imagen


In restService() (restService.cpp) we have:

    //
    // If we have gotten this far the Input is OK.
    // Except for all the badVerb/badRequest, in the restBadVerbV vector.
    //
    // So, the 'Bad Input' alarm is cleared for this client.
    //
    if (serviceV != restBadVerbV)
    {
      alarmMgr.badInputReset(clientIp);
    }

    std::string response = serviceV[ix].treat(ciP, components, compV, &parseData);

So the problem is that alarmMgr.badInputReset(clientIp); is assumed too early (only checking that we are not in a badVerb situation) but the serviceV[ix].treat(ciP, components, compV, &parseData) could raise alarms...

A possible solution could be to have a return parameter with the internal alarm situation, eg:

    std::string response = serviceV[ix].treat(ciP, components, compV, &parseData, raisedAlarm);
    if ((serviceV != restBadVerbV) && (!raisedAlarm))
    {
      alarmMgr.badInputReset(clientIp);
    }

but passing-down that return parameters maybe is too complicated. A lot of signature functions (all service routine functions!) would need to change. It is not seem to be feasible.

fgalan commented 3 years ago

Some other interesting facts:

fgalan commented 3 years ago

but passing-down that return parameters maybe is too complicated. A lot of signature functions (all service routine functions!) would need to change. It is not seem to be feasible.

As alternative to signature modification, the ciP object (which is global) could be used to implement that "return parameter"

fgalan commented 2 years ago

PR https://github.com/telefonicaid/fiware-orion/pull/3908 (although probably we are not going to uses it at the end)