telefonicaid / fiware-orion

Context Broker and CEF building block for context data management, providing NGSI interfaces.
https://github.com/telefonicaid/fiware-orion/blob/master/doc/manuals/orion-api.md
GNU Affero General Public License v3.0
212 stars 265 forks source link

Orion notifications dropping metadata after a while (only when csub cache is in use) #2950

Open taliaga opened 7 years ago

taliaga commented 7 years ago

Hi guys,

I'm experiencing a strange behaviour in Orion notifications that is becoming a blocking issue for us.

Specifically, I've created a subscription to receive notifications with the dateCreated and dateModified metadata. It works fine for a couple of seconds, but after a handful of notifications (from 4 to 13) notifications drop the metadata (the modification time I need) and keep on coming only with the new values.

I haven't found the culprit yet, but I've realized that between the last complete notification and the first incomplete notification, Orion is doing something in the middle (looks like a sync with subscription object, but I can't tell for sure) as you can see below.

Orion version:

{
"orion" : {
  "version" : "1.7.0",
  "uptime" : "0 d, 0 h, 20 m, 19 s",
  "git_hash" : "e544780eb64a4a2557c1f51dde070b8d82b86c49",
  "compile_time" : "Wed Feb 8 13:03:53 UTC 2017",
  "compiled_by" : "root",
  "compiled_in" : "b99744612d0b"
}
}

Orion debug output: Last complete notification (i.e, including the metadata):

time=Sunday 02 Jul 08:29:27 2017.092Z | lvl=INFO | corr=8f8a1ae2-5f00-11e7-a145-0242ac130004 | trans=1498983926-764-00000000028 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=logMsg.h[1832]:lmTransactionStart | msg=Starting transaction from 127.0.0.1:56962/version
time=Sunday 02 Jul 08:29:27 2017.094Z | lvl=INFO | corr=8f8a1ae2-5f00-11e7-a145-0242ac130004 | trans=1498983926-764-00000000028 | from=127.0.0.1 | srv=<default> | subsrv=<default> | comp=Orion | op=logMsg.h[1916]:lmTransactionEnd | msg=Transaction ended
time=Sunday 02 Jul 08:29:27 2017.256Z | lvl=INFO | corr=8fa30b92-5f00-11e7-8b64-0242ac130004 | trans=1498983926-764-00000000029 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=logMsg.h[1832]:lmTransactionStart | msg=Starting transaction from 172.19.0.1:38300/v2/entities/Room1/attrs
time=Sunday 02 Jul 08:29:27 2017.256Z | lvl=INFO | corr=8fa30b92-5f00-11e7-8b64-0242ac130004 | trans=1498983926-764-00000000029 | from=172.19.0.1 | srv=pending | subsrv=/ | comp=Orion | op=rest.cpp[872]:servicePathSplit | msg=Service Path 0: '/'
time=Sunday 02 Jul 08:29:27 2017.256Z | lvl=INFO | corr=8fa30b92-5f00-11e7-8b64-0242ac130004 | trans=1498983926-764-00000000029 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=connectionOperations.cpp[210]:collectionCount | msg=Database Operation Successful (count: { _id.id: "Room1", _id.servicePath: { $in: [ null, /^/$/ ] } })
time=Sunday 02 Jul 08:29:27 2017.257Z | lvl=INFO | corr=8fa30b92-5f00-11e7-8b64-0242ac130004 | trans=1498983926-764-00000000029 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=connectionOperations.cpp[79]:collectionQuery | msg=Database Operation Successful (query: { _id.id: "Room1", _id.servicePath: { $in: [ null, /^/$/ ] } })
time=Sunday 02 Jul 08:29:27 2017.257Z | lvl=INFO | corr=8fa30b92-5f00-11e7-8b64-0242ac130004 | trans=1498983926-764-00000000029 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=connectionOperations.cpp[401]:collectionUpdate | msg=Database Operation Successful (update: <{ _id.id: "Room1", _id.type: "Room", _id.servicePath: { $in: [ null, /^/$/ ] } }, { $set: { attrs.temperature: { value: 11.83139901120355, type: "Number", mdNames: [], creDate: 1498984111, modDate: 1498984167 }, modDate: 1498984167 }, $unset: { location: 1 } }>)
time=Sunday 02 Jul 08:29:27 2017.258Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000030 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=logMsg.h[1832]:lmTransactionStart | msg=Starting transaction to http://192.0.0.1:8668/notify
time=Sunday 02 Jul 08:29:27 2017.258Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000030 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=httpRequestSend.cpp[577]:httpRequestSendWithCurl | msg=Sending message 12 to HTTP server: sending message of 706 bytes to HTTP server
time=Sunday 02 Jul 08:29:27 2017.258Z | lvl=INFO | corr=8fa30b92-5f00-11e7-8b64-0242ac130004 | trans=1498983926-764-00000000029 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=logMsg.h[1916]:lmTransactionEnd | msg=Transaction ended
time=Sunday 02 Jul 08:29:27 2017.358Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000030 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=httpRequestSend.cpp[598]:httpRequestSendWithCurl | msg=Notification Successfully Sent to http://192.0.0.1:8668/notify
time=Sunday 02 Jul 08:29:27 2017.359Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000030 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=logMsg.h[1916]:lmTransactionEnd | msg=Transaction ended

Orion debug output: Suspicious Operation in-between

time=Sunday 02 Jul 08:29:30 2017.103Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[626]:runCollectionCommand | msg=Database Operation Successful (command: { listDatabases: 1 })
time=Sunday 02 Jul 08:29:30 2017.103Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[79]:collectionQuery | msg=Database Operation Successful (query: {})
time=Sunday 02 Jul 08:29:30 2017.104Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[79]:collectionQuery | msg=Database Operation Successful (query: {})
time=Sunday 02 Jul 08:29:30 2017.104Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[401]:collectionUpdate | msg=Database Operation Successful (update: <{ _id: ObjectId('5958aeaf85cd026ace5af1f2') }, { $inc: { count: 12 } }>)
time=Sunday 02 Jul 08:29:30 2017.105Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[401]:collectionUpdate | msg=Database Operation Successful (update: <{ _id: ObjectId('5958aeaf85cd026ace5af1f2'), $or: [ { lastNotification: { $lt: 1498984167 } }, { lastNotification: { $exists: false } } ] }, { $set: { lastNotification: 1498984167 } }>)
time=Sunday 02 Jul 08:29:30 2017.115Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[401]:collectionUpdate | msg=Database Operation Successful (update: <{ _id: ObjectId('5958aeaf85cd026ace5af1f2'), $or: [ { lastSuccess: { $lt: 1498984167 } }, { lastSuccess: { $exists: false } } ] }, { $set: { lastSuccess: 1498984167 } }>)

Orion debug output: First incomplete notification missing the metadata:

time=Sunday 02 Jul 08:29:32 2017.266Z | lvl=INFO | corr=929f9388-5f00-11e7-a911-0242ac130004 | trans=1498983926-764-00000000031 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=logMsg.h[1832]:lmTransactionStart | msg=Starting transaction from 172.19.0.1:38306/v2/entities/Room1/attrs
time=Sunday 02 Jul 08:29:32 2017.266Z | lvl=INFO | corr=929f9388-5f00-11e7-a911-0242ac130004 | trans=1498983926-764-00000000031 | from=172.19.0.1 | srv=pending | subsrv=/ | comp=Orion | op=rest.cpp[872]:servicePathSplit | msg=Service Path 0: '/'
time=Sunday 02 Jul 08:29:32 2017.267Z | lvl=INFO | corr=929f9388-5f00-11e7-a911-0242ac130004 | trans=1498983926-764-00000000031 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=connectionOperations.cpp[210]:collectionCount | msg=Database Operation Successful (count: { _id.id: "Room1", _id.servicePath: { $in: [ null, /^/$/ ] } })
time=Sunday 02 Jul 08:29:32 2017.268Z | lvl=INFO | corr=929f9388-5f00-11e7-a911-0242ac130004 | trans=1498983926-764-00000000031 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=connectionOperations.cpp[79]:collectionQuery | msg=Database Operation Successful (query: { _id.id: "Room1", _id.servicePath: { $in: [ null, /^/$/ ] } })
time=Sunday 02 Jul 08:29:32 2017.270Z | lvl=INFO | corr=929f9388-5f00-11e7-a911-0242ac130004 | trans=1498983926-764-00000000031 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=connectionOperations.cpp[401]:collectionUpdate | msg=Database Operation Successful (update: <{ _id.id: "Room1", _id.type: "Room", _id.servicePath: { $in: [ null, /^/$/ ] } }, { $set: { attrs.temperature: { value: 30.03241434744688, type: "Number", mdNames: [], creDate: 1498984111, modDate: 1498984172 }, modDate: 1498984172 }, $unset: { location: 1 } }>)
time=Sunday 02 Jul 08:29:32 2017.270Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000032 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=logMsg.h[1832]:lmTransactionStart | msg=Starting transaction to http://192.0.0.1:8668/notify
time=Sunday 02 Jul 08:29:32 2017.271Z | lvl=INFO | corr=929f9388-5f00-11e7-a911-0242ac130004 | trans=1498983926-764-00000000031 | from=172.19.0.1 | srv=default | subsrv=/ | comp=Orion | op=logMsg.h[1916]:lmTransactionEnd | msg=Transaction ended
time=Sunday 02 Jul 08:29:32 2017.271Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000032 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=httpRequestSend.cpp[577]:httpRequestSendWithCurl | msg=Sending message 13 to HTTP server: sending message of 433 bytes to HTTP server
time=Sunday 02 Jul 08:29:32 2017.275Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000032 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=httpRequestSend.cpp[598]:httpRequestSendWithCurl | msg=Notification Successfully Sent to http://192.0.0.1:8668/notify
time=Sunday 02 Jul 08:29:32 2017.275Z | lvl=INFO | corr=N/A | trans=1498983926-764-00000000032 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=logMsg.h[1916]:lmTransactionEnd | msg=Transaction ended

Output from the echo server receiving the notifications:

INFO:__main__:Received payload: {'id': 'Room1', 'type': 'Room', 'temperature': {'type': 'Number', 'value': 11.831399011, 'metadata': {'dateCreated': {'type': 'DateTime', 'value': '2017-07-02T08:28:31.00Z'}, 'dateModified': {'type': 'DateTime', 'value': '2017-07-02T08:29:27.00Z'}}}}
INFO:werkzeug:192.0.0.1 - - [02/Jul/2017 10:29:26] "POST /notify HTTP/1.1" 200 -
INFO:__main__:Received payload: {'id': 'Room1', 'type': 'Room', 'temperature': {'type': 'Number', 'value': 30.032414347, 'metadata': {}}}
INFO:__main__:Modified attribute "temperature" must come with dateModified metadata. Include "metadata": [ "dateModified" ] in your notification params.
INFO:werkzeug:192.0.0.1 - - [02/Jul/2017 10:29:31] "POST /notify HTTP/1.1" 400 -

Subscription used

{
    "description": "Test subscription",
    "subject": {
        "entities": [
          {
            "idPattern": ".*",
            "type": "Room"
          }
        ],
        "condition": {
          "attrs": [
            "temperature",
          ]
        }
      },
    "notification": {
        "http": {
          "url": notify_url
        },
        "attrs": [
            "temperature",
        ],
        "metadata": ["dateCreated", "dateModified"]
    },
    "throttling": 5
}

The subscription object looks fine with "metadata" throughout the experiment:

$ curl 0.0.0.0:1026/v2/subscriptions -H 'Fiware-Service: default' -H 'Fiware-ServicePath: /'
[{"id":"5958c5230e110bc98ee0440a","description":"Test subscription","status":"active","subject":{"entities":[{"idPattern":".*","type":"Room"}],"condition":{"attrs":["temperature"]}},"notification":{"timesSent":4,"lastNotification":"2017-07-02T10:04:35.00Z","attrs":["temperature"],"attrsFormat":"normalized","http":{"url":"http://192.0.0.1:8668/notify"},"metadata":["dateCreated","dateModified"],"lastSuccess":"2017-07-02T10:04:35.00Z"},"throttling":5}]

Any idea what could be going on? I'm happy to provide any further detail.

fgalan commented 7 years ago

Thanks for the feedback!

How Orion has been started? The output of the ps ax | grep contextBroker in the system where Orion runs will show that information.

taliaga commented 7 years ago

Right, I forgot that. See below...

   $ ps ax | grep contextBroker
   1 ?        Ssl    0:00 /usr/bin/contextBroker -fg -multiservice -logLevel DEBUG -dbhost mongo

I launch it using docker with a compose looking like this:

version: '3'

services:

  orion:
    image: fiware/orion:1.7.0
    ports:
      - "1026:1026"
    command: -logLevel DEBUG -dbhost mongo
    depends_on:
      - mongo
    healthcheck:
      test: ["CMD", "curl", "-f", "http://0.0.0.0:1026/version"]
      interval: 1m
      timeout: 10s
      retries: 3

  mongo:
    image: mongo:3.2
    ports:
      - "27017:27017"

There's something else to comment that I've noticed for the first time now. It can happen that after some notifications coming without the metada, at some point notifications come with metadata again. So, it's not that the metadata is dropped to never come back. This makes the issue less critical than I originally thought. :)

fgalan commented 7 years ago

In order to isolate the issue, could you try using -noCache at ContextBroker startup and tell us if the problem happens or not in that case, please? Thanks!

taliaga commented 7 years ago

Sorry for the silence. I've tested today a couple of times using that -noCache flag and it seems to workaround the problem; i.e, I have not seen a dropped metadata so far :)

fgalan commented 7 years ago

Thank for the feedback!

So it seems some kind of issue in the csubs cache refreshing logic. Need to be debugged and found, but I'm happy to see the problem has been narrowed quite a lot.