telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0

Cygnus merges notifications when receiving a large volume of them in a short period of time. #1501

Open dmabtrg opened 6 years ago

dmabtrg commented 6 years ago

Today I've been hit with what seems to be a weird bug in Cygnus. A little background first:

We have a device that, instead of sending measurements as they are collected (one every ~10 minutes, each including the time of the measurement), accumulates them and delivers them all in a batch at specific points in time (every several hours). These batches are sent to the iotagent-json, which updates the matching entity in Orion, and a notification is sent to Cygnus for every measurement in the batch. So far, so good. Cygnus is configured with PostgreSQL and Hadoop as backends. The Orion subscription has been configured with no throttling, as we want to store every measurement.
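
For reference, the subscription on the Orion side is just a regular NGSIv2 one with no throttling field set. A minimal sketch would be roughly along these lines (host names and the entity id/type are placeholders; "attrsFormat": "legacy" is there because, if I remember right, this Cygnus version consumes NGSIv1-style notification payloads):

curl -X POST http://orion:1026/v2/subscriptions \
  -H 'Content-Type: application/json' \
  -H 'Fiware-Service: test' \
  -H 'Fiware-ServicePath: /sensors' \
  -d '{
    "description": "Notify Cygnus on every change of sensor01",
    "subject": { "entities": [ { "id": "sensor01", "type": "mysensor" } ] },
    "notification": {
      "http": { "url": "http://cygnus:5050/notify" },
      "attrsFormat": "legacy"
    }
  }'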

For the PostgreSQL backend we have configured:

cygnus-ngsi.sinks.postgresql-sink.attr_persistence = row
cygnus-ngsi.sinks.postgresql-sink.data_model = dm-by-service-path

As one of the attributes is the date of the measurement, we want to obtain all the attributes in the database that were measured on that date. I thought that getting all the attributes that match the recvtimets of the date attribute would give me that. Then I saw something like this:

postgres=# select * from test.sensors where recvtimets='1529406607326' and attrname='measurement_date';
  recvtimets   |         recvtime         | fiwareservicepath | entityid | entitytype |     attrname     | attrtype |   attrvalue   |                                                                                                                attrmd
---------------+--------------------------+-------------------+----------+------------+------------------+----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1529406607326 | 2018-06-19T11:10:07.326Z | /sensors          | sensor01 | mysensor   | measurement_date | string   | 1529385000000 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-19T11:10:05.451Z"},{"name":"formato","type":"string","value":"epoch"},{"name":"nombre","type":"string","value":"fecha/hora"},{"name":"unidad","type":"string","value":"ms"}]
 1529406607326 | 2018-06-19T11:10:07.326Z | /sensors          | sensor01 | mysensor   | measurement_date | string   | 1529377200000 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-19T11:10:05.473Z"},{"name":"formato","type":"string","value":"epoch"},{"name":"nombre","type":"string","value":"fecha/hora"},{"name":"unidad","type":"string","value":"ms"}]
 1529406607326 | 2018-06-19T11:10:07.326Z | /sensors          | sensor01 | mysensor   | measurement_date | string   | 1529386800000 | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-19T11:10:05.463Z"},{"name":"formato","type":"string","value":"epoch"},{"name":"nombre","type":"string","value":"fecha/hora"},{"name":"unidad","type":"string","value":"ms"}]
  (3 rows)

Three different measurements had the same recvtimets. At first I thought it had something to do with the iotagent, but every measurement was sent independently and the log shows no errors. Checking Orion, everything was fine too. Then I checked the Cygnus log: one notification was received for each of the measurements, so that is fine as well. The data stored in the Hadoop backend shows the same problem.

So somewhere between Cygnus receiving the notifications and Cygnus storing them in the backend, it merges some of them as if they were received as a single notification.
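
By the way, an easy way to spot these merges in the PostgreSQL backend is to look for reception timestamps that appear more than once for the same attribute, for instance:

select recvtimets, count(*)
from test.sensors
where attrname = 'measurement_date'
group by recvtimets
having count(*) > 1;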

I've been able to reproduce this behaviour with the following setup:

Using the sensor from the step-by-step guide, I then sent temperature measurements with:

for i in $(seq 1 200) ; do mosquitto_pub -t /1234/sensor01/attrs -m "{\"t\": $i}" -u admin -P admin ; done

And checking the database, some notifications have been merged:

postgres=# select * from myhome.environment where recvtimets='1529934582282';
  recvtimets   |         recvtime         | fiwareservicepath |     entityid     | entitytype  |  attrname   | attrtype | attrvalue |                                    attrmd
---------------+--------------------------+-------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------
 1529934582282 | 2018-06-25T13:49:42.282Z | /environment      | LivingRoomSensor | multiSensor | Temperature | celsius  | 177       | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-25T13:49:41.913Z"}]
 1529934582282 | 2018-06-25T13:49:42.282Z | /environment      | LivingRoomSensor | multiSensor | Temperature | celsius  | 169       | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-25T13:49:41.856Z"}]
(2 rows)

I've been searching, but have been unable to find anything related to this bug.

Thanks.

Regards.

AlvaroVega commented 6 years ago

Cygnus works in batch mode by default. You can try reducing the batch size (CYGNUS_POSTGRESQL_BATCH_SIZE) or decreasing the batch timeout (CYGNUS_POSTGRESQL_BATCH_TIMEOUT) so that different measurements are stored with different recvtimets values.
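
If you are editing agent.conf rather than using the docker environment variables, those should map to the sink's batch_size and batch_timeout properties, along these lines (the values here are only an example):

cygnus-ngsi.sinks.postgresql-sink.batch_size = 1
cygnus-ngsi.sinks.postgresql-sink.batch_timeout = 10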

dmabtrg commented 6 years ago

I think the recvtimets is generated when the notification is converted into an NGSIEvent object, before it is put into the channel, and not at the sink batch-processing stage. Anyway, I tried your suggestion and there was no difference in the results, though setting the timeout to 0 seems to disable batch processing.

Thanks.

Regards.

fgalan commented 6 years ago

Could you try using MySQL as the storage backend instead of PostgreSQL? Just to check whether the behaviour is the same (in which case the problem may be in the common event handler) or whether in that case it is correct (in which case the problem should be in the PostgreSQL sink).
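
In case it helps, enabling the MySQL sink should only need mysql-sink added to the cygnus-ngsi.sinks list plus something along these lines in agent.conf (a sketch; the channel name, host and credentials are placeholders, and the channel itself has to be declared as usual):

cygnus-ngsi.sinks.mysql-sink.type = com.telefonica.iot.cygnus.sinks.NGSIMySQLSink
cygnus-ngsi.sinks.mysql-sink.channel = mysql-channel
cygnus-ngsi.sinks.mysql-sink.mysql_host = mysql
cygnus-ngsi.sinks.mysql-sink.mysql_port = 3306
cygnus-ngsi.sinks.mysql-sink.mysql_username = root
cygnus-ngsi.sinks.mysql-sink.mysql_password = xxxx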

dmabtrg commented 6 years ago

I'm also using the HDFS backend and the same problem is present there, with the exact same values. Here's an example:

fgalan commented 6 years ago

Having confirmed the same behaviour in the HDFS storage backend probably makes the additional test on MySQL unnecessary. Although it is also true that the more situations in which the issue is verified, the better. In summary, as you prefer :)

dmabtrg commented 6 years ago

I've tested it, just in case. Same results:

mysql> select * from environment_sensors_LivingRoomSensor_multiSensor where recvtimets = '1530197123903';
+---------------+-------------------------+----------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------+
| recvTimeTs    | recvTime                | fiwareServicePath    | entityId         | entityType  | attrName    | attrType | attrValue | attrMd                                                                       |
+---------------+-------------------------+----------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------+
| 1530197123903 | 2018-06-28T14:45:23.903 | /environment/sensors | LivingRoomSensor | multiSensor | Temperature | celsius  | 14        | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T14:45:23.722Z"}] |
| 1530197123903 | 2018-06-28T14:45:23.903 | /environment/sensors | LivingRoomSensor | multiSensor | Temperature | celsius  | 10        | [{"name":"TimeInstant","type":"ISO8601","value":"2018-06-28T14:45:23.721Z"}] |
+---------------+-------------------------+----------------------+------------------+-------------+-------------+----------+-----------+------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

PostgreSQL and HDFS have the same values.

fgalan commented 6 years ago

Which Cygnus version are you using? The one coming from the master branch?

dmabtrg commented 6 years ago

I'm using the latest docker image available in docker hub (this build):

$ docker images --no-trunc | grep cygnus | grep latest
fiware/cygnus-ngsi                               latest              sha256:7b834ce9560d712a200d08a1c9e05962e5fb6f9b1d1926a1765816550cc38994   39 hours ago        516MB

From the container log:

$ docker-compose logs cygnus | grep version
cygnus_1                  | time=2018-06-29T07:14:27.816Z | lvl=INFO | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | comp=N/A | op=main | msg=com.telefonica.iot.cygnus.nodes.CygnusApplication[171] : Starting Cygnus, version 1.9.0_SNAPSHOT.5e63327cecadb2dfc650b83ca3c7a60ae5968bb9
vjain641 commented 5 years ago

I've also been hit by this bug. I'm using the HDFS sink and sending two entities with different id and type to Orion, to which Cygnus is subscribed, but Cygnus is merging both notifications and storing the data in the same text file in HDFS. I'm using Cygnus v1.10.0.