telefonicaid / fiware-orion

Context Broker and CEF building block for context data management, providing NGSI interfaces.
https://fiware-orion.rtfd.io/
GNU Affero General Public License v3.0
212 stars 265 forks source link

MongoCommonUpdate.cpp[236]: Runtime Error (unknown metadata value type in DB: 4) #4434

Closed FWuellhorst closed 9 months ago

FWuellhorst commented 11 months ago

Bug description

With the latest orion, we get an error during runtime:

ERROR@2023-10-05T08:07:06.839Z  MongoCommonUpdate.cpp[236]: Runtime Error (unknown metadata value type in DB: 4)

Enabling debug level does not print any useful hint on what the error is. As we don't know what causes the error, we can't provide information to reproduce it. A solution would be to hint the user to the metadata which causes this error.

Additionally, portainer indicates that the container is unhealthy and requests take multiple seconds.

fgalan commented 11 months ago

We would need some additional information.

First one is the way you run ContextBroker:

FWuellhorst commented 11 months ago

We use docker-compose:

version: "3.8"

services:

  mongo-db:
    image: mongo:4.4 #Check https://docs.mongodb.com/manual/release-notes/
    hostname: mongo-db
    container_name: "mongo-db"
    restart: always
    networks:
        - fiware
    volumes:
      - mongo-db:/data/db
       # Mongo-express is a simple visualisation of mongoDB-instances
    logging:
      driver: "json-file"
      options:
        max-file: 5
        max-size: 10m

  mongo-express:
    image: mongo-express:0.54 # Check https://docs.mongodb.com/manual/release-notes/
    hostname: mongo-express
    # Because within the same network the containers
    # are connected port exposing is not required --> only for external access
    ports:
        - "8081:8081"
    networks:
        - fiware
    environment:
      - ME_CONFIG_OPTIONS_EDITORTHEME=${ME_CONFIG_OPTIONS_EDITORTHEME:-ambiance}
      - ME_CONFIG_MONGODB_SERVER=${ME_CONFIG_MONGODB_SERVER:-mongodb}
      - ME_CONFIG_MONGODB_PORT=${ME_CONFIG_MONGODB_PORT:-27017}
      - ME_CONFIG_MONGODB_ENABLE_ADMIN=${ME_CONFIG_MONGODB_ENABLE_ADMIN:-true}
      - ME_CONFIG_MONGODB_AUTH_DATABASE=${ME_CONFIG_MONGODB_AUTH_DATABASE:-admin}

  orion:
    image: telefonicaiot/fiware-orion:latest #Sometimes the latest builds fail. Hence, we recommend to use fixed, stable versions
    hostname: orion
    restart: always
    container_name: "orion"
    depends_on:
        - mongo-db
    networks:
        - fiware
    ports:
       - "1026:1026"
    command: -dbhost mongo-db -logLevel ERROR -noCache -logForHumans
    logging:
      driver: "json-file"
      options:
        max-file: 5
        max-size: 10m

  iot-agent:
    image: fiware/iotagent-json:1.21.1
    hostname: iot-agent
    container_name: "iot-agent"
    restart: always
    depends_on:
      - mongo-db
    ports:
      - "4041:4041"
    networks:
      - fiware
    environment:
      - IOTA_CB_HOST=orion
      - IOTA_CB_PORT=1026
      - IOTA_NORTH_PORT=4041
      - IOTA_REGISTRY_TYPE=mongodb
      - IOTA_LOG_LEVEL=ERROR
      - IOTA_TIMESTAMP=true
      - IOTA_CB_NGSI_VERSION=v2
      - IOTA_AUTOCAST=false
      - IOTA_EXPLICIT_ATTRS=true
      - IOTA_MONGO_HOST=mongo-db
      - IOTA_MONGO_PORT=27017
      - IOTA_MONGO_DB=iotagentjson
      - IOTA_PROVIDER_URL=http://iot-agent:4041
      - IOTA_DEFAULT_RESOURCE=/iot/json
      - IOTA_HTTP_PORT=7896
      - IOTA_MQTT_HOST=mosquitto
      - IOTA_MQTT_PORT=1883
      - IOTA_MQTT_PROTOCOL=mqtt
      - IOTA_MQTT_KEEPALIVE=60
    logging:
      driver: "json-file"
      options:
        max-file: 5
        max-size: 10m

  mosquitto:
    image: eclipse-mosquitto:2.0.14
    restart: always
    hostname: mosquitto
    container_name: mosquitto
    ports: 
        - "1883:1883"
    networks:
        - fiware
    volumes:
        - ./mosquitto.conf:/mosquitto/config/mosquitto.conf

volumes:
  mongo-db:
    name: mongodb

networks:
  fiware:
    name: fiware
fgalan commented 11 months ago

Please, change this line:

 command: -dbhost mongo-db -logLevel ERROR -noCache -logForHumans

to

 command: -dbhost mongo-db -logLevel INFO -noCache

and repeat your tests to raise the trace again.

The -logForHumans provides simplified log messages, but removes important field that are helpful to debug. In addition, the INFO log level provides important debuging information (like the request associated to the traces).

After the redeployment, repeat your tests to raise the trace again. You will find a corr= field along with the "Runtime error". Gather all the traces messages with the same corr= and post them as comment in this issue, please.

(More info about the log format here: https://github.com/telefonicaid/fiware-orion/blob/master/doc/manuals/admin/logs.md)

FWuellhorst commented 11 months ago

Thanks for the explanation! The log is

time=2023-10-05T14:57:48.115Z | lvl=ERROR | corr=883921a8-638f-11ee-80ce-0242ac170006 | trans=1696515219-132-00001245412 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=MongoCommonUpdate.cpp[236]:equalMetadataValues | msg=Runtime Error (unknown metadata value type in DB: 4)
time=2023-10-05T14:57:48.119Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006; cbnotif=1 | trans=1696515219-132-00001248599 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[94]:logInfoMqttNotification | msg=MQTT Notif delivered (subId: 6517dec4f7e3531f770c0b3d): broker: 137.226.248.203:1883, topic: /agentlib/hil2/HiL2/ContextBrokerSubscriptions/DeviationWatchdogHiL2_WatchdogServiceToContextBrokerCommunicator/urn:ngsi-ld:roomCenter:temperatureSensor:05
time=2023-10-05T14:57:48.120Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006; cbnotif=2 | trans=1696515219-132-00001248600 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[94]:logInfoMqttNotification | msg=MQTT Notif delivered (subId: 6517ed91f7e3531f770c0b53): broker: 137.226.248.203:1883, topic: /agentlib/hil2/HiL2/ContextBrokerSubscriptions/DeviationWatchdogHiL2_WatchdogServiceToContextBrokerCommunicator/urn:ngsi-ld:roomCenter:temperatureSensor:05
time=2023-10-05T14:57:48.121Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006; cbnotif=3 | trans=1696515219-132-00001248601 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[94]:logInfoMqttNotification | msg=MQTT Notif delivered (subId: 651a90aff7e3531f770c0b69): broker: 137.226.248.203:1883, topic: /agentlib/hil2/HiL2/ContextBrokerSubscriptions/DeviationWatchdogHiL2_WatchdogServiceToContextBrokerCommunicator/urn:ngsi-ld:roomCenter:temperatureSensor:05
time=2023-10-05T14:57:48.123Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006; cbnotif=4 | trans=1696515219-132-00001248602 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[94]:logInfoMqttNotification | msg=MQTT Notif delivered (subId: 651a9666f7e3531f770c0b81): broker: 137.226.248.203:1883, topic: /agentlib/hil2/HiL2/ContextBrokerSubscriptions/DeviationWatchdogHiL2_WatchdogServiceToContextBrokerCommunicator/urn:ngsi-ld:roomCenter:temperatureSensor:05
time=2023-10-05T14:57:48.123Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006; cbnotif=5 | trans=1696515219-132-00001248603 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[94]:logInfoMqttNotification | msg=MQTT Notif delivered (subId: 651a9760f7e3531f770c0b97): broker: 137.226.248.203:1883, topic: /agentlib/hil2/HiL2/ContextBrokerSubscriptions/DeviationWatchdogHiL2_WatchdogServiceToContextBrokerCommunicator/urn:ngsi-ld:roomCenter:temperatureSensor:05
time=2023-10-05T14:57:48.124Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006; cbnotif=6 | trans=1696515219-132-00001248604 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[94]:logInfoMqttNotification | msg=MQTT Notif delivered (subId: 651a983af7e3531f770c0bad): broker: 137.226.248.203:1883, topic: /agentlib/hil2/HiL2/ContextBrokerSubscriptions/DeviationWatchdogHiL2_WatchdogServiceToContextBrokerCommunicator/urn:ngsi-ld:roomCenter:temperatureSensor:05
time=2023-10-05T14:57:48.125Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006; cbnotif=7 | trans=1696515219-132-00001248605 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[94]:logInfoMqttNotification | msg=MQTT Notif delivered (subId: 651be90bf7e3531f770c0bb2): broker: 137.226.248.203:1883, topic: /agentlib/hil2/HiL2/ContextBrokerSubscriptions/DeviationWatchdogHiL2_WatchdogServiceToContextBrokerCommunicator/urn:ngsi-ld:roomCenter:temperatureSensor:05
time=2023-10-05T14:57:48.126Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006 | trans=1696515219-132-00001245412 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[167]:logInfoRequestWithPayload | msg=Request received: PATCH /v2/entities/urn:ngsi-ld:roomCenter:temperatureSensor:05/attrs?type=temperatureSensor, request payload (725 bytes): {"temperature":{"type":"Number","value":-0.25,"metadata":{"Xij":{"type":"Array","value":[0.0173,0.000289,0,0.05774,0,0,0.00289,0,0]},"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"unit":{"type":"Text","value":"°C","metadata":{"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"brick:hasPoint":{"type":"Relationship","value":"urn:ngsi-ld:roomCenter:temperatureSensors:01","metadata":{"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"Name":{"type":"Text","value":"Temperature Room Center Climate Chamber","metadata":{"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}, response code: 204

If I understand correctly, the entity urn:ngsi-ld:roomCenter:temperatureSensor:05 leads to this error. The multiple notifications are an error on our side, as we debugged this for some time and updating existing subscriptions not always works. I hope they are not the reason.

Here is the entity as displayed in mongo-express. Is the metadata type array causing this issue?

{
    _id: {
        id: 'urn:ngsi-ld:roomCenter:temperatureSensor:05',
        type: 'temperatureSensor',
        servicePath: '/HiL2'
    },
    attrNames: [
        'temperature',
        'timeSeries',
        'temperatureSetpoint',
        'timeSeriesSetpoint',
        'uncertainty',
        'deviation',
        'relativeDeviation',
        'unit',
        'brick:hasPoint',
        'Name',
        'TimeInstant'
    ],
    attrs: {
        temperature: {
            value: 5.752499580383301,
            type: 'Number',
            creDate: 1692979748.474839,
            modDate: 1696518467.4206176,
            md: {
                Xij: {
                    type: 'Array',
                    value: [
                        0.0173,
                        0.000289,
                        0,
                        0.05774,
                        0,
                        0,
                        0.00289,
                        0,
                        0
                    ]
                },
                TimeInstant: {
                    type: 'DateTime',
                    value: 1696518460.6920013
                }
            },
            mdNames: [
                'Xij',
                'TimeInstant'
            ]
        },
        timeSeries: {
            value: 'http://INTERNAL_IP:3000/d/HPoxpLQPk/hil2?orgId%3D1&refresh%3D5s&viewPanel%3D38',
            type: 'Text',
            creDate: 1692979748.474839,
            modDate: 1693478039.5124874,
            mdNames: [
                'TimeInstant'
            ],
            md: {
                TimeInstant: {
                    type: 'DateTime',
                    value: 1693478038.9430008
                }
            }
        },
        temperatureSetpoint: {
            value: 2,
            type: 'Number',
            creDate: 1692979748.474839,
            modDate: 1696518467.468283,
            mdNames: [
                'TimeInstant'
            ],
            md: {
                TimeInstant: {
                    type: 'DateTime',
                    value: 1696518460.709999
                }
            }
        },
        timeSeriesSetpoint: {
              value: 'http://INTERNAL_IP:3000/d/HPoxpLQPk/hil2?orgId%3D1&refresh%3D5s&viewPanel%3D39',
            type: 'Text',
            creDate: 1692979748.474839,
            modDate: 1693478039.7216914,
            mdNames: [
                'TimeInstant'
            ],
            md: {
                TimeInstant: {
                    type: 'DateTime',
                    value: 1693478038.8829994
                }
            }
        },
        uncertainty: {
            value: null,
            type: 'Number',
            creDate: 1692979748.474839,
            modDate: 1692979748.474839,
            mdNames: []
        },
        deviation: {
            value: -1.25000006,
            type: 'Number',
            creDate: 1692979748.474839,
            modDate: 1696329366.9499297,
            mdNames: []
        },
        relativeDeviation: {
            value: -62.50000300000001,
            type: 'Number',
            creDate: 1692979748.474839,
            modDate: 1696329367.0218573,
            mdNames: []
        },
        unit: {
            value: '°C',
            type: 'Text',
            creDate: 1692979748.474839,
            modDate: 1696518467.4683075,
            mdNames: [
                'TimeInstant'
            ],
            md: {
                TimeInstant: {
                    type: 'DateTime',
                    value: 1696518460.709999
                }
            }
        },
        'brick:hasPoint': {
            value: 'urn:ngsi-ld:roomCenter:temperatureSensors:01',
            type: 'Relationship',
            creDate: 1692979748.474839,
            modDate: 1696518467.468326,
            mdNames: [
                'TimeInstant'
            ],
            md: {
                TimeInstant: {
                    type: 'DateTime',
                    value: 1696518460.709999
                }
            }
        },
        Name: {
            value: 'Temperature Room Center Climate Chamber',
            type: 'Text',
            creDate: 1692979748.474839,
            modDate: 1696518467.4683456,
            mdNames: [
                'TimeInstant'
            ],
            md: {
                TimeInstant: {
                    type: 'DateTime',
                    value: 1696518460.709999
                }
            }
        },
        TimeInstant: {
            value: 1696518460.709999,
            type: 'DateTime',
            creDate: 1692979748.474839,
            modDate: 1696518467.4683547,
            mdNames: []
        }
    },
    creDate: 1692979748.474839,
    modDate: 1696518467.4712865,
    lastCorrelator: 'eda05bdc-6390-11ee-8c75-0242ac170006'
}
fgalan commented 10 months ago

time=2023-10-05T14:57:48.126Z | lvl=INFO | corr=883921a8-638f-11ee-80ce-0242ac170006 | trans=1696515219-132-00001245412 | from=172.23.0.5 | srv=hil2 | subsrv=/HiL2 | comp=Orion | op=logTracing.cpp[167]:logInfoRequestWithPayload | msg=Request received: PATCH /v2/entities/urn:ngsi-ld:roomCenter:temperatureSensor:05/attrs?type=temperatureSensor, request payload (725 bytes): {"temperature":{"type":"Number","value":-0.25,"metadata":{"Xij":{"type":"Array","value":[0.0173,0.000289,0,0.05774,0,0,0.00289,0,0]},"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"unit":{"type":"Text","value":"°C","metadata":{"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"brick:hasPoint":{"type":"Relationship","value":"urn:ngsi-ld:roomCenter:temperatureSensors:01","metadata":{"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"Name":{"type":"Text","value":"Temperature Room Center Climate Chamber","metadata":{"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}},"TimeInstant":{"type":"DateTime","value":"2023-10-05T14:57:41.074Z"}}, response code: 204

Useful curl command based on this (adjust fiware-service and fiware-servicepath if needed):

curl localhost:1026/v2/entities/urn:ngsi-ld:roomCenter:temperatureSensor:05/attrs?type=temperatureSensor -X PATCH -s -S -H 'Content-Type: application/json' -H 'fiware-service: hil2' -H 'fiware-servicepath: /HiL2' -d @- <<EOF
{
    "temperature": {
        "type": "Number",
        "value": -0.25,
        "metadata": {
            "Xij": {
                "type": "Array",
                "value": [
                    0.0173,
                    0.000289,
                    0,
                    0.05774,
                    0,
                    0,
                    0.00289,
                    0,
                    0
                ]
            },
            "TimeInstant": {
                "type": "DateTime",
                "value": "2023-10-05T14:57:41.074Z"
            }
        }
    },
    "unit": {
        "type": "Text",
        "value": "°C",
        "metadata": {
            "TimeInstant": {
                "type": "DateTime",
                "value": "2023-10-05T14:57:41.074Z"
            }
        }
    },
    "brick:hasPoint": {
        "type": "Relationship",
        "value": "urn:ngsi-ld:roomCenter:temperatureSensors:01",
        "metadata": {
            "TimeInstant": {
                "type": "DateTime",
                "value": "2023-10-05T14:57:41.074Z"
            }
        }
    },
    "Name": {
        "type": "Text",
        "value": "Temperature Room Center Climate Chamber",
        "metadata": {
            "TimeInstant": {
                "type": "DateTime",
                "value": "2023-10-05T14:57:41.074Z"
            }
        }
    },
    "TimeInstant": {
        "type": "DateTime",
        "value": "2023-10-05T14:57:41.074Z"
    }
}
EOF
fgalan commented 10 months ago

I have had a look into this issue.

In the equalMetadataValues() function we have this:

    switch (getFieldF(md1, ENT_ATTRS_MD_TYPE).type())
    {
      /* FIXME #643 P6: metadata array/object are now supported, but we haven't
         implemented yet the logic to compare compounds between them
      case Object:
        ...
        break;

       case Array:
        ...
        break;
      */

    ...

    default:
      LM_E(("Runtime Error (unknown JSON type for metadata NGSI type: %d)", getFieldF(md1, ENT_ATTRS_MD_TYPE).type()));
      return false;
      break;
    }
...
  switch (getFieldF(md1, ENT_ATTRS_MD_VALUE).type())
  {
    /* FIXME not yet
    case orion::Object:
      ...
      break;

    case orion::Array:
      ...
      break;
    */

    ...

    default:
      LM_E(("Runtime Error (unknown metadata value type in DB: %d)", getFieldF(md1, ENT_ATTRS_MD_VALUE).type()));
      return false;
  }

It seems that although object or array values are supported (i.e. they can be stored and retrieved in ContextBroker entities) the comparison logic used in equalMetadataValues() doesn't take them into account.

Thus, the program flow ends in the default: case and the Runtime Error is printed.

Note that in attrValueChanges() we have a similar logic (this time for attribute values instead of metadata values):

  switch (getFieldF(attr, ENT_ATTRS_VALUE).type())
  {
  case orion::Object:
  case orion::Array:
    /* As the compoundValueP has been checked is NULL before invoking this function, finding
     * a compound value in DB means that there is a change */
    return true;

  case orion::NumberDouble:
    return caP->valueType != orion::ValueTypeNumber || caP->numberValue != getNumberFieldF(attr, ENT_ATTRS_VALUE) || forcedUpdate;

  case orion::Bool:
    return caP->valueType != orion::ValueTypeBoolean || caP->boolValue != getBoolFieldF(attr, ENT_ATTRS_VALUE) || forcedUpdate;

  case orion::String:
    return caP->valueType != orion::ValueTypeString || caP->stringValue != getStringFieldF(attr, ENT_ATTRS_VALUE) || forcedUpdate;

  case orion::jstNULL:
    return caP->valueType != orion::ValueTypeNull;

  default:
    LM_E(("Runtime Error (unknown attribute value type in DB: %d on attribute %s)", getFieldF(attr, ENT_ATTRS_VALUE).type(), caP->name.c_str()));
    return false;
  }

In this case, we have a case for arrays and object, so we don't get the Runtime Error (although the lack of support is similar).

In some sense, this problem is another "view" of issue https://github.com/telefonicaid/fiware-orion/issues/4211 and the solution would be similar: a deep-compare JSON mechanism.

FWuellhorst commented 10 months ago

Thanks for debugging this! Is the function equalMetadataValues() only called when the subscription is triggered? Or every time i get the entity via http?

fgalan commented 10 months ago

Thanks for debugging this! Is the function equalMetadataValues() only called when the subscription is triggered? Or every time i get the entity via http?

I would say that it is used only in the subscription triggering logic so when getting entity via HTTP it is not invoked. Anyway, it is easy to check: just get the entity via HTTP and check if you get the Runtime Error in the logs or not.

fgalan commented 10 months ago

Look for this in the code:

// FIXME #643 P6: in the case of compound value, it's more difficult to know if an attribute
// has really changed its value (many levels have to be traversed). Until we can develop the
// matching logic, we consider CHANGE_VALUE_AND_MD always.
fgalan commented 10 months ago

Related issue https://github.com/telefonicaid/fiware-orion/issues/643 (probably, a duplicate)

fgalan commented 10 months ago

PR https://github.com/telefonicaid/fiware-orion/pull/4444

fgalan commented 10 months ago

A fix for this problem has been provided (in master branch by the time being). The version of Orion available in telefonicaiot/fiware-orion:latest at dockerhub includes this fix.

@FWuellhorst it would be great if you could test with that version an provide feedback (either if the issue is solver or not), please. Thanks!

FWuellhorst commented 9 months ago

Thanks for the quick fix! We tested it and it works, thanks!

fgalan commented 9 months ago

@FWuellhorst thank you for the feedback!

Based in your positive report, this issue will be closed.