legrego / homeassistant-elasticsearch

Publish Home-Assistant events to Elasticsearch
https://legrego.github.io/homeassistant-elasticsearch/
MIT License

"Error publishing documents to Elasticsearch" with new Datastreams #261

Closed strawgate closed 1 month ago

strawgate commented 1 month ago

I believe this error doesn't impact functionality, since the document has already been ingested, but for some reason a second document with the exact same timestamp (down to the nanosecond) and object_id is being sent in the same batch.

We should figure out what's causing this.

Error publishing documents to Elasticsearch: ('1 document(s) failed to index.', [{'create': {'_index': '.ds-metrics-homeassistant.sensor-default-2024.04.07-000003', '_id': 'kz1MH8suutLTy_7yAAABjtL-arM', 'status': 409, 'error': {'type': 'version_conflict_engine_exception', 'reason': '[kz1MH8suutLTy_7yAAABjtL-arM][JIcv3d_fYDv7FYtF5TdkWNDW8roSBOFIkL7bIaESMwAAamVJLQ@2024-04-12T15:47:52.627Z]: version conflict, document already exists (current version [1])', 'index_uuid': 'D0h9qsoPQJOOPuInvmDjvg', 'shard': '0', 'index': '.ds-metrics-homeassistant.sensor-default-2024.04.07-000003'}, 'data': {'@timestamp': datetime.datetime(2024, 4, 12, 15, 47, 52, 627825, tzinfo=datetime.timezone.utc), 'hass.object_id': 'upstairs_temperature_airthings_ecobee', 'hass.entity': {'id': 'sensor.upstairs_temperature_airthings_ecobee', 'domain': 'sensor', 'attributes': {'state_class': <SensorStateClass.MEASUREMENT: 'measurement'>, 'unit_of_measurement': <UnitOfTemperature.FAHRENHEIT: '°F'>, 'icon': 'mdi:calculator', 'friendly_name': 'Upstairs Temperature Airthings + Ecobee'}, 'value': '81.05', 'device': {}, 'platform': 'min_max', 'valueas': {'float': 81.05}, 'geo.location': {'lat': 43.0701139, 'lon': -89.39148068901443}}, 'agent.name': 'My Home Assistant', 'agent.type': 'hass', 'ecs.version': '1.0.0', 'host.geo.location': {'lat': 43.0701139, 'lon': -89.39148068901443}, 'tags': None, 'agent.version': '2024.4.2', 'host.architecture': 'aarch64', 'host.os.name': 'Linux', 'host.hostname': 'homeassistant'}}}])
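
One way such a conflict could arise: the failing document's `@timestamp` carries microseconds (`627825`), while the conflict message shows the stored timestamp at millisecond precision (`15:47:52.627Z`). The sketch below assumes, without confirmation from the integration's code, that a second update for the same entity landed within the same millisecond. The second datetime and the helper name are hypothetical.

```python
from datetime import datetime, timezone


def to_millis_iso(ts: datetime) -> str:
    """Render an aware datetime at millisecond resolution (extra digits are truncated)."""
    utc = ts.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# Timestamp taken from the error above.
first = datetime(2024, 4, 12, 15, 47, 52, 627825, tzinfo=timezone.utc)
# Hypothetical second update for the same entity, 165 microseconds later.
second = datetime(2024, 4, 12, 15, 47, 52, 627990, tzinfo=timezone.utc)

print(to_millis_iso(first))   # 2024-04-12T15:47:52.627Z
print(to_millis_iso(second))  # 2024-04-12T15:47:52.627Z
print(first == second)        # False: distinct in Python, identical at millisecond resolution
```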

legrego commented 1 month ago

I found an example of two documents that were published at almost identical times. Their contents were identical except for the document id and timestamp. So I suspect there are cases where entities publish updates in rapid succession, even if nothing actually changed.
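
To put a number on "almost identical": the two `@timestamp` values shown in the events below can be compared directly. This is only a quick check on the values in this thread; the helper name is hypothetical.

```python
from datetime import datetime, timedelta


def parse_es_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string with a trailing 'Z' into an aware datetime."""
    # Older Python versions reject 'Z' in fromisoformat, so use an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


first = parse_es_timestamp("2024-04-12T18:40:24.567Z")
second = parse_es_timestamp("2024-04-12T18:40:24.560Z")

# Integer number of whole milliseconds between the two events.
delta_ms = abs(first - second) // timedelta(milliseconds=1)
print(delta_ms)  # 7
```

Since these two updates fall in different milliseconds, they were presumably stored as separate documents rather than colliding.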

First event

{
  "_index": ".ds-metrics-homeassistant.sensor-default-2024.04.11-000001",
  "_id": "r4YjX9hiovljvTx4AAABjtOcX_c",
  "_version": 1,
  "_score": 0,
  "_ignored": [
    "hass.entity.attributes.unit_of_measurement.float",
    "hass.entity.attributes.state_class.float",
    "hass.entity.attributes.icon.float",
    "hass.entity.attributes.device_class.float",
    "hass.entity.attributes.message_type.float",
    "hass.entity.attributes.friendly_name.float"
  ],
  "_source": {
    "@timestamp": "2024-04-12T18:40:24.567Z",
    "agent": {
      "name": "My Home Assistant",
      "type": "hass",
      "version": "2024.4.2"
    },
    "ecs": {
      "version": "1.0.0"
    },
    "hass": {
      "entity": {
        "attributes": {
          "checksumval": "63268",
          "consumption": "458456",
          "device_class": "gas",
          "friendly_name": "gas_meter",
          "icon": "mdi:gas_canister",
          "id": "15024475",
          "message_type": "SCM",
          "state_class": "total_increasing",
          "tamperenc": "0",
          "tamperphy": "0",
          "type": "12",
          "unit_of_measurement": "ft³"
        },
        "domain": "sensor",
        "geo": {
          "location": {
            "lat": x,
            "lon": x
          }
        },
        "id": "sensor.gas_meter",
        "platform": "mqtt",
        "value": "16190.221",
        "valueas": {
          "float": 16190.221
        }
      },
      "object_id": "gas_meter"
    },
    "host": {
      "architecture": "aarch64",
      "geo": {
        "location": {
          "lat": x,
          "lon": x
        }
      },
      "hostname": "homeassistant",
      "os": {
        "name": "Linux"
      }
    }
  },
  "fields": {
    "hass.entity.attributes.unit_of_measurement.keyword": [
      "ft³"
    ],
    "hass.entity.domain": [
      "sensor"
    ],
    "hass.entity.attributes.device_class.keyword": [
      "gas"
    ],
    "hass.entity.attributes.friendly_name.keyword": [
      "gas_meter"
    ],
    "host.hostname": [
      "homeassistant"
    ],
    "hass.entity.attributes.consumption.float": [
      458456
    ],
    "hass.entity.attributes.tamperphy.keyword": [
      "0"
    ],
    "host.os.name": [
      "Linux"
    ],
    "agent.name": [
      "My Home Assistant"
    ],
    "hass.entity.attributes.tamperphy": [
      "0"
    ],
    "hass.entity.platform": [
      "mqtt"
    ],
    "hass.entity.attributes.icon": [
      "mdi:gas_canister"
    ],
    "hass.entity.attributes.friendly_name": [
      "gas_meter"
    ],
    "hass.entity.valueas.float": [
      16190.221
    ],
    "hass.entity.id": [
      "sensor.gas_meter"
    ],
    "hass.entity.attributes.tamperenc.float": [
      0
    ],
    "hass.entity.attributes.message_type.keyword": [
      "SCM"
    ],
    "hass.entity.attributes.state_class": [
      "total_increasing"
    ],
    "host.architecture": [
      "aarch64"
    ],
    "hass.entity.attributes.id.float": [
      15024475
    ],
    "ecs.version": [
      "1.0.0"
    ],
    "hass.entity.attributes.checksumval": [
      "63268"
    ],
    "hass.object_id": [
      "gas_meter"
    ],
    "agent.version": [
      "2024.4.2"
    ],
    "hass.entity.attributes.state_class.keyword": [
      "total_increasing"
    ],
    "host.geo.location": [
      {
        "coordinates": [
          x,
          x
        ],
        "type": "Point"
      }
    ],
    "hass.entity.attributes.id.keyword": [
      "15024475"
    ],
    "hass.entity.attributes.device_class": [
      "gas"
    ],
    "hass.entity.attributes.message_type": [
      "SCM"
    ],
    "hass.entity.attributes.tamperenc": [
      "0"
    ],
    "hass.entity.attributes.id": [
      "15024475"
    ],
    "hass.entity.attributes.type.float": [
      12
    ],
    "hass.entity.attributes.type.keyword": [
      "12"
    ],
    "agent.type": [
      "hass"
    ],
    "hass.entity.attributes.consumption.keyword": [
      "458456"
    ],
    "hass.entity.attributes.type": [
      "12"
    ],
    "hass.entity.attributes.consumption": [
      "458456"
    ],
    "hass.entity.value.keyword": [
      "16190.221"
    ],
    "hass.entity.attributes.tamperphy.float": [
      0
    ],
    "hass.entity.value": [
      "16190.221"
    ],
    "hass.entity.attributes.icon.keyword": [
      "mdi:gas_canister"
    ],
    "hass.entity.attributes.checksumval.keyword": [
      "63268"
    ],
    "@timestamp": [
      "2024-04-12T18:40:24.567Z"
    ],
    "hass.entity.attributes.unit_of_measurement": [
      "ft³"
    ],
    "hass.entity.attributes.checksumval.float": [
      63268
    ],
    "hass.entity.geo.location": [
      {
        "coordinates": [
          x,
          x
        ],
        "type": "Point"
      }
    ],
    "hass.entity.attributes.tamperenc.keyword": [
      "0"
    ]
  },
  "ignored_field_values": {
    "hass.entity.attributes.unit_of_measurement.float": [
      "ft³"
    ],
    "hass.entity.attributes.state_class.float": [
      "total_increasing"
    ],
    "hass.entity.attributes.message_type.float": [
      "SCM"
    ],
    "hass.entity.attributes.icon.float": [
      "mdi:gas_canister"
    ],
    "hass.entity.attributes.device_class.float": [
      "gas"
    ],
    "hass.entity.attributes.friendly_name.float": [
      "gas_meter"
    ]
  }
}

Second event

{
  "_index": ".ds-metrics-homeassistant.sensor-default-2024.04.11-000001",
  "_id": "r4YjX9hiovljvTx4AAABjtOcX_A",
  "_version": 1,
  "_score": 0,
  "_ignored": [
    "hass.entity.attributes.unit_of_measurement.float",
    "hass.entity.attributes.state_class.float",
    "hass.entity.attributes.icon.float",
    "hass.entity.attributes.device_class.float",
    "hass.entity.attributes.message_type.float",
    "hass.entity.attributes.friendly_name.float"
  ],
  "_source": {
    "@timestamp": "2024-04-12T18:40:24.560Z",
    "agent": {
      "name": "My Home Assistant",
      "type": "hass",
      "version": "2024.4.2"
    },
    "ecs": {
      "version": "1.0.0"
    },
    "hass": {
      "entity": {
        "attributes": {
          "checksumval": "63268",
          "consumption": "458456",
          "device_class": "gas",
          "friendly_name": "gas_meter",
          "icon": "mdi:gas_canister",
          "id": "15024475",
          "message_type": "SCM",
          "state_class": "total_increasing",
          "tamperenc": "0",
          "tamperphy": "0",
          "type": "12",
          "unit_of_measurement": "ft³"
        },
        "domain": "sensor",
        "geo": {
          "location": {
            "lat": x,
            "lon": x
          }
        },
        "id": "sensor.gas_meter",
        "platform": "mqtt",
        "value": "16190.221",
        "valueas": {
          "float": 16190.221
        }
      },
      "object_id": "gas_meter"
    },
    "host": {
      "architecture": "aarch64",
      "geo": {
        "location": {
          "lat": x,
          "lon": x
        }
      },
      "hostname": "homeassistant",
      "os": {
        "name": "Linux"
      }
    }
  },
  "fields": {
    "hass.entity.attributes.unit_of_measurement.keyword": [
      "ft³"
    ],
    "hass.entity.domain": [
      "sensor"
    ],
    "hass.entity.attributes.device_class.keyword": [
      "gas"
    ],
    "hass.entity.attributes.friendly_name.keyword": [
      "gas_meter"
    ],
    "host.hostname": [
      "homeassistant"
    ],
    "hass.entity.attributes.consumption.float": [
      458456
    ],
    "hass.entity.attributes.tamperphy.keyword": [
      "0"
    ],
    "host.os.name": [
      "Linux"
    ],
    "agent.name": [
      "My Home Assistant"
    ],
    "hass.entity.attributes.tamperphy": [
      "0"
    ],
    "hass.entity.platform": [
      "mqtt"
    ],
    "hass.entity.attributes.icon": [
      "mdi:gas_canister"
    ],
    "hass.entity.attributes.friendly_name": [
      "gas_meter"
    ],
    "hass.entity.valueas.float": [
      16190.221
    ],
    "hass.entity.id": [
      "sensor.gas_meter"
    ],
    "hass.entity.attributes.tamperenc.float": [
      0
    ],
    "hass.entity.attributes.message_type.keyword": [
      "SCM"
    ],
    "hass.entity.attributes.state_class": [
      "total_increasing"
    ],
    "host.architecture": [
      "aarch64"
    ],
    "hass.entity.attributes.id.float": [
      15024475
    ],
    "ecs.version": [
      "1.0.0"
    ],
    "hass.entity.attributes.checksumval": [
      "63268"
    ],
    "hass.object_id": [
      "gas_meter"
    ],
    "agent.version": [
      "2024.4.2"
    ],
    "hass.entity.attributes.state_class.keyword": [
      "total_increasing"
    ],
    "host.geo.location": [
      {
        "coordinates": [
          x,
          x
        ],
        "type": "Point"
      }
    ],
    "hass.entity.attributes.id.keyword": [
      "15024475"
    ],
    "hass.entity.attributes.device_class": [
      "gas"
    ],
    "hass.entity.attributes.message_type": [
      "SCM"
    ],
    "hass.entity.attributes.tamperenc": [
      "0"
    ],
    "hass.entity.attributes.id": [
      "15024475"
    ],
    "hass.entity.attributes.type.float": [
      12
    ],
    "hass.entity.attributes.type.keyword": [
      "12"
    ],
    "agent.type": [
      "hass"
    ],
    "hass.entity.attributes.consumption.keyword": [
      "458456"
    ],
    "hass.entity.attributes.type": [
      "12"
    ],
    "hass.entity.attributes.consumption": [
      "458456"
    ],
    "hass.entity.value.keyword": [
      "16190.221"
    ],
    "hass.entity.attributes.tamperphy.float": [
      0
    ],
    "hass.entity.value": [
      "16190.221"
    ],
    "hass.entity.attributes.icon.keyword": [
      "mdi:gas_canister"
    ],
    "hass.entity.attributes.checksumval.keyword": [
      "63268"
    ],
    "@timestamp": [
      "2024-04-12T18:40:24.560Z"
    ],
    "hass.entity.attributes.unit_of_measurement": [
      "ft³"
    ],
    "hass.entity.attributes.checksumval.float": [
      63268
    ],
    "hass.entity.geo.location": [
      {
        "coordinates": [
          x,
          x
        ],
        "type": "Point"
      }
    ],
    "hass.entity.attributes.tamperenc.keyword": [
      "0"
    ]
  },
  "ignored_field_values": {
    "hass.entity.attributes.unit_of_measurement.float": [
      "ft³"
    ],
    "hass.entity.attributes.state_class.float": [
      "total_increasing"
    ],
    "hass.entity.attributes.message_type.float": [
      "SCM"
    ],
    "hass.entity.attributes.icon.float": [
      "mdi:gas_canister"
    ],
    "hass.entity.attributes.device_class.float": [
      "gas"
    ],
    "hass.entity.attributes.friendly_name.float": [
      "gas_meter"
    ]
  }
}

The diff

3c3
<   "_id": "r4YjX9hiovljvTx4AAABjtOcX_c",
---
>   "_id": "r4YjX9hiovljvTx4AAABjtOcX_A",
15c15
<     "@timestamp": "2024-04-12T18:40:24.567Z",
---
>     "@timestamp": "2024-04-12T18:40:24.560Z",
204c204
<       "2024-04-12T18:40:24.567Z"
---
>       "2024-04-12T18:40:24.560Z"

strawgate commented 1 month ago

@legrego it seems like our options are either:

  1. Switch to nanosecond timestamps
  2. Pick one of the docs to publish

Any preference?
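
For comparison, option 2 could look roughly like the sketch below: before publishing, collapse documents in a batch that share an `hass.object_id` and fall in the same millisecond. It assumes documents are dicts shaped like the `data` payload in the error above, and that object id plus millisecond timestamp is a good enough stand-in for the id Elasticsearch derives. `dedupe_batch` is a hypothetical helper, not part of the integration.

```python
from datetime import datetime, timezone


def dedupe_batch(docs):
    """Keep the most recent document per (object_id, millisecond timestamp)."""
    latest = {}
    for doc in docs:
        ts = doc["@timestamp"]
        # Truncate microseconds down to whole milliseconds.
        ts_ms = ts.replace(microsecond=ts.microsecond // 1000 * 1000)
        # A later document with the same key replaces the earlier one.
        latest[(doc["hass.object_id"], ts_ms)] = doc
    return list(latest.values())


base = datetime(2024, 4, 12, 15, 47, 52, 627825, tzinfo=timezone.utc)
batch = [
    {"@timestamp": base, "hass.object_id": "gas_meter", "value": "a"},
    {"@timestamp": base.replace(microsecond=627990), "hass.object_id": "gas_meter", "value": "b"},
    {"@timestamp": base.replace(microsecond=628100), "hass.object_id": "gas_meter", "value": "c"},
]
print([d["value"] for d in dedupe_batch(batch)])  # ['b', 'c']
```

This is the extra bookkeeping that the nanosecond option avoids.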

I think the limitations of nanosecond timestamps are:

  1. More storage required
  2. The nanosecond-based field mapper is only able to store dates between 1970 and 2262
  3. Aggregation buckets will be in millisecond resolution, even if you query a field of type date_nanos
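
The 1970 to 2262 range in limitation 2 follows from storing the value as a signed 64-bit count of nanoseconds since the epoch. A short sketch of the arithmetic, under that assumption:

```python
from datetime import datetime, timedelta, timezone

INT64_MAX = 2**63 - 1  # largest nanosecond count a signed 64-bit integer can hold
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Python datetimes stop at microseconds, so drop the last three digits.
latest = epoch + timedelta(microseconds=INT64_MAX // 1000)
print(latest.isoformat())  # 2262-04-11T23:47:16.854775+00:00
```

For Home Assistant state changes, which are all recent, this ceiling is unlikely to matter.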

I don't really think any of these are a big deal, so I vote for nanosecond timestamps.

legrego commented 1 month ago

@strawgate Let's give nanosecond timestamps a try. Those limitations seem reasonable to me, and it saves us the additional complexity & cost of tracking these conflicts.
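
A rough sketch of what that change might involve, not the integration's actual code: map `@timestamp` as `date_nanos` and serialize timestamps with their sub-millisecond digits intact. The mapping fragment and helper name are assumptions for illustration. Python's `datetime` only resolves microseconds, so two updates within the same microsecond would still collide.

```python
from datetime import datetime, timezone

# Hypothetical mapping fragment for the data stream's index template.
TIMESTAMP_MAPPING = {"properties": {"@timestamp": {"type": "date_nanos"}}}


def to_precise_iso(ts: datetime) -> str:
    """Serialize an aware datetime with all six fractional digits Python provides."""
    utc = ts.astimezone(timezone.utc)
    return utc.isoformat(timespec="microseconds").replace("+00:00", "Z")


first = datetime(2024, 4, 12, 15, 47, 52, 627825, tzinfo=timezone.utc)
second = datetime(2024, 4, 12, 15, 47, 52, 627990, tzinfo=timezone.utc)  # hypothetical

print(to_precise_iso(first))   # 2024-04-12T15:47:52.627825Z
print(to_precise_iso(second))  # 2024-04-12T15:47:52.627990Z
```

With the extra digits preserved, the two updates no longer share a timestamp.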