legrego / homeassistant-elasticsearch

Publish Home-Assistant events to Elasticsearch
https://legrego.github.io/homeassistant-elasticsearch/
MIT License
145 stars, 38 forks

Rollover defaults are too aggressive #35

Closed. legrego closed this issue 5 years ago

legrego commented 5 years ago

The current defaults for index rollovers are too aggressive. This results in indices that are too small. The default rollover_size should be set to 30gb, and both rollover_age and rollover_docs should be initially unset.
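For illustration, here is a minimal sketch of how the proposed defaults could translate into rollover conditions. It assumes the three settings map onto the `max_size`, `max_age`, and `max_docs` conditions of the Elasticsearch rollover API. The function name `build_rollover_conditions` is hypothetical and not taken from the plugin.

```python
from typing import Optional


def build_rollover_conditions(
    rollover_size: Optional[str] = "30gb",
    rollover_age: Optional[str] = None,
    rollover_docs: Optional[int] = None,
) -> dict:
    """Return only the rollover conditions that are actually set.

    Assumption: the plugin settings correspond one-to-one to the
    max_size / max_age / max_docs rollover conditions.
    """
    candidates = {
        "max_size": rollover_size,
        "max_age": rollover_age,
        "max_docs": rollover_docs,
    }
    # Unset (None) settings are left out so they never trigger a rollover.
    return {name: value for name, value in candidates.items() if value is not None}


print(build_rollover_conditions())  # {'max_size': '30gb'}
```

With the proposed defaults only the size condition remains, so an index would roll over on size alone unless a user opts back in to the age or document-count limits.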

jakommo commented 5 years ago

In general I'm :+1: on this, as 30gb is usually a good size for a shard with this type of data. I'm wondering, though, how long it will take to reach 30gb. If it takes a year, that might make things more complicated, e.g. if a mapping change becomes necessary in the meantime.

legrego commented 5 years ago

That's a good point... I imagine it would take quite some time to hit the 30gb mark for most installations. For my own indices, I'm seeing ~150mb per 1 million documents.
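As a rough back-of-the-envelope check of that figure, the sketch below estimates how many documents fit into 30gb at ~150mb per 1 million documents. The ingest rate of 100,000 documents per day is purely hypothetical and only there to turn the document count into a duration. The function names are illustrative.

```python
MB_PER_GB = 1024  # Elasticsearch byte units are powers of 1024


def docs_until_rollover(target_gb: float, mb_per_million_docs: float) -> float:
    """Estimate how many documents fit before the size threshold is reached."""
    return target_gb * MB_PER_GB * 1_000_000 / mb_per_million_docs


def days_until_rollover(total_docs: float, docs_per_day: float) -> float:
    """Convert a document count into days at a constant ingest rate."""
    return total_docs / docs_per_day


docs = docs_until_rollover(target_gb=30, mb_per_million_docs=150)
print(f"{docs:,.0f} documents")  # 204,800,000 documents

# Hypothetical ingest rate, not a measured value.
days = days_until_rollover(docs, docs_per_day=100_000)
print(f"{days:,.0f} days")  # 2,048 days
```

At that assumed rate a single index would live for several years, which is the scenario where a mapping change in the meantime becomes awkward.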


legrego commented 5 years ago

Pinging @dsztykman, since you originally suggested 30gb. Do you have any thoughts on this?

dsztykman commented 5 years ago

Agreed, it's a bit of an issue. Maybe we should think about new index names that include a version, like hass-events-v1-XXX. Whenever we change the mapping we create a new version, and we can manage the switch with aliases. Otherwise we're going to end up with too many shards holding too little data, and performance is going to suffer in the long term. The question is how do we detect a change in mapping from Home Assistant directly? Like adding a new device?
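A minimal sketch of the versioned-name idea, under stated assumptions: the XXX suffix is treated here as a zero-padded rollover sequence number, and the alias name `hass-events-active` is hypothetical. The second function only builds a request body in the shape accepted by the Elasticsearch `_aliases` endpoint; it does not talk to a cluster.

```python
def versioned_index_name(
    mapping_version: int, sequence: int, prefix: str = "hass-events"
) -> str:
    """Build a name such as hass-events-v1-000001 (the suffix format is an assumption)."""
    return f"{prefix}-v{mapping_version}-{sequence:06d}"


def switch_write_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build an _aliases request body that moves writes to the new index.

    Both indices stay readable through the alias; only the write target changes.
    """
    return {
        "actions": [
            {"add": {"index": old_index, "alias": alias, "is_write_index": False}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


old_index = versioned_index_name(mapping_version=1, sequence=1)
new_index = versioned_index_name(mapping_version=2, sequence=1)
body = switch_write_alias_actions("hass-events-active", old_index, new_index)
print(old_index, "->", new_index)  # hass-events-v1-000001 -> hass-events-v2-000001
```

Because queries go through the alias, older data written under the previous mapping version stays searchable while new documents land in the index with the updated mapping.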

legrego commented 5 years ago

I like the idea of versioning the index names.

> The question is how do we detect a change in mapping from Home Assistant directly? Like adding a new device?

My goal for the index mapping is to be device agnostic. It shouldn't care which devices are registered, or how many devices exist. I think the only times the mapping should change are:

1) Defects in the mapping. I've encountered this as more people use the plugin with various configurations. See also #32
2) ES version compatibility: if supporting the next major version requires changes to the mapping
3) Enhancement requests

So essentially, I only expect the mapping to change if this plugin requires it, or if Elasticsearch requires it. The individual installations shouldn't have any bearing on the mapping, so we should only have to bump the version as a result of changes to this plugin.

dsztykman commented 5 years ago

Makes sense, so essentially we change the version and the mapping when we receive an error from ES.

legrego commented 5 years ago

A bit off-topic for this PR, but if you're interested in seeing/reviewing the versioning idea, I have a PR up for #32 which incorporates this: https://github.com/legrego/homeassistant-elasticsearch/pull/40