legrego / homeassistant-elasticsearch

Publish Home-Assistant events to Elasticsearch
https://legrego.github.io/homeassistant-elasticsearch/
MIT License

Support indexing via data streams #130

Closed: legrego closed this issue 3 months ago

legrego commented 3 years ago

The information that we publish to Elasticsearch is a great fit for Data Streams.

We should investigate what it would take to add support for data streams.

Data streams were introduced in version 7.9 (I believe), so we couldn't rely on them everywhere unless we dropped support for older cluster versions, which I'm hesitant to do.
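A minimal sketch of how the version constraint above could be handled without dropping older clusters: check the cluster version and fall back to the existing alias-based setup when data streams are unavailable. The function names and the `"datastream"` / `"legacy"` mode labels are hypothetical and not taken from the integration; the version string is assumed to be the `version.number` value that Elasticsearch returns from `GET /`.

```python
from __future__ import annotations

import re

# Data streams first shipped in Elasticsearch 7.9 (per the comment above).
MIN_DATA_STREAM_VERSION = (7, 9)


def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '7.17.3' or '8.0.0-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_data_streams(version: str) -> bool:
    """Return True when the cluster is new enough to use data streams."""
    return parse_version(version) >= MIN_DATA_STREAM_VERSION


def choose_index_mode(version: str) -> str:
    """Pick a (hypothetical) index mode: data streams if supported, else legacy aliases."""
    return "datastream" if supports_data_streams(version) else "legacy"
```

With this kind of gate, a 7.8 cluster would keep the current behavior while 7.9 and later could opt into data streams.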

ruflin commented 3 years ago

++ on going for data streams and jumping directly to the new naming scheme: https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme (yes, I'm biased). I guess this would all fall under metrics as the type, and we should decide on the dataset name(s). Is it all a single dataset, or should there be multiple different ones?

legrego commented 3 years ago

@ruflin I think metrics makes sense for this component. At this point, I think a single dataset is all we'd need.

What do you think about any of these options?

ruflin commented 3 years ago

++ on metrics. I like hass.events as it keeps room for other hass.* data. hass-events would be using events as the namespace which is unexpected.

One thing I'm stumbling over is the hass prefix instead of home_assistant or ha. I'm still new to Home Assistant and I just run it in a docker container without hass as the OS. I'm a bit confused here about what is what and what the correct naming is, so I wonder if hass is the correct prefix for all the ways Home Assistant is run?

legrego commented 3 years ago

> One thing I'm stumbling over is the hass prefix instead of home_assistant or ha. I'm still new to Home Assistant and I just run it in a docker container without hass as the OS. I'm a bit confused here about what is what and what the correct naming is, so I wonder if hass is the correct prefix for all the ways Home Assistant is run?

I picked hass arbitrarily in the past, so I'm open to renaming this. home_assistant seems like a decent name. So would this make the new proposal metrics-home_assistant.events-default?
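To make the naming discussion concrete, here is a small sketch of the `{type}-{dataset}-{namespace}` scheme from the linked blog post. The helper names are hypothetical. The rule that dataset and namespace must not contain `-` is my reading of that scheme, and it is why `metrics-hass-events` would be read with `events` as the namespace.

```python
from typing import NamedTuple


class DataStreamName(NamedTuple):
    type: str
    dataset: str
    namespace: str


def build_name(type_: str, dataset: str, namespace: str = "default") -> str:
    """Join the three parts with '-', the separator used by the naming scheme."""
    for label, part in (("dataset", dataset), ("namespace", namespace)):
        # A '-' inside a part would make the name ambiguous when split again.
        if "-" in part:
            raise ValueError(f"{label} must not contain '-': {part!r}")
    return f"{type_}-{dataset}-{namespace}"


def parse_name(name: str) -> DataStreamName:
    """Split a data stream name back into type, dataset, and namespace."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError(f"Expected exactly three '-'-separated parts: {name!r}")
    return DataStreamName(*parts)


# The proposal from this thread:
proposed = build_name("metrics", "home_assistant.events")

# The rejected option: 'events' ends up in the namespace slot.
rejected = parse_name("metrics-hass-events")
```

Here `proposed` is `metrics-home_assistant.events-default`, while `rejected.namespace` is `events`, which matches the concern raised earlier in the thread.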

ruflin commented 3 years ago

@legrego LGTM 👍

ruflin commented 11 months ago

I'm coming back to this as I just stumbled today over the alias setup etc. I wonder if we could introduce a new config (which becomes the default for new installations) but keep the old setups working? Instead of home assistant installing the templates manually, I would switch over to an integration package having the templates etc. inside (see https://github.com/ruflin/ruflin-integration-package for inspiration). All the elasticsearch integration would do is push the zip file for installation.

The default data stream would be metrics-home_assistant.events-default. Some of the things we ship could also be more similar to log events? On my end, I still need to dig deeper into the code to fully understand what is shipped from where.

@legrego WDYT about the high level approach above?
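The "push the zip file" step could look roughly like the sketch below. It only builds the HTTP request and does not send it. The endpoint path (`/api/fleet/epm/packages`), the `kbn-xsrf` header, and the `ApiKey` authorization format are assumptions based on Kibana's Fleet package upload API and should be checked against the Kibana version in use; the function name is hypothetical.

```python
import urllib.request


def build_package_upload_request(
    kibana_url: str, package_zip: bytes, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a request that uploads an integration package zip."""
    return urllib.request.Request(
        # Assumed Fleet endpoint for installing a package by upload.
        url=f"{kibana_url.rstrip('/')}/api/fleet/epm/packages",
        data=package_zip,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            # Kibana rejects API writes that lack an XSRF header.
            "kbn-xsrf": "true",
            # Assumed auth scheme; basic auth would be another option.
            "Authorization": f"ApiKey {api_key}",
        },
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen`. Note that this requires a Kibana endpoint in addition to the Elasticsearch one.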

legrego commented 11 months ago

> I'm coming back to this as I just stumbled today over the alias setup etc. I wonder if we could introduce a new config (which becomes the default for new installations) but keep the old setups working? Instead of home assistant installing the templates manually, I would switch over to an integration package having the templates etc. inside (see https://github.com/ruflin/ruflin-integration-package for inspiration). All the elasticsearch integration would do is push the zip file for installation.
>
> The default data stream would be metrics-home_assistant.events-default.

@ruflin I like this approach quite a bit. I'll see if I can find some time to play with the integration package concept and get a working POC.

> Some of the things we ship could also be more similar to log events? On my end, I still need to dig deeper into the code to fully understand what is shipped from where.

I'd love to see if there's a way for us to hook into Home Assistant's logger. Tapping into that would give us a proper dataset for logs-home_assistant.???-default.
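Since Home Assistant uses Python's standard `logging` module, one way to hook into the logger is a custom `logging.Handler`. The sketch below only collects documents in memory; the class name is hypothetical, the field layout is an assumption loosely following ECS, and the dataset is left as a required argument because the thread has not settled on one.

```python
import logging
from datetime import datetime, timezone


class DocumentCollectingHandler(logging.Handler):
    """Turn log records into dicts shaped for a logs-* data stream."""

    def __init__(self, dataset: str, namespace: str = "default") -> None:
        super().__init__()
        self.dataset = dataset
        self.namespace = namespace
        self.documents = []  # a real integration would ship these in bulk

    def emit(self, record: logging.LogRecord) -> None:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        self.documents.append(
            {
                "@timestamp": timestamp.isoformat(),
                "message": record.getMessage(),
                "log": {"level": record.levelname, "logger": record.name},
                "data_stream": {
                    "type": "logs",
                    "dataset": self.dataset,
                    "namespace": self.namespace,
                },
            }
        )
```

Attaching it is the usual `logging.getLogger().addHandler(DocumentCollectingHandler(dataset=...))`, with the dataset name still to be decided.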

legrego commented 9 months ago

> I would switch over to an integration package having the templates etc. inside (see https://github.com/ruflin/ruflin-integration-package for inspiration). All the elasticsearch integration would do is push the zip file for installation.

@ruflin Can you help me understand the benefits of this approach over installing the templates (etc.) manually? A couple I can see:

Are there others? I really don't love the idea of asking users for both their Kibana & ES endpoints in order to complete setup. You mentioned that we could streamline this a bit for Cloud, but that would add yet another flow (== more complexity).

Do you know if there are plans to expose an ES API for package installation in the future? That would make this approach much more palatable to me.

ruflin commented 9 months ago

The package installation takes care of all the edge cases, rollovers, etc., and allows you to package additional assets like dashboards into it. It also means you get things like ECS templates directly out of the box. There are lots of small additional things that happen during installation to optimise things.

> Do you know if there are plans to expose an ES API for package installation in the future?

I would love to have one, but I doubt it will happen any time soon.

> asking users for both their Kibana & ES endpoints

I hear you. I wonder if we could turn it around. Only ask for Kibana endpoint and then ask Kibana for the ES endpoint assuming this is possible?

legrego commented 9 months ago

> The package installation takes care of all the edge cases, rollovers, etc., and allows you to package additional assets like dashboards into it. It also means you get things like ECS templates directly out of the box. There are lots of small additional things that happen during installation to optimise things.

Thanks!

> I hear you. I wonder if we could turn it around. Only ask for Kibana endpoint and then ask Kibana for the ES endpoint assuming this is possible?

I was wondering this as well. I don't think there's a way to reliably get this information from Kibana.

ruflin commented 9 months ago

> I was wondering this as well. I don't think there's a way to reliably get this information from Kibana.

We could add it ;-)

strawgate commented 3 months ago

Fixed by: https://github.com/legrego/homeassistant-elasticsearch/pull/207