Closed strawgate closed 2 months ago
Also adding https://github.com/legrego/homeassistant-elasticsearch/issues/124 to pull each domain into its own data stream
I think using tsdb, the new time series database, for the events makes sense, I just need to figure out an appropriate set of default dimensions first
Why do a bunch of the hass.properties have ignore_above 124 on them?
When this was written, my knowledge of both ES and Home Assistant were quite lacking. I didn't know what data would exist in the wild for these fields, not how ES would handle them. ignore_above
was an attempt to make this component resilient to this unknown data and lack of personal knowledge.
Why does the index template have an event.* mapping? copy/paste?
Yeah this was copy/paste from the original ECS 1.0 specification. I don't believe we use this, so we don't have to carry it forward.
What to do when "All Entities" is picked with Datastreams
Can you elaborate on this one?
Why do a bunch of the hass.properties have ignore_above 124 on them?
When this was written, my knowledge of both ES and Home Assistant were quite lacking. I didn't know what data would exist in the wild for these fields, not how ES would handle them.
ignore_above
was an attempt to make this component resilient to this unknown data and lack of personal knowledge.
Ok, I think a bunch of the fields marked as 124 actually cap out at 255, if there's no concern i'll just find/replace 124 to 1024 across the board which is typica for other integrations.
Why does the index template have an event.* mapping? copy/paste?
Yeah this was copy/paste from the original ECS 1.0 specification. I don't believe we use this, so we don't have to carry it forward.
Perfect -- i need to remove it to support TSDS / Synthetic source
What to do when "All Entities" is picked with Datastreams
Can you elaborate on this one?
I misunderstood what All entities
was for so i'll remove that from the open questions.
I've added the choice during integration setup. Datastreams would be the default setting, existing users would be undisturbed, migrating to datastreams would require recreating the integration.
I could probably make it configurable via options and have it reload the integration when the type changes but every second I spend messing with the config/options flow I feel like i'm losing a valuable piece of myself.
I am able to modify an existing all-hass-events dataview to include the new datastreams like so:
and am then able to query both sets of data via that dataview:
Another thought I had, perhaps as a follow-up PR was to use the python type of the entity.value
to also populate strongly typed ES fields like entity.valueas.float
entity.valueas.int
entity.valueas.date
entity.valueas.keyword
entity.valueas.boolean
This would make it much easier to visualize sensor data
Any thoughts on something like that?
Edit: implemented
Latest commit should auto-pick datastreams on serverless, autopick legacy indices on pre-8.7 and offer legacy indices or datastreams on stateful >=8.7
Another thought I had, perhaps as a follow-up PR was to use the python type of the
entity.value
to also populate strongly typed ES fields likeentity.valueas.float
entity.valueas.int
entity.valueas.date
entity.valueas.keyword
entity.valueas.boolean
This would make it much easier to visualize sensor data
Any thoughts on something like that?
Edit: implemented
@strawgate that's an interesting thought. Can you help me understand the benefits of this approach over the existing approach, or why we would benefit from maintaining both as opposed to one or the other?
If you can't tell, I'm not an ingest expert, so I'm relying on your expertise here.
@strawgate that's an interesting thought. Can you help me understand the benefits of this approach over the existing approach, or why we would benefit from maintaining both as opposed to one or the other?
So here's a couple areas I think this is beneficial:
So I would propose keeping .value
and offering optional coerced .valueas
types.
This has maintenance downsides -- we may find that python types don't strictly match elasticsearch types and have to make small adjustments to the coercion along the way but these should not block indexing and would just result in the user being left with just the value
field for that particular document, which ultimately is just the current behavior without this change.
@strawgate thank you for the detailed response, I found it very helpful.
This has maintenance downsides -- we may find that python types don't strictly match elasticsearch types and have to make small adjustments to the coercion along the way but these should not block indexing and would just result in the user being left with just the value field for that particular document, which ultimately is just the current behavior without this change.
Ah I would have expected indexing issues. If we detect a datetime
field in Python, and place this in the .valueas.date
field in the document, won't ES reject the document if it's not in a format that it believes to be a valid date
?
@strawgate thank you for the detailed response, I found it very helpful.
This has maintenance downsides -- we may find that python types don't strictly match elasticsearch types and have to make small adjustments to the coercion along the way but these should not block indexing and would just result in the user being left with just the value field for that particular document, which ultimately is just the current behavior without this change.
Ah I would have expected indexing issues. If we detect a
datetime
field in Python, and place this in the.valueas.date
field in the document, won't ES reject the document if it's not in a format that it believes to be a validdate
?
We would have python attempt to coerce the value into a datetime object and if that is successful we can force a particular output format of the datetime object to ensure that Elasticsearch will always accept it.
Something like this is working for me on my local branch:
# Create a datetime, date and time field if the string is a valid date
document_body["hass.entity"]["valueas"]["datetime"] = datetime.fromisoformat(_state).isoformat()
document_body["hass.entity"]["valueas"]["date"] = datetime.fromisoformat(_state).date().isoformat()
document_body["hass.entity"]["valueas"]["time"] = datetime.fromisoformat(_state).time().isoformat()
Index mapping:
},
"datetime": {
"type": "date"
},
"date": {
"type": "date",
"format": "strict_date"
},
"time": {
"type": "date",
"format": "HH:mm:ss.SSSSSS||time||strict_hour_minute_second||time_no_millis"
},
Sounds good. I'm happy to proceed with this approach for now. Thanks!
custom_components/elasticsearch/es_doc_creator.py:59:9: C901
state_to_document
is too complex (26 > 25) Found 2 errors (1 fixed, 1 remaining). Error: Process completed with exit code 1.
Feel free to add an ignore for this "issue" if that's simpler. I wasn't aware this lint rule was enabled, and I don't have a strong opinion on the complexity threshold at this point.
Coverage Report
File Stmts Miss Cover Missing __init__.py 0 0 100% elasticsearch __init__.py 78 2 97% 194–195 config_flow.py 221 16 92% 70, 103, 204, 274, 294, 306, 355–356, 360, 399, 405, 439, 508, 517, 561–562 const.py 24 0 100% entity_details.py 33 0 100% errors.py 25 2 92% 47, 60 es_doc_creator.py 77 2 97% 158, 169 es_doc_publisher.py 184 21 88% 87, 203–204, 220–221, 226, 231, 287, 293–295, 297, 311–312, 314–315, 317–321 es_gateway.py 100 20 80% 82–83, 98, 102, 118–120, 122, 124–125, 127–131, 133–137 es_index_manager.py 74 13 82% 101–102, 113–118, 126–127, 141–142, 162 es_integration.py 37 2 94% 43–44 es_privilege_check.py 50 0 100% es_serializer.py 10 0 100% es_version.py 20 0 100% logger.py 2 0 100% system_info.py 17 1 94% 23 utils.py 4 0 100% TOTAL 956 79 91%
Tests | Skipped | Failures | Errors | Time |
---|---|---|---|---|
46 | 0 :zzz: | 0 :x: | 0 :fire: | 4.029s :stopwatch: |
I'm super unfamiliar with Python tests so maybe I'll take a stab at it this week and if you get some time to add some via a PR I can use those as a launching point?
One thing that would really help me is if you could provide a starting example for each kind of test we will need and then I can copy/paste/run different scenarios.
Ok, added some tests for doc creator, doc publisher, and es_version. Will try to work on this more later this week.
I will also create a new pre-release build on my branch and make sure everything is still looking good on a real instance
A follow-up question, now that we have coercion with valueas do we still want to do float coercion on value? Right now value can be float or string, valueas can have boolean (as of a commit coming in a little bit), datetime, date, time, string, float.
My proposal:
In v1 mode, value can continue to be a string or float with values like on and off continuing to be coerced to float 0 and 1
In v2 mode, value is always a string and float, boolean, etc are coerced into valueas fields. If no coercions are successful valueas.string
will contain the state in addition to value
Also propose not populating the host.*
fields with "UNKNOWN" in datastream documents if the value isn't known
One thing that would really help me is if you could provide a starting example for each kind of test we will need and then I can copy/paste/run different scenarios.
It looks like you beat me to this. Let me know if there's a sample you'd like me to write yet.
A follow-up question, now that we have coercion with valueas do we still want to do float coercion on value? Right now value can be float or string, valueas can have boolean (as of a commit coming in a little bit), datetime, date, time, string, float.
My proposal: In v1 mode, value can continue to be a string or float with values like on and off continuing to be coerced to float 0 and 1 In v2 mode, value is always a string and float, boolean, etc are coerced into valueas fields. If no coercions are successful
valueas.string
will contain the state in addition tovalue
Your proposal makes sense to me.
Also propose not populating the host.* fields with "UNKNOWN" in datastream documents if the value isn't known
++
I broke the CI in an attempt to get the code coverage comment working. We won't be able to trust GH action output on this PR
I might have fixed this. If you pull the latest main
into this branch, then we'll know for sure. It should at least post the code coverage comment with the correct numbers. The drill-down links into covered/uncovered lines won't work because I'm not storing those assets anywhere, yet.
I broke the CI in an attempt to get the code coverage comment working. We won't be able to trust GH action output on this PR
I might have fixed this. If you pull the latest
main
into this branch, then we'll know for sure. It should at least post the code coverage comment with the correct numbers. The drill-down links into covered/uncovered lines won't work because I'm not storing those assets anywhere, yet.
Sounds good, I ran a code coverage report yesterday that showed code coverage at 92% matching main. I'll spend some more time today increasing code coverage.
Also, having code coverage in the pytest ini settings breaks debugging in vscode
Another edge-case:
In v1 we have hass.geo.location which is set to the hass server location unless an entity has a longitude and latitude property in which case it's overwritten with the entity attribute values.
Proposal 1 (My preference): hass.geo.location is always the hass server location. hass.entity.geo.location is the location from the entity's attributes if they exist and is the hass server location if they do not exist.
Proposal 2 (Alternative): hass.geo.location is always the hass server location. hass.entity.geo.location is the location from the entity's attributes only if they exist
Proposal 1 allows calculating/plotting distance between hass server and individual sensors if those sensors have latitude and longitude provided while placing all entities by default at the hass server location.
Another edge-case:
In v1 we have hass.geo.location which is set to the hass server location unless an entity has a longitude and latitude property in which case it's overwritten with the entity attribute values.
Proposal 1 (My preference): hass.geo.location is always the hass server location. hass.entity.geo.location is the location from the entity's attributes if they exist and is the hass server location if they do not exist.
Proposal 2 (Alternative): hass.geo.location is always the hass server location. hass.entity.geo.location is the location from the entity's attributes only if they exist
Proposal 1 allows calculating/plotting distance between hass server and individual sensors if those sensors have latitude and longitude provided while placing all entities by default at the hass server location.
Good find. I'm happy to move forward with Proposal 1
So actually i misread the existing code:
in v1 there is host.geo.location
which is always the hass server location and hass.geo.location
which is populated if the entity has geo location attributes on it. host
vs hass
in v2 there will be host.geo.location
which will always be the hass server location and hass.entity.geo.location
which will be populated if the entity has geo location attributes on it and will fallback to the value in host.geo.location
if not.
Coverage Report
File Stmts Miss Cover Missing __init__.py 0 0 100% elasticsearch __init__.py 84 2 97% 204–205 config_flow.py 258 18 93% 86, 129, 237, 307, 330, 368, 405, 457–458, 462, 502, 508, 518, 548, 623, 632, 676–677 const.py 31 0 100% entity_details.py 33 0 100% errors.py 25 2 92% 47, 60 es_doc_creator.py 169 0 100% es_doc_publisher.py 232 15 93% 97, 213–214, 230–231, 320, 410–411, 413–414, 416–420 es_gateway.py 100 20 80% 82–83, 98, 102, 118–120, 122, 124–125, 127–131, 133–137 es_index_manager.py 108 11 89% 195–200, 208–209, 223–224, 244 es_integration.py 37 2 94% 43–44 es_privilege_check.py 50 0 100% es_serializer.py 10 0 100% es_version.py 22 0 100% logger.py 2 0 100% system_info.py 17 1 94% 23 utils.py 4 0 100% TOTAL 1182 71 93%
Tests | Skipped | Failures | Errors | Time |
---|---|---|---|---|
92 | 0 :zzz: | 0 :x: | 0 :fire: | 7.014s :stopwatch: |
Created a pre-release build and running it against my serverless instance shows it working!
Will try to find some time to walk through a legacy setup and a legacy -> datastream conversion
Ran a quick test with legacy indexing which worked great. Just for my own comfort I setup an integration with legacy indexing, ingested the same entity using mainline and using this PR, and did a JSON Diff and confirmed no change in legacy document format.
The integration currently relies on legacy index templates and rollover indices. Ideally, users would default to the latest and greatest for metrics storage in Elastic. As it stands, the integration is currently non functional on serverless.
This PR adds the ability to (as a default for new users), store events as metrics in elasticsearch in a time series datastream while retaining the ability to send data to legacy indices for compatibility.
It also implements a number of other desired capabilities like domain-specific datastreams while ensuring that users can query and visualize over both legacy indices and modern data streams with a single data view. It also adds support for ES Serverless.
This doesn't prevent moving to an integration package later if desirable.
Open Questions:
Tasks: