legrego / homeassistant-elasticsearch

Publish Home-Assistant events to Elasticsearch
https://legrego.github.io/homeassistant-elasticsearch/
MIT License
145 stars 38 forks source link

Connection timeout errors #107

Closed jsermer closed 3 years ago

jsermer commented 3 years ago

Environment Home-Assistant version: 0.112.3 Elasticsearch version: 7.8.0

Relevant configuration.yml settings:

elastic:
  url: http://10.X.X.X:9200
  only_publish_changed: true
  ilm_max_size: 1gb
  ilm_delete_after: 90d

Describe the bug I receive connection timeout messages most of the time, but occasionally, the integration works and logs data into elasticsearch properly.

Expected behavior No connection timeouts

I started seeing this issue when https://github.com/legrego/homeassistant-elasticsearch/issues/97 was implemented/resolved.

Additional context I am able to curl the url endpoint from within the homeassistant container:

# docker exec -it homeassistant curl http://10.X.X.X:9200
{
  "name" : "elasticsearch",
  "cluster_name" : "synology",
  "cluster_uuid" : "h4bS4pFUSfePLhUCySAaOw",
  "version" : {
    "number" : "7.8.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "757314695644ea9a1dc2fecd26d1a43856725e65",
    "build_date" : "2020-06-14T19:35:50.234439Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Log messages:

2020-07-07 15:12:34 ERROR (MainThread) [homeassistant.setup] Error during setup of component elastic
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 483, in _request
    timeout=real_timeout
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 523, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 859, in _create_connection
    req, traces, timeout)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 986, in _create_direct_connection
    req=req, client_error=client_error)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/connector.py", line 936, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore  # noqa
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 989, in create_connection
    ssl_handshake_timeout=ssl_handshake_timeout)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 1017, in _create_connection_transport
    await waiter
concurrent.futures._base.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 483, in _request
    timeout=real_timeout
  File "/usr/local/lib/python3.7/site-packages/async_timeout/__init__.py", line 45, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.7/site-packages/async_timeout/__init__.py", line 92, in _do_exit
    raise asyncio.TimeoutError
concurrent.futures._base.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/elasticsearch_async/connection.py", line 98, in perform_request
    response = yield from self.session.request(method, url, data=body, headers=headers)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/client.py", line 488, in _request
    'to host {0}'.format(url)) from exc
aiohttp.client_exceptions.ServerTimeoutError: Connection timeout to host http://10.X.X.X:9200/

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/setup.py", line 193, in _async_setup_component
    result = await asyncio.wait_for(task, SLOW_SETUP_MAX_WAIT)
  File "/usr/local/lib/python3.7/asyncio/tasks.py", line 442, in wait_for
    return fut.result()
  File "/config/custom_components/elastic/__init__.py", line 74, in async_setup
    await gateway.async_init()
  File "/config/custom_components/elastic/es_gateway.py", line 36, in async_init
    await self.es_version.async_init()
  File "/config/custom_components/elastic/es_version.py", line 15, in async_init
    version = (await self._client.info())["version"]
  File "/usr/local/lib/python3.7/site-packages/elasticsearch_async/transport.py", line 150, in main_loop
    method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/elasticsearch_async/connection.py", line 110, in perform_request
    raise ConnectionTimeout('TIMEOUT', str(e), e)
jsermer commented 3 years ago

Manually bumping the elasticsearch requirement in the manifest.json and restart ha (ha core restart) seems to have fixed my issue:

# docker exec -it homeassistant pip list|grep elastic
elasticsearch                    7.8.0
elasticsearch-async              6.2.0
jsermer commented 3 years ago

I performed a couple more restarts and was able to recreate the initial problem, so bumping the requirements wasn't a real solution. :(

jsermer commented 3 years ago

The elasticsearch node is on the same switch, just on another VLAN and the timings aren't slow by any means:

>>> from datetime import datetime
>>> import time
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch("http://10.X.X.X:9200")
>>> start = time.time(); es.info(); end = time.time(); print(end - start)
{'name': 'elasticsearch', 'cluster_name': 'synology', 'cluster_uuid': 'h4bS4pFUSfePLhUCySAaOw', 'version': {'number': '7.8.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '757314695644ea9a1dc2fecd26d1a43856725e65', 'build_date': '2020-06-14T19:35:50.234439Z', 'build_snapshot': False, 'lucene_version': '8.5.1', 'minimum_wire_compatibility_version': '6.8.0', 'minimum_index_compatibility_version': '6.0.0-beta1'}, 'tagline': 'You Know, for Search'}
0.01810288429260254
jsermer commented 3 years ago

Elastic recently announced async is now native to the elasticsearch and the elasticsearch-async has been deprecated:

https://elasticsearch-py.readthedocs.io/en/master/async.html https://github.com/elastic/elasticsearch-py-async#python-elasticsearch-async-client

The only caveat being a prereq of ES7

jsermer commented 3 years ago

Using the elasticsearch-async library as part of the custom component (instead of the elasticsearch library), yields similar results on my raspberry pi:

>>> from datetime import datetime
>>> import time
>>> import asyncio
>>> from elasticsearch_async import AsyncElasticsearch
>>> client = AsyncElasticsearch(hosts=['http://10.X.X.X:9200'])
>>> async def print_info():
...     info = await client.info()
...     print(info)
...
>>> loop = asyncio.get_event_loop()
>>> start = time.time(); loop.run_until_complete(print_info()); end = time.time(); print(end - start)
{'name': 'elasticsearch', 'cluster_name': 'synology', 'cluster_uuid': 'h4bS4pFUSfePLhUCySAaOw', 'version': {'number': '7.8.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '757314695644ea9a1dc2fecd26d1a43856725e65', 'build_date': '2020-06-14T19:35:50.234439Z', 'build_snapshot': False, 'lucene_version': '8.5.1', 'minimum_wire_compatibility_version': '6.8.0', 'minimum_index_compatibility_version': '6.0.0-beta1'}, 'tagline': 'You Know, for Search'}
0.022830724716186523
>>> loop.run_until_complete(client.transport.close())
>>> loop.close()
jsermer commented 3 years ago

Things seem to be slowly improving through the 0.113.X release cycle (currently on the 0.113.3 release). Might be because they continue to fix blocking tasks during startup?

I've noticed that upon an upgrade restart, the elastic integration works seemingly all of the time, but non-upgrade restarts seem to have trouble. That particular symptom seems to be improving. Recent log entry after non-upgrade restart (still see some timeout errors logged, but the elastic integration appears to be working):

2020-08-06 09:00:08 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for elastic which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-08-06 09:00:28 WARNING (MainThread) [homeassistant.setup] Setup of elastic is taking over 10 seconds.
2020-08-06 09:00:56 WARNING (MainThread) [elasticsearch] PUT http://10.X.X.X:9200/active-hass-index-v4_1/_settings?preserve_existing=true [status:N/A request:12.050s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/elasticsearch_async/connection.py", line 98, in perform_request
    response = yield from self.session.request(method, url, data=body, headers=headers)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 504, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 847, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/usr/local/lib/python3.8/site-packages/aiohttp/streams.py", line 591, in read
    await self._waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/elasticsearch_async/connection.py", line 99, in perform_request
    raw_data = yield from response.text()
  File "/usr/local/lib/python3.8/site-packages/async_timeout/__init__.py", line 45, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.8/site-packages/async_timeout/__init__.py", line 92, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
2020-08-06 09:00:56 ERROR (MainThread) [custom_components.elastic] Error updating index ILM settings: ConnectionTimeout caused by - TimeoutError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/elasticsearch_async/connection.py", line 98, in perform_request
    response = yield from self.session.request(method, url, data=body, headers=headers)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client.py", line 504, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 847, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/usr/local/lib/python3.8/site-packages/aiohttp/streams.py", line 591, in read
    await self._waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/elasticsearch_async/connection.py", line 99, in perform_request
    raw_data = yield from response.text()
  File "/usr/local/lib/python3.8/site-packages/async_timeout/__init__.py", line 45, in __exit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.8/site-packages/async_timeout/__init__.py", line 92, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/config/custom_components/elastic/es_index_manager.py", line 121, in _create_index_template
    await client.indices.put_settings(
  File "/usr/local/lib/python3.8/site-packages/elasticsearch_async/transport.py", line 149, in main_loop
    status, headers, data = yield from connection.perform_request(
  File "/usr/local/lib/python3.8/site-packages/elasticsearch_async/connection.py", line 110, in perform_request
    raise ConnectionTimeout('TIMEOUT', str(e), e)
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - TimeoutError()
legrego commented 3 years ago

@jsermer Thanks for taking the time to document your findings here. Sorry for not responding to you sooner. I haven't had much time recently to triage issues, but I'm hoping to get to this in the next couple of weeks

jsermer commented 3 years ago

No problem at all....having the elastic integration is a 'nice to have' and really doesn't impact my home automation in any way. Let me know how I can help and what other information you might require. Thanks for responding!

legrego commented 3 years ago

@jsermer I know you said the package upgrades didn't actually solves your problem, but I have a pre-release version of 0.3.0 published here if you're able to check it out. I'm interested to see if you get a different stack trace.

This version also adds a timeout setting, which might help your particular setup

jsermer commented 3 years ago

@legrego Thanks, I updated to the latest beta version this morning via HACS but did not add a timeout parameter to the config. So far it's working as expected (but has been working pretty consistently throughout the 0.114 release). If I begin seeing timeouts, I'll add that config parameter and continue to test. Nice work!

legrego commented 3 years ago

@jsermer that's great, thanks for the quick feedback! Full disclosure, the default timeout in the ES client was 10 seconds, but the work I did in #114 defaults the timeout to 30 seconds if you don't define one yourself, so that extra time might be helping too.

jsermer commented 3 years ago

@legrego very cool....it really only seemed to timeout upon startup, so even bumping that from 10 -> 30 makes a lot of sense. home assistant is trying to start a bunch of other things in parallel and that may have been causing the original timeout. Feel free to close this bug report and I'll re-open it if the new timeout mechanism doesn't satisfy my need.