home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Recorder: excessive amount of aborted connections #5498

Closed: fanaticDavid closed this issue 7 years ago

fanaticDavid commented 7 years ago

Make sure you are running the latest version of Home Assistant before reporting an issue.

You should only file an issue if you found a bug. Feature and enhancement requests should go in the Feature Requests section of our community forum.

Home Assistant release (hass --version): 0.36.1. Previously tested with 0.35.x but got similar results; never tested with 0.34.x or older.

Python release (python3 --version): 3.4.2

Component/platform: Recorder, using a MariaDB instance on a Synology NAS. Home Assistant is running on a Raspberry Pi 3.

Description of problem: Back in July 2016 or so, I configured the recorder to save data in a MySQL (MariaDB) instance running on my Synology NAS. Originally, that switch did show an improvement in performance: clicking an entity in the Home Assistant UI to pull up the history graph took 4-5 seconds, rather than 9-10 seconds like it did before. Since then, performance seems to have returned to where I started, so I decided to try to find the cause. In my research, I found a tool called MySQLTuner, which returns a variety of metrics and recommendations for both performance and security. The latest run gave me the following performance metrics:

[--] Up for: 14d 14h 36m 51s (3M q [2.545 qps], 104K conn, TX: 519M, RX: 660M)
[--] Reads / Writes: 33% / 67%
[--] Binary logging is disabled
[--] Physical Memory     : 1.0G
[--] Max MySQL memory    : 3.8G
[--] Other process memory: 406.6M
[--] Total buffers: 186.0M global + 24.9M per thread (151 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 335.2M (32.73% of installed RAM)
[!!] Maximum possible memory usage: 3.8G (384.74% of installed RAM)
[!!] Overall possible memory usage with other process exceeded memory
[OK] Slow queries: 0% (21/3M)
[OK] Highest usage of available connections: 3% (6/151)
[!!] Aborted connections: 97.44%  (102245/104930)
[!!] Query cache is disabled
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 84 sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 17% (86 on disk / 486 total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (4 open / 1K opened)
[OK] Open file limit used: 0% (4/1K)
[OK] Table locks acquired immediately: 99% (1M immediate / 1M locks)

The excessive number of aborted connections caught my eye. It may not be the (full) cause of the drop in performance, but it is definitely far from ideal. Based on the number of aborted connections, MySQLTuner also included the following recommendation:

Reduce or eliminate unclosed connections and network issues
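
For reference, the raw counters behind that percentage are standard MySQL/MariaDB status variables and can be queried directly; treat this as a sketch, since exactly which counters MySQLTuner combines is internal to the tool:

-- Server counters related to aborted connections:
--   Aborted_connects: connection attempts that failed (handshake/auth errors)
--   Aborted_clients:  connections opened successfully but not closed properly
--   Connections:      total connection attempts since the server started
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Aborted_connects', 'Aborted_clients', 'Connections');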

Expected: The recorder should properly close all opened connections

Problem-relevant configuration.yaml entries and steps to reproduce:

recorder:
  purge_days: 14
  db_url: mysql://REDACTED:REDACTED@xxx.xxx.xxx.x/HASS_db

I issue the following command to run MySQLTuner on my Synology DS412+:

user@NAS:/usr/local/bin$ sudo perl mysqltuner.pl --user REDACTED --pass REDACTED --forcemem 1024 --forceswap 2048

Additional info:

kellerza commented 7 years ago

The previous commit should have improved this, and 0.40 will include a big rewrite of session handling in the recorder.

fanaticDavid commented 7 years ago

I'm still seeing an excessive number of aborted connections after running 0.40 for over two days. To start with a clean slate, I rebooted both the NAS (host of the MariaDB instance) and the Raspberry Pi 3 (host of Home Assistant) after upgrading to 0.40. Here are the current metrics:

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 2d 10h 34m 51s (200K q [0.951 qps], 16K conn, TX: 7M, RX: 61M)
[--] Reads / Writes: 0% / 100%
[--] Binary logging is disabled
[--] Physical Memory     : 1.0G
[--] Max MySQL memory    : 3.8G
[--] Other process memory: 417.0M
[--] Total buffers: 186.0M global + 24.9M per thread (151 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 260.6M (25.45% of installed RAM)
[!!] Maximum possible memory usage: 3.8G (384.74% of installed RAM)
[!!] Overall possible memory usage with other process exceeded memory
[OK] Slow queries: 0% (2/200K)
[OK] Highest usage of available connections: 1% (3/151)
[!!] Aborted connections: 99.95%  (16431/16440)
[!!] Query cache is disabled
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 12 sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 18% (4 on disk / 22 total)
[!!] Thread cache is disabled
[OK] Table cache hit rate: 33% (4 open / 12 opened)
[OK] Open file limit used: 0% (3/1K)
[OK] Table locks acquired immediately: 100% (100K immediate / 100K locks)

Please reopen this issue. Thanks.

fanaticDavid commented 7 years ago

Once more, I ask for this issue to be reopened. The number of aborted connections is still mind-blowing:

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 6d 22h 51m 45s (626K q [1.043 qps], 47K conn, TX: 52M, RX: 321M)
[--] Reads / Writes: 0% / 100%
[--] Binary logging is disabled
[--] Physical Memory     : 988.7M
[--] Max MySQL memory    : 601.8M
[--] Other process memory: 289.2M
[--] Total buffers: 170.0M global + 2.9M per thread (151 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 181.5M (18.35% of installed RAM)
[OK] Maximum possible memory usage: 601.8M (60.87% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (14/626K)
[OK] Highest usage of available connections: 2% (4/151)
[!!] Aborted connections: 98.53%  (46316/47005)
[!!] Query cache may be disabled by default due to mutex contention.
[!!] Query cache efficiency: 0.0% (0 cached / 1K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 5% (3 temp sorts / 57 sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 6% (14 on disk / 203 total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (4 open / 416 opened)
[OK] Open file limit used: 0% (5/1K)
[OK] Table locks acquired immediately: 100% (310K immediate / 310K locks)

In addition, I originally had an issue with errors in the Home Assistant log after switching from SQLite to MariaDB a year ago. After some research, I applied a workaround. I recently switched from MariaDB 5 to MariaDB 10, meaning the workaround is no longer in place, and the aforementioned errors are back:

2017-07-07 21:04:52 ERROR (Thread-9) [homeassistant.components.recorder.util] Error executing query: (_mysql_exceptions.OperationalError) (1129, "Host '192.168.xxx.x' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")
2017-07-07 21:04:52 ERROR (Thread-9) [homeassistant.components.recorder.util] Error executing query: (_mysql_exceptions.OperationalError) (1129, "Host '192.168.xxx.x' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")
2017-07-07 21:04:52 ERROR (Thread-9) [homeassistant.components.recorder.util] Error executing query: (_mysql_exceptions.OperationalError) (1129, "Host '192.168.xxx.x' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")
2017-07-07 21:04:52 ERROR (Thread-9) [homeassistant.components.recorder.util] Error executing query: (_mysql_exceptions.OperationalError) (1129, "Host '192.168.xxx.x' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")
2017-07-07 21:04:52 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 1122, in _do_get
    return self._pool.get(wait, self._timeout)
  File "/home/hass/.homeassistant/deps/sqlalchemy/util/queue.py", line 145, in get
    raise Empty
sqlalchemy.util.queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 387, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/home/hass/.homeassistant/deps/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/hass/.homeassistant/deps/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/MySQLdb/connections.py", line 204, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (1129, "Host '192.168.xxx.x' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/aiohttp/web_protocol.py", line 422, in start
    resp = yield from self._request_handler(request)
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/aiohttp/web.py", line 306, in _handle
    resp = yield from handler(request)
  File "/usr/lib/python3.4/asyncio/coroutines.py", line 143, in coro
    res = yield from res
  File "/usr/lib/python3.4/asyncio/coroutines.py", line 143, in coro
    res = yield from res
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/homeassistant/components/http/ban.py", line 58, in ban_middleware_handler
    return (yield from handler(request))
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/homeassistant/components/http/__init__.py", line 424, in handle
    result = yield from result
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/homeassistant/components/history.py", line 238, in get
    entity_id, self.filters)
  File "/usr/lib/python3.4/asyncio/futures.py", line 388, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.4/asyncio/tasks.py", line 286, in _wakeup
    value = future.result()
  File "/usr/lib/python3.4/asyncio/futures.py", line 277, in result
    raise self._exception
  File "/usr/lib/python3.4/concurrent/futures/thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/homeassistant/components/history.py", line 78, in get_significant_states
    query.order_by(States.entity_id, States.last_updated))
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/homeassistant/components/recorder/util.py", line 63, in execute
    (row.to_native() for row in qry)
  File "/home/hass/.homeassistant/deps/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/home/hass/.homeassistant/deps/sqlalchemy/orm/query.py", line 2876, in _execute_and_instances
    close_with_result=True)
  File "/home/hass/.homeassistant/deps/sqlalchemy/orm/query.py", line 2885, in _get_bind_args
    **kw
  File "/home/hass/.homeassistant/deps/sqlalchemy/orm/query.py", line 2867, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/home/hass/.homeassistant/deps/sqlalchemy/orm/session.py", line 998, in connection
    execution_options=execution_options)
  File "/home/hass/.homeassistant/deps/sqlalchemy/orm/session.py", line 1003, in _connection_for_bind
    engine, execution_options)
  File "/home/hass/.homeassistant/deps/sqlalchemy/orm/session.py", line 403, in _connection_for_bind
    conn = bind.contextual_connect()
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/base.py", line 2112, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/base.py", line 2151, in _wrap_pool_connect
    e, dialect, self)
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/base.py", line 1465, in _handle_dbapi_exception_noconnection
    exc_info
  File "/home/hass/.homeassistant/deps/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/hass/.homeassistant/deps/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 387, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/home/hass/.homeassistant/deps/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/hass/.homeassistant/deps/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/home/hass/.homeassistant/deps/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/hass/.homeassistant/deps/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/srv/hass/hass_venv/lib/python3.4/site-packages/MySQLdb/connections.py", line 204, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1129, "Host '192.168.xxx.x' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")

The workaround, which I will now have to reapply, is to increase MariaDB's system variable max_connect_errors from the default value (100) to 1000000 (which is probably excessive, but it works).
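
For reference, a minimal sketch of reapplying that workaround from a MySQL client; note that a SET GLOBAL change does not survive a server restart, so the variable also needs to be set in my.cnf under [mysqld] to persist:

-- Unblock the Home Assistant host (equivalent to 'mysqladmin flush-hosts')
FLUSH HOSTS;
-- Raise the per-host error threshold from the default of 100
SET GLOBAL max_connect_errors = 1000000;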

aserramonner commented 7 years ago

Same issue seen with 0.48.0 and MariaDB:

2017-07-06 21:18:46 ERROR (Thread-11) [homeassistant.components.recorder.util] Error executing query: (_mysql_exceptions.OperationalError) (1129, "Host 'xx' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")
  File "/srv/homeassistant/lib/python3.4/site-packages/MySQLdb/__init__.py", line 86, in Connect
  File "/srv/homeassistant/lib/python3.4/site-packages/MySQLdb/connections.py", line 204, in __init__
_mysql_exceptions.OperationalError: (1129, "Host 'xx' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")
  File "/srv/homeassistant/lib/python3.4/site-packages/MySQLdb/__init__.py", line 86, in Connect
  File "/srv/homeassistant/lib/python3.4/site-packages/MySQLdb/connections.py", line 204, in __init__
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1129, "Host 'xx' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'")

fanaticDavid commented 7 years ago

I don't know what changed, let alone when, but the number of aborted connections has dropped to... almost none!

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 5d 10h 55m 26s (495K q [1.051 qps], 1K conn, TX: 47M, RX: 257M)
[--] Reads / Writes: 1% / 99%
[--] Binary logging is disabled
[--] Physical Memory     : 988.7M
[--] Max MySQL memory    : 720.8M
[--] Other process memory: 297.4M
[--] Total buffers: 254.0M global + 2.9M per thread (151 max threads)
[--] P_S Max memory usage: 35M
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 303.3M (30.68% of installed RAM)
[OK] Maximum possible memory usage: 720.8M (72.91% of installed RAM)
[!!] Overall possible memory usage with other process exceeded memory
[OK] Slow queries: 0% (9/495K)
[OK] Highest usage of available connections: 3% (5/151)
[OK] Aborted connections: 0.15%  (3/1996)
[!!] Query cache may be disabled by default due to mutex contention.
[!!] Query cache efficiency: 0.0% (0 cached / 4K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 1% (1 temp sorts / 51 sorts)
[OK] No joins without indexes
[OK] Temporary tables created on disk: 6% (20 on disk / 297 total)
[OK] Thread cache hit rate: 99% (5 created / 1K connections)
[!!] Table cache hit rate: 0% (4 open / 1K opened)
[OK] Open file limit used: 0% (5/1K)
[OK] Table locks acquired immediately: 100% (244K immediate / 244K locks)

If this keeps up, I'll consider undoing the workaround I described in my previous comment to see whether the errors @aserramonner and I ran into come back.

EDIT: Come to think of it, I may have fixed the issue myself by setting innodb_buffer_pool_size to 96M (the default was only 16M) in MariaDB's my.cnf file. I have been making tweaks recently to try to improve recorder performance, as I'm dealing with fairly limited resources on my NAS.
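
For anyone checking their own instance, the value actually in effect can be read back after editing my.cnf and restarting MariaDB (a sketch; the server reports the value in bytes, so 96M shows up as 100663296):

-- innodb_buffer_pool_size = 96M belongs in the [mysqld] section of my.cnf
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';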