home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.44k stars 30.68k forks source link

Suspected memory problem with modbus integration #84715

Closed ankerstal closed 1 year ago

ankerstal commented 1 year ago

The problem

For the last two months or something my homeassitant has started to restart on a semi regular basis. It seems to be some sort of memory problem. Se graph below: image

Im using a RPI 4 with 8GB memory and the standard docker container. (homeassistant/raspberrypi4-homeassistant:stable) and right now the running version is 2022.12.8.

First I thought the problem was the onvif integration but after disable all instances of onvif the problem persists. Find logs below.

What version of Home Assistant Core has the issue?

2022.12.8

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

modnus

Link to integration documentation on our website

https://www.home-assistant.io/integrations/modbus/

Diagnostics information

No response

Example YAML snippet

- name: ftx
    type: tcp
    host: 172.22.0.47
    port: 502
    climates:
      - name: "FTX Ventilation"
        address: 25
        input_type: holding
        slave: 2
        count: 1
        min_temp: 7
        target_temp_register: 49
        scale: 0.1

    binary_sensors:
      - name: Brandlarm (FTX)
        address: 94
        input_type: coil
        slave: 2
      - name: Fel tilluftssensor (FTX)
        address: 95
        input_type: coil
        slave: 2
      - name: Fel uteluftsensor (FTX)
        address: 96
        input_type: coil
        slave: 2
      - name: Fel frånluftssensor (FTX)
        address: 97
        input_type: coil
        slave: 2
      - name: Fel värmeväxlare (FTX)
        address: 111
        input_type: coil
        slave: 2

Anything in the logs that might be useful for us?

2022-12-28 18:11:04.924 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/modbus/sensor.py", line 95, in async_update
    raw_result = await self._hub.async_pymodbus_call(
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 409, in async_pymodbus_call
    result = await self.hass.async_add_executor_job(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 386, in _pymodbus_call
    result = entry.func(address, value, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/common.py", line 114, in read_holding_registers
    return self.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 109, in execute
    return self.transaction.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 174, in execute
    response, last_exception = self._transact(
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 294, in _transact
    result = self._recv(response_length, full)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 326, in _recv
    read_min = self.client.framer.recvPacket(min_size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/framer/__init__.py", line 49, in recvPacket
    return self.client.recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 89, in recv
    return self._recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 293, in _recv
    ready = select.select([self.socket], [], [], end - time_)
ValueError: filedescriptor out of range in select()
2022-12-28 18:11:04.930 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/modbus/sensor.py", line 95, in async_update
    raw_result = await self._hub.async_pymodbus_call(
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 409, in async_pymodbus_call
    result = await self.hass.async_add_executor_job(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 386, in _pymodbus_call
    result = entry.func(address, value, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/common.py", line 114, in read_holding_registers
    return self.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 109, in execute
    return self.transaction.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 174, in execute
    response, last_exception = self._transact(
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 294, in _transact
    result = self._recv(response_length, full)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 326, in _recv
    read_min = self.client.framer.recvPacket(min_size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/framer/__init__.py", line 49, in recvPacket
    return self.client.recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 89, in recv
    return self._recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 293, in _recv
    ready = select.select([self.socket], [], [], end - time_)
ValueError: filedescriptor out of range in select()
2022-12-28 18:11:04.935 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/modbus/sensor.py", line 95, in async_update
    raw_result = await self._hub.async_pymodbus_call(
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 409, in async_pymodbus_call
    result = await self.hass.async_add_executor_job(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 386, in _pymodbus_call
    result = entry.func(address, value, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/common.py", line 114, in read_holding_registers
    return self.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 109, in execute
    return self.transaction.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 174, in execute
    response, last_exception = self._transact(
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 294, in _transact
    result = self._recv(response_length, full)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 326, in _recv
    read_min = self.client.framer.recvPacket(min_size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/framer/__init__.py", line 49, in recvPacket
    return self.client.recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 89, in recv
    return self._recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 293, in _recv
    ready = select.select([self.socket], [], [], end - time_)
ValueError: filedescriptor out of range in select()
2022-12-28 18:11:06.067 ERROR (MainThread) [homeassistant.components.shelly] Error fetching Shelly3 data: Device disconnected
2022-12-28 18:11:06.486 ERROR (MainThread) [homeassistant.components.shelly] Error fetching shelly2 data: Device disconnected
2022-12-28 18:11:06.499 ERROR (MainThread) [homeassistant.components.shelly] Error fetching shelly1 data: Device disconnected
2022-12-28 18:11:10.380 ERROR (MainThread) [homeassistant] Error doing job: Fatal read error on socket transport
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 854, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
MemoryError
2022-12-28 18:11:10.384 ERROR (MainThread) [homeassistant.components.xbox] Unexpected error fetching xbox data: 
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 225, in _async_refresh
    self.data = await self._async_update_data()
  File "/usr/src/homeassistant/homeassistant/components/xbox/__init__.py", line 244, in _async_update_data
    batch: PeopleResponse = await self.client.people.get_friends_own_batch(
  File "/usr/local/lib/python3.10/site-packages/xbox/webapi/api/provider/people/__init__.py", line 106, in get_friends_own_batch
    resp = await self.client.session.post(
  File "/usr/local/lib/python3.10/site-packages/xbox/webapi/api/client.py", line 95, in post
    return await self.request(hdrs.METH_POST, url, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/xbox/webapi/api/client.py", line 81, in request
    return await self._auth_mgr.session.request(
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 535, in _request
    conn = await self._connector.connect(
  File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 907, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/usr/local/lib/python3.10/site-packages/aiohttp/connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1089, in create_connection
    transport, protocol = await self._create_connection_transport(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1119, in _create_connection_transport
    await waiter
  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 854, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
MemoryError
2022-12-28 18:11:10.469 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/http_aiohttp.py", line 254, in perform_request
    raw_data = await response.text()
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/http_aiohttp.py", line 323, in text
    await self.read()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1036, in read
    self._body = await self.content.read()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 375, in read
    block = await self.readany()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 397, in readany
    await self._wait("readany")
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/config/custom_components/elasticsearch/es_doc_publisher.py", line 326, in _publish_queue_timer
    await self.async_do_publish()
  File "/config/custom_components/elasticsearch/es_doc_publisher.py", line 193, in async_do_publish
    await self.async_bulk_sync_wrapper(actions)
  File "/config/custom_components/elasticsearch/es_doc_publisher.py", line 207, in async_bulk_sync_wrapper
    bulk_response = await async_bulk(self._gateway.get_client(), actions)
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/helpers.py", line 259, in async_bulk
    async for ok, item in async_streaming_bulk(client, actions, *args, **kwargs):
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/helpers.py", line 180, in async_streaming_bulk
    async for data, (ok, info) in azip(
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/helpers.py", line 110, in azip
    yield tuple([await x.__anext__() for x in aiters])
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/helpers.py", line 110, in <listcomp>
    yield tuple([await x.__anext__() for x in aiters])
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/helpers.py", line 73, in _process_bulk_chunk
    resp = await client.bulk("\n".join(bulk_actions) + "\n", *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/client/__init__.py", line 446, in bulk
    return await self.transport.perform_request(
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/transport.py", line 297, in perform_request
    status, headers, data = await connection.perform_request(
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/_async/http_aiohttp.py", line 262, in perform_request
    self.log_request_fail(
  File "/usr/local/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 276, in log_request_fail
    body = body.decode("utf-8", "ignore")
MemoryError
2022-12-28 18:11:11.363 ERROR (MainThread) [homeassistant.components.shelly] Unexpected error fetching shelly5 data: 'NoneType' object has no attribute 'call_exception_handler'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 1054, in sendto
    self._sock.sendto(data, addr)
AttributeError: 'NoneType' object has no attribute 'sendto'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 225, in _async_refresh
    self.data = await self._async_update_data()
  File "/usr/src/homeassistant/homeassistant/components/shelly/coordinator.py", line 219, in _async_update_data
    await self.device.update()
  File "/usr/local/lib/python3.10/site-packages/aioshelly/block_device.py", line 188, in update
    event = await self._coap_request("s")
  File "/usr/local/lib/python3.10/site-packages/aioshelly/block_device.py", line 247, in _coap_request
    await self.coap_context.request(self.ip_address, path)
  File "/usr/local/lib/python3.10/site-packages/aioshelly/coap.py", line 96, in request
    self.transport.sendto(msg, (ip, 5683))
  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 1064, in sendto
    self._fatal_error(
  File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 711, in _fatal_error
    self._loop.call_exception_handler({
AttributeError: 'NoneType' object has no attribute 'call_exception_handler'

Additional information

No response

home-assistant[bot] commented 1 year ago

Hey there @adamchengtkc, @janiversen, @vzahradnik, mind taking a look at this issue as it has been labeled with an integration (modbus) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `modbus` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Change the title of the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign modbus` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


modbus documentation modbus source (message by IssueLinks)

janiversen commented 1 year ago

Please have a look in the log before the part you posted, there should be an error/warning from modbus. I need that to see what might be wrong. It is clearly modbus that reports a problem, but the cause might come from other parts. The problem you report is because modbus is being forcefully stopped, we need to know the cause of that.

ankerstal commented 1 year ago

I actually disabled the modbus integration yesterday in order to see if it helps, so I dont have any logs right now..

But I have reason to believe that the problem is actually really slow i/o from the SD card. I have ordered a SSD and will try to see if that helps. If it helps I will close this issue.

janiversen commented 1 year ago

I have 2 servers running on RPI4, both with a number of integrations, but one have a lot of modbus entities and the other have a lot of shelly entities.

I throw away the SD card about a year ago (even though they were the face A2 type), because I continued to experience strange problems..especially when I added debug to either the modbus or shelly integrations. I just checked uptime, 1 RPI4 have been running for 3months and the other for 2weeks (the restarts correspond to some major upgrades I did).

Let me hear how your experiment goes....I am still not ruling out that it can a modbus problem, most software is "an accident waiting to happen".

ankerstal commented 1 year ago

Hi!

I changed SD card for a SSD now and everything seems to run smoothly. But I still got problems with the modbus integration. Today i stopped working again with alot of the following in the logs:

2023-01-03 17:29:27.656 ERROR (SyncWorker_10) [homeassistant.components.modbus.modbus] Pymodbus: ftx: Modbus Error: [Input/Output] Modbus Error: [Invalid Message] No response rec
eived, expected at least 8 bytes (0 received)
2023-01-03 17:29:27.777 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/modbus/binary_sensor.py", line 99, in async_update
    result = await self._hub.async_pymodbus_call(
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 409, in async_pymodbus_call
    result = await self.hass.async_add_executor_job(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 386, in _pymodbus_call
    result = entry.func(address, value, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/common.py", line 48, in read_coils
    return self.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 109, in execute
    return self.transaction.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 174, in execute
    response, last_exception = self._transact(
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 294, in _transact
    result = self._recv(response_length, full)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 362, in _recv
    result = self.client.framer.recvPacket(expected_response_length)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/framer/__init__.py", line 49, in recvPacket
    return self.client.recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 89, in recv
    return self._recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 293, in _recv
    ready = select.select([self.socket], [], [], end - time_)
ValueError: filedescriptor out of range in select()
2023-01-03 17:29:27.989 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/modbus/binary_sensor.py", line 99, in async_update
    result = await self._hub.async_pymodbus_call(
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 409, in async_pymodbus_call
    result = await self.hass.async_add_executor_job(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/modbus/modbus.py", line 386, in _pymodbus_call
    result = entry.func(address, value, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/common.py", line 48, in read_coils
    return self.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 109, in execute
    return self.transaction.execute(request)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 174, in execute
    response, last_exception = self._transact(
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 294, in _transact
    result = self._recv(response_length, full)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/transaction.py", line 362, in _recv
    result = self.client.framer.recvPacket(expected_response_length)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/framer/__init__.py", line 49, in recvPacket
    return self.client.recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 89, in recv
    return self._recv(size)
  File "/usr/local/lib/python3.10/site-packages/pymodbus/client/sync.py", line 293, in _recv
    ready = select.select([self.socket], [], [], end - time_)
ValueError: filedescriptor out of range in select()

I cant really se any significant errors before this in the log.

ankerstal commented 1 year ago

Nevermind, it seems like this problem is caused by another integration exhausting resources by not closing tcp connections after it was used. The memory problem was solved by disable the integration. And I guess "filedescriptor out of range" will be solved as well.