home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

boschshcpy library is creating many zeroconf `ServiceBrowser` objects without canceling them and leaking threads #76150

Closed armin-gh closed 2 years ago

armin-gh commented 2 years ago

The problem

Running HA 2022.7.5 in a Python venv (Python 3.9) on a Raspberry Pi 4 with 4 GB RAM and storage on SSD. This is not related to a new install: I have observed it for at least the past 6 months, if not longer, though I usually restart the system every 2nd or 3rd day after updates or configuration changes. Once in a while I get this message in the log:

2022-08-02 04:50:05 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/srv/homeassistant2/lib/python3.9/site-packages/homeassistant/data_entry_flow.py", line 222, in async_init
    flow, result = await task
  File "/srv/homeassistant2/lib/python3.9/site-packages/homeassistant/data_entry_flow.py", line 249, in _async_init
    result = await self._async_handle_step(flow, flow.init_step, data, init_done)
  File "/srv/homeassistant2/lib/python3.9/site-packages/homeassistant/data_entry_flow.py", line 359, in _async_handle_step
    result: FlowResult = await getattr(flow, method)(user_input)
  File "/home/homeassistant/.homeassistant/custom_components/bosch_shc/config_flow.py", line 193, in async_step_zeroconf
    self.info = await self._get_info(discovery_info.host)
  File "/home/homeassistant/.homeassistant/custom_components/bosch_shc/config_flow.py", line 225, in _get_info
    return await self.hass.async_add_executor_job(
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/homeassistant/.homeassistant/custom_components/bosch_shc/config_flow.py", line 74, in get_info_from_host
    information = session.mdns_info()
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/session.py", line 277, in mdns_info
    return SHCInformation(
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/information.py", line 80, in __init__
    self.get_unique_id(zeroconf)
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/information.py", line 138, in get_unique_id
    self._listener = SHCListener(zeroconf, self.filter)
  File "/srv/homeassistant2/lib/python3.9/site-packages/boschshcpy/information.py", line 32, in __init__
    ServiceBrowser(zeroconf, "_http._tcp.local.", handlers=[self.service_update])
  File "/srv/homeassistant2/lib/python3.9/site-packages/zeroconf/_services/browser.py", line 511, in __init__
    self.start()
  File "/usr/lib/python3.9/threading.py", line 874, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

At around 04:50, the BoschSHC integration could not start a new thread.

This time I monitored the system from the restart on 2022-07-28 17:50 onward. I installed py-spy and found an ever-increasing number of threads with names like this: Thread 0xBE80E440 (active): “zeroconf-ServiceBrowser-_http._tcp-2954”. I set up a sensor to monitor them:

- platform: command_line
  name: ThreadWatch1
  command: "py-spy dump --pid $(pgrep -u homeassistant hass) | grep zeroconf-ServiceBrowser-_http._tcp | wc -l"
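The sensor counts the threads from outside the process with py-spy. If code can be run inside the Python process itself (getting it there is outside the scope of this sketch), the standard library's threading.enumerate() gives the same count. A minimal sketch; count_threads_with_prefix is a hypothetical helper, not part of Home Assistant, and the demo starts dummy threads named like the leaked ones:

```python
import threading


def count_threads_with_prefix(prefix: str) -> int:
    """Count the live threads in the current process whose name starts with prefix."""
    return sum(1 for t in threading.enumerate() if t.name.startswith(prefix))


if __name__ == "__main__":
    stop = threading.Event()
    # Three dummy threads named like the leaked browser threads.
    dummies = [
        threading.Thread(target=stop.wait, name=f"zeroconf-ServiceBrowser-_http._tcp-{i}")
        for i in range(3)
    ]
    for t in dummies:
        t.start()
    print(count_threads_with_prefix("zeroconf-ServiceBrowser-_http._tcp"))  # 3
    stop.set()
    for t in dummies:
        t.join()
```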

After starting the HA service there are 3 “zeroconf-ServiceBrowser-_http._tcp-” threads, and the count increases by 2 every 40-50 minutes. The maximum was 312 of these threads yesterday morning at 04:50, the same time the error above appeared in the log.
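As a back-of-the-envelope check, the growth rate is consistent with the count at the time of the crash. Assuming linear growth from 3 threads at the restart (2022-07-28 17:50) by 2 threads every 40 to 50 minutes, the expected count at 2022-08-02 04:50 can be bracketed as below; the function and parameter names are illustrative only:

```python
from datetime import datetime
from typing import Tuple


def expected_thread_range(
    start: datetime,
    end: datetime,
    base: int = 3,
    per_step: int = 2,
    min_step_minutes: int = 40,
    max_step_minutes: int = 50,
) -> Tuple[int, int]:
    """Return (low, high) thread counts expected at `end`, assuming `base` threads
    at `start` and `per_step` new threads every min..max minutes."""
    elapsed = (end - start).total_seconds() / 60
    low = base + per_step * int(elapsed // max_step_minutes)   # slowest growth
    high = base + per_step * int(elapsed // min_step_minutes)  # fastest growth
    return low, high


restart = datetime(2022, 7, 28, 17, 50)  # restart time from the report
failure = datetime(2022, 8, 2, 4, 50)    # "can't start new thread" in the log
print(expected_thread_range(restart, failure))  # (259, 323)
```

The 107 hours (6420 minutes) between the two timestamps give roughly 259 to 323 threads, which brackets the observed 312.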

The stack trace created with py-spy dump -s -l --pid $(pgrep -u homeassistant hass) is similar for all 312 threads:

Thread 0xBE80E440 (active): "zeroconf-ServiceBrowser-_http._tcp-2954"
    run (zeroconf/_services/browser.py:530)
        Arguments:
            self: <ServiceBrowser at 0x6f1db328>
        Locals:
            event: (("ds416._http._tcp.local.", "_http._tcp.local."), <ServiceStateChange at 0xae016178>)
    _bootstrap_inner (threading.py:954)
        Arguments:
            self: <ServiceBrowser at 0x6f1db328>
    _bootstrap (threading.py:912)
        Arguments:
            self: <ServiceBrowser at 0x6f1db328>

I tried to rule out other integrations (BoschSHC, nmap) by disabling them. Then I removed "default_config" from my configuration.yaml and added integrations back one at a time. After adding "zeroconf", the threads were created again.

DS416 (the name listed in the stack trace) is a Synology DS416; besides working as a file server, it also provides DHCP and DNS services on my network.

What version of Home Assistant Core has the issue?

core-2022.7.5

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Core

Integration causing the issue

Zeroconf

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zeroconf/

Diagnostics information

No response

Example YAML snippet

# after commenting out default_config, I added the integrations below in this order
# after adding zeroconf, the leaked threads were created again
#default_config:
backup:
config:
history:
logbook:
map:
mobile_app:
sun:
ssdp:
dhcp:
zeroconf:

Anything in the logs that might be useful for us?

No response

Additional information

Ask if you need logging with a specific configuration. Running the zeroconf logger at "debug" level produced huge files, so I stopped it.

probot-home-assistant[bot] commented 2 years ago

Hey there @bdraco, mind taking a look at this issue as it has been labeled with an integration (zeroconf) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)



armin-gh commented 2 years ago

Before creating this issue, I posted it as a question in the community forums at https://community.home-assistant.io/t/thread-leak/446233

bdraco commented 2 years ago

The zeroconf integration itself is all async so it is not creating any threads.

You have another integration that uses the sync API to create ServiceBrowsers for _http._tcp and does not cancel them when it is finished with them.
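The py-spy dump above shows each ServiceBrowser running as its own thread, which is why an uncanceled browser leaks one. The sketch below mimics that with a hypothetical FakeBrowser class (a stand-in, not the zeroconf API) whose thread runs until cancel() is called, and compares lookups that drop the browser with lookups that cancel it in a finally block:

```python
import threading


class FakeBrowser(threading.Thread):
    """Hypothetical stand-in for a sync ServiceBrowser: each instance owns one
    thread that keeps running until cancel() is called."""

    def __init__(self, name: str) -> None:
        super().__init__(name=name, daemon=True)
        self._cancel_event = threading.Event()
        self.start()  # the thread starts as soon as the object is created

    def run(self) -> None:
        # Stand-in for the browser's event loop: block until canceled.
        self._cancel_event.wait()

    def cancel(self) -> None:
        self._cancel_event.set()
        self.join()


def lookup_leaky(n: int) -> None:
    """n lookups that discard the browser: each one leaves a thread behind."""
    for i in range(n):
        FakeBrowser(f"fake-browser-{i}")


def lookup_clean(n: int) -> None:
    """n lookups that always cancel the browser when done."""
    for i in range(n):
        browser = FakeBrowser(f"fake-browser-{i}")
        try:
            pass  # ... read whatever the browser discovered ...
        finally:
            browser.cancel()


if __name__ == "__main__":
    before = threading.active_count()
    lookup_clean(5)
    print(threading.active_count() - before)  # 0
    lookup_leaky(5)
    print(threading.active_count() - before)  # 5
```

Each leaky call adds one thread per lookup, matching the steady growth seen in the report.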

I'd search the source code of the PyPI packages used by all the integrations you have installed for the string ServiceBrowser, and narrow it down to the ones that create them.
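That search can be scripted (grep -rn ServiceBrowser works just as well). A minimal sketch with a hypothetical find_string helper that reports every line of a .py file under a directory containing the string; the two directories in the usage block are taken from the traceback in this issue and will differ on other installs:

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_string(root: Path, needle: str = "ServiceBrowser") -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, stripped line) for every .py line under root containing needle."""
    for path in sorted(root.rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, lineno, line.strip()


if __name__ == "__main__":
    # Paths from the traceback above; adjust them to your own install.
    for root in (
        Path("/srv/homeassistant2/lib/python3.9/site-packages"),
        Path("/home/homeassistant/.homeassistant/custom_components"),
    ):
        if root.is_dir():
            for path, lineno, line in find_string(root):
                print(f"{path}:{lineno}: {line}")
```

A plain substring match also hits imports and comments, so each hit still has to be read to see whether a browser is created and later canceled.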

bdraco commented 2 years ago

The bug is here https://github.com/tschamm/boschshcpy/blob/fe3c795d99824365967bacaddbd3f68bb85df64b/boschshcpy/information.py#L32

The ServiceBrowser is never canceled.

There isn't anything we can do here as the custom component needs to fix this.
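In the traceback, line 32 of boschshcpy's information.py appears to call ServiceBrowser(...) without keeping the result, so nothing can call cancel() on it later. One possible shape of a fix, not the actual boschshcpy change: the listener stores the browser, and a context manager cancels it when the lookup ends, even on errors. ListenerSketch, browsing, and browser_factory are hypothetical names; the factory is injected so the sketch runs without a network:

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List, Protocol


class Cancelable(Protocol):
    def cancel(self) -> None: ...


Handler = Callable[..., None]


class ListenerSketch:
    """Hypothetical listener that keeps a reference to the browser it creates
    so the browser can be canceled later."""

    def __init__(self, browser_factory: Callable[[Handler], Cancelable]) -> None:
        self.found: List[str] = []
        self._browser = browser_factory(self.service_update)  # keep the reference

    def service_update(self, zeroconf: Any, service_type: str, name: str, state_change: Any) -> None:
        # python-zeroconf passes these as keyword arguments to its handlers.
        self.found.append(name)

    def close(self) -> None:
        self._browser.cancel()


@contextmanager
def browsing(browser_factory: Callable[[Handler], Cancelable]) -> Iterator[ListenerSketch]:
    listener = ListenerSketch(browser_factory)
    try:
        yield listener
    finally:
        listener.close()  # stop the browser thread even if the lookup raised
```

With python-zeroconf the factory would presumably be something like lambda handler: ServiceBrowser(zc, "_http._tcp.local.", handlers=[handler]), mirroring the call in the traceback.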