ronengr / hass_nuki_bt

Control Nuki Lock over bluetooth
MIT License
36 stars 7 forks source link

Failure to connect during startup (possibly due to unavailable bluetooth proxy) #22

Closed chatziko closed 10 months ago

chatziko commented 11 months ago

System Health details

System Information

version core-2023.10.4
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.5
os_name Linux
os_version 6.1.56
arch x86_64
timezone Europe/Athens
config_dir /config
Home Assistant Community Store GitHub API | ok -- | -- GitHub Content | ok GitHub Web | ok GitHub API Calls Remaining | 4847 Installed Version | 1.33.0 Stage | running Available Repositories | 1379 Downloaded Repositories | 22
Home Assistant Cloud logged_in | false -- | -- can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 11.0 -- | -- update_channel | stable supervisor_version | supervisor-2023.10.0 agent_version | 1.6.0 docker_version | 24.0.6 disk_total | 30.8 GB disk_used | 27.1 GB healthy | true supported | true board | ova supervisor_api | ok version_api | ok installed_addons | AppDaemon (0.13.6), ESPHome (2023.10.1), NGINX Home Assistant SSL proxy (3.5.0), Let's Encrypt (4.12.9), Piper (1.4.0), Whisper (1.0.0), Advanced SSH & Web Terminal (15.1.0), Frigate (0.12.1), File editor (5.6.0), openWakeWord (1.8.0), porcupine1 (1.0.0), Mosquitto broker (6.3.1)
Dashboards dashboards | 3 -- | -- resources | 13 views | 15 mode | storage
Recorder oldest_recorder_run | October 12, 2023 at 09:37 -- | -- current_recorder_run | October 22, 2023 at 11:07 estimated_db_size | 459.59 MiB database_engine | sqlite database_version | 3.41.2
Spotify api_endpoint_reachable | ok -- | --

Checklist

Describe the issue

Sometimes (say once every 2 restarts) when I restart HA hass_nuki_bt fails to connect to Nuki. The following appears in the log (see below for the full log):

bleak.exc.BleakError: No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found

When this happens the integration appears as "Failed to setup". But if I manually reload it (from the menu below), it always immediately connects to the lock without problem.

Screenshot from 2023-10-22 11-14-53

I think the problem might be related to the fact that I am using an esp32 bluetooth proxy. During a restart, the proxy will not be immediately available, it takes a bit of time for HA to connect to it.

In the debug logs below, first the connection error happens, then after the error it says "Polling recovered" and "Discovered bluetooth device". So it seems that 20 seconds after the error the device is finally visible, but since the initial connection failed hass_nuki_bt cannot recover from it.

I think that hass_nuki_bt should be tolerant to connection failures during startup. Right now the whole integration fails and needs to be manually reloaded. The ideal behaviour would be to initially show the lock entity as unabailable, automatically try to reconnect in the background, and then cover when the device becomes reachable.

Reproduction steps

  1. Setup nuki lock in a place reachable only through bluetooth proxy
  2. Restart a few times until the problem appears

Debug logs

2023-10-22 11:28:09.048 ERROR (MainThread) [homeassistant.config_entries] Error setting up entry Nuki Entrance for hass_nuki_bt
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 399, in async_setup
    result = await component.async_setup_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/hass_nuki_bt/__init__.py", line 79, in async_setup_entry
    if not await coordinator.async_wait_ready():
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/hass_nuki_bt/coordinator.py", line 148, in async_wait_ready
    await self._async_update()
  File "/config/custom_components/hass_nuki_bt/coordinator.py", line 108, in _async_update
    await self.device.update_state()
  File "/usr/local/lib/python3.11/site-packages/pyNukiBT/nuki.py", line 423, in update_state
    await self.connect() # connect so we can identify the device type and update self._const accordingly
    ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyNukiBT/nuki.py", line 386, in connect
    await self._client.connect()
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 273, in connect
    wrapped_backend = self._async_get_best_available_backend_and_device(manager)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 375, in _async_get_best_available_backend_and_device
    raise BleakError(
bleak.exc.BleakError: No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found
2023-10-22 11:28:09.053 ERROR (MainThread) [custom_components.hass_nuki_bt] 54:D2:72:03:00:4D: Bluetooth error whilst polling: No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found

2023-10-22 11:28:20.787 DEBUG (MainThread) [custom_components.hass_nuki_bt] 54:D2:72:03:00:4D: Polling recovered
2023-10-22 11:28:29.525 INFO (MainThread) [custom_components.hass_nuki_bt] Discovered bluetooth device: {'name': 'Nuki_3403004D', 'address': '54:D2:72:03:00:4D', 'rssi': -63, 'manufacturer_data': {76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, 'service_data': {}, 'service_uuids': [], 'source': 'a0:b7:65:f0:53:7c', 'advertisement': AdvertisementData(local_name='Nuki_3403004D', manufacturer_data={76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, tx_power=-127, rssi=-63), 'device': BLEDevice(54:D2:72:03:00:4D, Nuki_3403004D), 'connectable': True, 'time': 739182.9895830154}

Diagnostics dump

No response

chatziko commented 11 months ago

Update: I realized that after the recent update to v0.0.7 this issue started happening after every single restart. Each time after the restart hass_nuki_bt fails with the above error, then I reload it manually, and after the reload it always connects fine.

Besed on the stacktrace it seems that this commit is the difference, adding the line

            await self.connect() # connect so we can identify the device type and update self._const accordingly

which appears in the stacktrace.

Downgrading to v0.0.6 solved the problem for me (or at least it doesn't happen on every restart).

ronengr commented 10 months ago

interesting. maybe the BT backend is not ready when the nuki starts. I thought HA takes care of such device dependencies, but maybe i can add something on my end.

What BT are you using? do you have a dongle on the HA PC or are you using some BT proxy?

chatziko commented 10 months ago

Thanks for the response, I use an esp32 bluetooth proxy. The proxy necessarily takes a few seconds to connect which could be related to the problem.

I guess hass_nuki_bt should wait until the first advertisement arrives from the nuki or something like that, and only then try to connect.

chatziko commented 10 months ago

In the debug logs I included in the issue, first the connection error happens:

2023-10-22 11:28:09.053 ERROR (MainThread) [custom_components.hass_nuki_bt] 54:D2:72:03:00:4D: Bluetooth error whilst polling: No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found

Then 20 seconds later it seems that the nuki is finally visible, but since the initial connection failed hass_nuki_bt cannot recover from it:

2023-10-22 11:28:20.787 DEBUG (MainThread) [custom_components.hass_nuki_bt] 54:D2:72:03:00:4D: Polling recovered
2023-10-22 11:28:29.525 INFO (MainThread) [custom_components.hass_nuki_bt] Discovered bluetooth device: {'name': 'Nuki_3403004D', 'address': '54:D2:72:03:00:4D', 'rssi': -63, 'manufacturer_data': {76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, 'service_data': {}, 'service_uuids': [], 'source': 'a0:b7:65:f0:53:7c', 'advertisement': AdvertisementData(local_name='Nuki_3403004D', manufacturer_data={76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, tx_power=-127, rssi=-63), 'device': BLEDevice(54:D2:72:03:00:4D, Nuki_3403004D), 'connectable': True, 'time': 739182.9895830154}

Note that this happens on v0.0.7.

chatziko commented 10 months ago

Note that the main problem is v0.0.7 in which the integration remains in a "failed to setup" (see screenshot) state, and I need to manually restart it.

In v0.0.6 the behaviour is different! The is an error that appears (multiple times in the log):

2023-11-26 20:43:39.739 ERROR (MainThread) [pyNukiBT.nuki] Bleak Error while sending data on attempt 1
2023-11-26 20:43:39.739 ERROR (MainThread) [pyNukiBT.nuki] No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/pyNukiBT/nuki.py", line 336, in _send_command
    await self.connect()
  File "/usr/local/lib/python3.11/site-packages/pyNukiBT/nuki.py", line 386, in connect
    await self._client.connect()
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 277, in connect
    wrapped_backend = self._async_get_best_available_backend_and_device(manager)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 383, in _async_get_best_available_backend_and_device
    raise BleakError(
bleak.exc.BleakError: No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found

2023-11-26 20:43:40.556 ERROR (MainThread) [pyNukiBT.nuki] Bleak Error while sending data on attempt 2
2023-11-26 20:43:40.556 ERROR (MainThread) [pyNukiBT.nuki] No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/pyNukiBT/nuki.py", line 336, in _send_command
    await self.connect()
  File "/usr/local/lib/python3.11/site-packages/pyNukiBT/nuki.py", line 386, in connect
    await self._client.connect()
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 277, in connect
    wrapped_backend = self._async_get_best_available_backend_and_device(manager)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/wrappers.py", line 383, in _async_get_best_available_backend_and_device
    raise BleakError(
bleak.exc.BleakError: No backend with an available connection slot that can reach address 54:D2:72:03:00:4D was found

some seconds later there is the "discovered" message:

2023-11-26 20:43:44.094 INFO (MainThread) [custom_components.hass_nuki_bt] Discovered bluetooth device: {'name': 'Nuki_3403004D', 'address': '54:D2:72:03:00:4D', 'rssi': -58, 'manufacturer_data': {76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, 'service_data': {}, 'service_uuids': [], 'source': 'a0:b7:65:f0:53:7c', 'advertisement': AdvertisementData(local_name='Nuki_3403004D', manufacturer_data={76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, tx_power=-127, rssi=-58), 'device': BLEDevice(54:D2:72:03:00:4D, Nuki_3403004D), 'connectable': True, 'time': 34720.286568403244}

But after that the nuki connects and works fine! I don't need to manually restart the integration.

ronengr commented 10 months ago

I also use esp32 BT proxy and i didn't encounter such issues. hass_nuki_bt already waits for the BT device (or at least this is my understanding). it tries to get the BLE device, if unsuccessful raises an ConfigEntryNotReady exception. This causes HA to retry the async_setup_entry() again later. Only after successfully getting the ble device the init continues.

I need to verify that bluetooth.async_ble_device_from_address actually verifies there is a way to communicate with the device. another option is to catch the BleakError exception and raise a ConfigEntryNotReady instead. this will cause HA to retry later.

regarding v0.0.7 vs v0.0.6, in the old version there was no connection made to Nuki on init. i think it is important to try and connect, this why we can verify we actually have a working connection (and have hass_nuki_bt disabled until we do) instead of potentially failing only when trying to run some action on the lock

chatziko commented 10 months ago

I guess catching the BleakError is a good idea in any case.

I also noticed here that you should set

  "dependencies": ["bluetooth_adapters"]

in the manifest, not bluetooth. I checked several other bluetooth-related integrations, they all have bluetooth_adapters. Unfortunately this by itself does not solve the problem, I tried.

ronengr commented 10 months ago

yes, I've also noticed this now.

I have added this and also a call for async_address_present() that should verify the BT device is indeed connectable. I'm testing these changes now and hopefully will release a fix later today.

ronengr commented 10 months ago

@chatziko I've released a new version with the changes we talked about. please let me know if it fixes you issue.

chatziko commented 10 months ago

Thanks for the new version!

I'm now getting the following error in the log. However the integration does start and the nuki is visible, I don't need to manually restart it. So it's partial success :smile:

2023-11-26 23:57:14.313 ERROR (MainThread) [homeassistant.bootstrap] Error setting up integration hass_nuki_bt - received exception
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 402, in async_setup
    result = await component.async_setup_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/hass_nuki_bt/__init__.py", line 86, in async_setup_entry
    raise ConfigEntryNotReady(f"{address} is not advertising state")
homeassistant.exceptions.ConfigEntryNotReady: 54:D2:72:03:00:4D is not advertising state

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/setup.py", line 102, in async_setup_component
    return await task
           ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/setup.py", line 328, in _async_setup_component
    await asyncio.gather(
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 464, in async_setup
    await self._async_process_on_unload(hass)
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 703, in _async_process_on_unload
    if job := self._on_unload.pop()():
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/bluetooth/update_coordinator.py", line 53, in _async_cancel
    self._async_stop()
TypeError: 'bool' object is not callable
2023-11-26 23:57:14.425 INFO (MainThread) [custom_components.hass_nuki_bt] Discovered bluetooth device: {'name': 'Nuki_3403004D', 'address': '54:D2:72:03:00:4D', 'rssi': -59, 'manufacturer_data': {76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, 'service_data': {}, 'service_uuids': [], 'source': 'a0:b7:65:f0:53:7c', 'advertisement': AdvertisementData(local_name='Nuki_3403004D', manufacturer_data={76: b'\x02\x15\xa9.\xe2\x00U\x01\x11\xe4\x91l\x08\x00 \x0c\x9af4\x03\x00M\xc8'}, tx_power=-127, rssi=-59), 'device': BLEDevice(54:D2:72:03:00:4D, Nuki_3403004D), 'connectable': True, 'time': 46333.925564050674}
ronengr commented 10 months ago

strange error, not sure what to do about it. the first exception (ConfigEntryNotReady) is raised by me as the connection to the Nuki device fails. this is a good exception as this is the way HA know we are not ready yet and it should retry again later. The 2nd exception, i have no idea where it comes from. it seems to be somewhere in HA itself. maybe moving the async_on_unload() 2 lines down will help?

anyway, it is strange that the connection fails only at this stage, and that both async_address_present() and async_ble_device_from_address() pass with no issues. I would expect both of them to fail if there is no way to communicate with the Nuki, and this way we can stop the init earlier.

chatziko commented 10 months ago

I think I found it :smile:

30

chatziko commented 10 months ago

The issue seems fully solved in v0.0.9, so closing.