home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.66k stars 30.8k forks source link

Bluetooth manager saves and restores stale adverts in `bluetooth.remote_scanners` #130432

Open agittins opened 2 days ago

agittins commented 2 days ago

The problem

I maintain Bermuda and I've had a few cases where folks have had enormous /config/.storage/bluetooth.remote_scanners files (9MB or so), containing several months worth of advertisement data from a non-core integration, even after that integration has been uninstalled.

I haven't yet been able to confirm which integration is causing the issue, but I do have a lead I haven't yet tested, so I don't feel comfortable publicly naming it if it's not the one at fault.

The issue for me is that Bermuda tries to process all known device advertisements, but when it hits this (in-memory) cache of over 10,000 adverts systems are bogging down and becoming non-responsive. To be clear - this is my own fault because I am accessing private members of the bluetooth.manager class rather than a proper API, so naturally I get to keep the smouldering consequences of my own questionable choices. I'll be changing Bermuda to harden itself against this issue in either case.

The issue for HA is that some systems are carrying this extra in-memory cache of very stale adverts, presumably indefinitely. I don't know if it only applies to adverts gathered between a certain range of HA versions, or if it's specific to a particuar custom integration's behaviour.

Manually solving the "issue" involves shutting down HA, renaming/removing the bluetooth.remote_scanners file and starting HA again - which is not ideal for folks not familiar with ssh etc.

I think the expire adverts functions are not working in these cases because the adverts belong to a no-longer-present scanner, so they are sitting in the manager's _all_histories but not assigned to any extant scanner. I think this is also why the data persists, because HA might be modifying the saved data only for present scanners, and leaves the not-present scanner's cache data intact - but I haven't dug too deeply into the bluetooth_adaptors/storage.py to be sure.

I have an example file of about 9MB, but as it comes from a user I'd rather not share it publicly, but I have their permission to email or provide a link via dm to my nextcloud - I'm on the discord so can be contacted there, or my email is ash@ajg.net.au (I gave up on my own privacy decades ago!)

Some redacted excerpts of the bluetooth.remote_scanners file:

image

Timestamps: image

Adverts: image

Running the file through jq '.data."52e3fd636f144605f7665a0ae49aca33".discovered_device_advertisement_datas.[].device.address' gives 12585 lines.

What version of Home Assistant Core has the issue?

core-2024.11.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

bluetooth

Link to integration documentation on our website

https://www.home-assistant.io/integrations/bluetooth/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-11-11 17:45:07.854 INFO (MainThread) [homeassistant.setup] Setup of domain dhcp took 0.00 seconds
2024-11-11 17:45:07.871 INFO (MainThread) [homeassistant.setup] Setting up usb
2024-11-11 17:45:07.872 INFO (MainThread) [homeassistant.setup] Setup of domain usb took 0.00 seconds
2024-11-11 17:45:07.913 INFO (MainThread) [homeassistant.setup] Setting up zeroconf
2024-11-11 17:45:07.932 INFO (MainThread) [homeassistant.components.zeroconf] Starting Zeroconf broadcast
2024-11-11 17:45:07.936 INFO (MainThread) [homeassistant.setup] Setup of domain zeroconf took 0.02 seconds
2024-11-11 17:45:08.035 INFO (MainThread) [homeassistant.setup] Setting up bluetooth
2024-11-11 17:45:10.800 INFO (MainThread) [homeassistant.setup] Setup of domain bluetooth took 2.76 seconds
2024-11-11 17:45:10.801 WARNING (MainThread) [asyncio] Executing <Task finished name='setup component bluetooth' coro=<async_setup_component() done, defined at /workspaces/homeassistant-core/homeassistant/setup.py:145> result=True created at /workspaces/homeassistant-core/homeassistant/util/async_.py:45> took 2.726 seconds
2024-11-11 17:45:10.812 INFO (MainThread) [homeassistant.bootstrap] Setting up stage 2: {'device_automation', 'search', 'backup', 'timer', 'bermuda', 'stt', 'input_button', 'assist_pipeline', 'blueprint', 'onboarding', 'radio_browser', 'media_source', 'mobile_app', 'input_datetime', 'schedule', 'input_select', 'logbook', 'tag', 'scene', 'homeassistant_alerts', 'tts', 'sun', 'ffmpeg', 'go2rtc', 'file_upload', 'zone', 'met', 'kegtron', 'diagnostics', 'config', 'analytics', 'camera', 'inkbird', 'application_credentials', 'input_number', 'xiaomi_ble', 'hardware', 'trace', 'private_ble_device', 'wake_word', 'stream', 'image_upload', 'my', 'system_health', 'counter', 'energy', 'shopping_list', 'input_text', 'lovelace', 'script', 'bluetooth_adapters', 'conversation', 'default_config', 'esphome', 'intent', 'automation', 'input_boolean', 'google_translate', 'device_tracker', 'history', 'person'}
2024-11-11 17:45:10.814 INFO (MainThread) [homeassistant.setup] Setting up wake_word
2024-11-11 17:45:10.815 INFO (MainThread) [homeassistant.setup] Setup of domain wake_word took 0.00 seconds
2024-11-11 17:45:10.815 INFO (MainThread) [homeassistant.setup] Setting up stt
2024-11-11 17:45:10.817 INFO (MainThread) [homeassistant.setup] Setup of domain stt took 0.00 seconds

Additional information

Note the bluetooth integration taking 2.76 seconds to load. This is on my dev machine with SSD, 32GB RAM, running 2024.11.0 in a devcontainer in vscode.

My apologies for not being able to provide complete diagnostics due to it being client data (and bluetooth diags not being redacted for MACs), but happy to provide them privately.

I have experimented with a "repair" locally, and it does clean the in-memory cache by identifying any unclaimed scanner's adverts, but doesn't affect the on-disk cache after a restart, plus this is probably not suitable for general use since a user might reboot multiple times while a scanner is not present, but still want the history restored later. The timestamps from the sample data appear to be unix epoch timestamps (vs monotonic).

        deleteme=[]
        for devkey, device in self.manager._all_history.items():  # noqa: SLF001
            if device.source not in self.manager._connectable_scanners | self.manager._non_connectable_scanners:
                deleteme.append(devkey)
        for key in deleteme:
            del self.manager._all_history[key]
home-assistant[bot] commented 2 days ago

Hey there @bdraco, mind taking a look at this issue as it has been labeled with an integration (bluetooth) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `bluetooth` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign bluetooth` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


bluetooth documentation bluetooth source (message by IssueLinks)

bdraco commented 2 days ago

Since monotonic time is only guaranteed to be the same per run we have to convert to epoch when saving

It looks like we don't have good guards for the host system experiencing time travel (time suddenly moving forwards or suddenly moving backwards), however it's unclear if that's the issue until I see the data

You can share the data to my Dropbox nick@koston.org or my gdrive bdraco@gmail.com

bdraco commented 2 days ago

I'm also bdraco on discord if that works better but I may be slow to respond there as it's a busy day for me

agittins commented 1 day ago

Great, I have shared it to your dropbox (my account there is agittins@purple.dropbear.id.au). Also mentioned in a comment on the file the integration I think might be doing the thing.

bdraco commented 1 day ago

I'll try to look at it soon, unfortunately today is also turning out to be another very busy day