hbldh / bleak

A cross platform Bluetooth Low Energy Client for Python using asyncio
MIT License

DBus backend high CPU usage and it grows over time #500

Closed devbis closed 1 year ago

devbis commented 3 years ago

Description

I'm using bleak to scan for devices and connect to some of them. When I start my application it consumes a high CPU percentage, but it is acceptable: about 30-60% of the 800 MHz device, and the application stays responsive at the start.

After working for 10-20 hours, CPU usage rises and can stay at 95-100% all the time. The problem is that this leads to timeout exceptions because tasks don't get their CPU time.

What I Did

  1. I restarted bluetoothd to flush caches or so, it didn't help
  2. I tried to investigate the source and monitored the number of running tasks, but the number of tasks is constant in general. (Some tasks appear for connected devices and are then stopped.)

Actually, I'm out of ideas for how to stop this growing CPU usage. What else can I check to find the root of the problem?

dlech commented 3 years ago

Can you share the full program?

Does the high CPU usage still happen when there are no devices to connect to? (This could narrow it down to the scanner code vs. the client code.)

devbis commented 3 years ago

Yes, sure

https://github.com/devbis/ble2mqtt/

Does the high CPU usage still happen when there are no devices to connect to? (This could narrow it down to the scanner code vs. the client code.)

That's a good idea, I'll run my code for several days with passive devices only to check the scanner code and post the results later.

devbis commented 3 years ago

Ran my code for a while in scanning-only mode. It scans for 3 seconds, then sleeps for 1 second. While scanning, CPU usage rose from 30% at the beginning up to 80-95% after 10 hours. But during the sleep intervals it drops to 3-5%, as at the beginning.

I'll try to simplify the code to make a minimal program that just scans devices without complex handling to reduce the impact of my code.

devbis commented 3 years ago
[Screenshot 2021-04-02 at 16:15:34]

I checked on the minimal example:

```python
import asyncio as aio
from bleak.backends.device import BLEDevice
from bleak import BleakScanner


def device_detection_callback(device: BLEDevice, advertisement_data):
    print(device, advertisement_data)


async def start():
    while True:
        async with BleakScanner(scanning_mode='passive') as scanner:
            scanner.register_detection_callback(
                device_detection_callback,
            )
            await aio.sleep(3)
            devices = await scanner.get_discovered_devices()
            print('New devices: ', devices)
        await aio.sleep(1)


def main():
    loop = aio.get_event_loop()
    try:
        loop.run_until_complete(start())
    finally:
        loop.close()


if __name__ == '__main__':
    main()
```

After working for 12 hours it consumes 95% CPU while scanning and then drops to minimal.

dlech commented 3 years ago

What happens if you rewrite the loop like this?

```python
async def start():
    async with BleakScanner(scanning_mode="passive") as scanner:
        scanner.register_detection_callback(
            device_detection_callback,
        )
        while True:
            await aio.sleep(3)
            await scanner.stop()
            print("New devices: ", scanner.discovered_devices)
            await aio.sleep(1)
            await scanner.start()
```

devbis commented 3 years ago

Started the new code. Will post the details in 10 hours or so.

devbis commented 3 years ago
[Screenshot 2021-04-03 at 09:46:48]

After 8 hours of running the updated code, peak usage is again about 95%. At the start it was 25-30%.

dlech commented 3 years ago

Did you run top to confirm that it is, in fact, the Python process that is using most of the CPU?

I've been running the test program for about 24 hours now on my Linux desktop and it is just now sometimes jumping from 0% to 1% of CPU.

Are there any exceptions that are being ignored? Have you run with debug logging enabled?

You could also run sudo dbus-monitor --system if that is available to see if there is an increase in D-Bus traffic between starting the program and some time later.
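
To see whether D-Bus traffic is actually growing between the two captures, one hypothetical way to compare dbus-monitor dumps is to tally records by type; the helper name and sample lines below are made up for illustration, not part of bleak or dbus-monitor:

```python
from collections import Counter


def tally_dbus_dump(lines):
    """Count dbus-monitor records by type (signal, method call, ...).

    Continuation lines of a record are indented, so only lines that start
    with a record keyword are counted.
    """
    counts = Counter()
    for line in lines:
        for kind in ("signal", "method call", "method return", "error"):
            if line.startswith(kind):
                counts[kind] += 1
                break
    return counts


# Hand-made sample lines in the shape dbus-monitor emits.
sample = [
    "signal time=1617370000.0 sender=:1.3 -> destination=(null destination) path=/org/bluez/hci0/dev_AA_BB; member=PropertiesChanged",
    "   array [",
    "method call time=1617370001.0 sender=:1.42 -> destination=org.bluez path=/; member=GetManagedObjects",
]
print(tally_dbus_dump(sample))  # counts: signal=1, method call=1
```

Running this over each one-minute capture and comparing the totals would show whether the extra CPU time tracks the message rate.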

devbis commented 3 years ago

In fact, I'm running the script on a Freescale i.MX6ULL CPU at 528 MHz. It is a single-core ARMv7 CPU. I suppose it is not as fast as your Linux desktop. :)

Yes, I'm running htop to check the current CPU usage of processes.

Are there any exceptions that are being ignored?

I run the script as is; no exceptions are ignored.

Have you run with debug logging enabled?

No, the sample script doesn't include any logging configuration.

I ran dbus-monitor --system after a day running for 1 minute and dumped it to a file: https://gist.github.com/devbis/225fcac9f28fd9a3d6aa26c075f329c3

Then I restarted the Python script and checked CPU usage. It didn't change much. I restarted bluetoothd and CPU usage dropped.

Then I ran dbus-monitor --system for 1 minute again: https://gist.github.com/devbis/79dedeb920808ed406e58f5c4d6c9e06

(I wonder whether anyone can find any clues in these files.)

The dump file shrank from 5.8 MB to 700 kB. I suppose the culprit is the growing traffic from bluetoothd and inefficient processing of D-Bus traffic in Python.

dlech commented 3 years ago

Thanks for all of the logs. I have a new theory about what is going on. BlueZ caches information on each device that it sees where a device is identified by its Bluetooth address. However, many Bluetooth Low Energy devices use random addresses that change frequently. So a huge cache of devices builds up in BlueZ, even if there is only one physical device.

Each time we start and stop scanning or connect to a client, we create a new D-Bus connection and call org.freedesktop.DBus.ObjectManager.GetManagedObjects. In your 2nd log (the one with lower CPU usage), the response to this is 1158 lines long. Compared to the 1st log, where the response is 53170 lines long, it is easy to see why it would be using significantly more CPU (it is processing >45x the amount of data).
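
The cache growth is visible in the GetManagedObjects reply itself: every cached device shows up as an object path exporting org.bluez.Device1. A minimal sketch of counting them, using a hand-made dict as a stand-in for the real reply (real code would call the ObjectManager over D-Bus, e.g. via dbus-next):

```python
def count_cached_devices(managed_objects):
    """Count object paths that expose the org.bluez.Device1 interface,
    i.e. devices BlueZ currently has cached."""
    return sum(
        1
        for path, interfaces in managed_objects.items()
        if "org.bluez.Device1" in interfaces
    )


# Fake, hand-made reply in the shape of GetManagedObjects output:
# object path -> {interface name -> properties}.
fake_reply = {
    "/org/bluez/hci0": {"org.bluez.Adapter1": {}},
    "/org/bluez/hci0/dev_AA_BB_CC_DD_EE_FF": {"org.bluez.Device1": {}},
    "/org/bluez/hci0/dev_11_22_33_44_55_66": {"org.bluez.Device1": {}},
}
print(count_cached_devices(fake_reply))  # 2
```

With devices that rotate random addresses, this count keeps climbing over hours, which matches the growing reply size in the logs.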

So, I think the solution will be to maintain a single D-Bus bus connection for monitoring D-Bus objects for the lifetime of a Bleak program. However, BleakClient will still need to have its own bus connection for calling the Connect method to avoid https://github.com/hbldh/bleak/issues/472#issuecomment-796330894.

I think it would also be worthwhile raising an issue with BlueZ to see if there is a way for it to not cache devices with random addresses.

devbis commented 3 years ago

Thanks for looking into it. I want to try to fork dbus-next and rewrite the unmarshaller part in plain C for speed. I expect that it can significantly improve the performance of parsing data. This part is not asyncio-based and the code is of good quality.

It is not related to bleak directly, but I hope it will help me solve the problem with the CPU.

dlech commented 3 years ago

https://github.com/igo95862/python-sdbus/ looks interesting as well.

rbuffat commented 2 years ago

I suspect I observe the same issue on a raspberry pi 4. Is there a workaround available until a permanent solution is implemented?

Bluez 5.50, bleak 0.12.0

[Screenshot: scanner CPU load]

The drop in CPU load is after restarting the container / Pi.

```python
import asyncio
import logging
import os
import sys

import aiohttp
from bleak import BleakScanner

root = logging.getLogger()
root.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
root.addHandler(handler)

TEMPERATURE = "temperature"
HUMIDITY = "humidity"
VDD = "vdd"
PRESSURE = "pressure"
TIMESTAMP = "ts"
BROADCAST_ID = "bid"
DEVICE_ID = "uid"

URL = os.environ["MIDDLEWARE_URL"]
PORT = os.environ["MIDDLEWARE_PORT"]
record_url = f"http://{URL}:{PORT}/recordEvent"
logging.info(f"Send events to {record_url}")


async def send_data(data, session):
    mes_id = int.from_bytes(data[0:2], byteorder="big", signed=False)
    temp_k = int.from_bytes(data[2:4], byteorder="big", signed=False)
    rel_hum_raw = int.from_bytes(data[4:6], byteorder="big", signed=False)
    vdd = int.from_bytes(data[6:8], byteorder="big", signed=False)
    uid = int.from_bytes(data[8:], byteorder="big", signed=False)

    temp = float(temp_k - 27315) / 100.0
    rel_hum = float(rel_hum_raw) / 1000.0

    event = {}
    event[TEMPERATURE] = str(temp)
    event[HUMIDITY] = str(rel_hum)
    event[VDD] = str(vdd)
    event[DEVICE_ID] = str(uid)
    event[BROADCAST_ID] = str(mes_id)

    try:
        async with session.get(record_url, params=event) as _:
            pass
    except Exception as e:
        logging.exception(f"Could not send event to middleware: {e}")


async def run():
    logging.info("Start scanner")
    async with aiohttp.ClientSession() as session:
        while True:
            logging.info("Start scan")
            async with BleakScanner() as scanner:
                await asyncio.sleep(4.0)
                devices = scanner.discovered_devices
                for device in devices:
                    if device.name == "SNS":
                        logging.info(f"Discovered [{device.address}] {device.name}")
                        await send_data(
                            device.metadata["manufacturer_data"][65535], session
                        )
            await asyncio.sleep(1.0)


loop = asyncio.get_event_loop()
loop.run_until_complete(run())
```

```dockerfile
FROM python:3.9
COPY . /app
WORKDIR /app
RUN apt-get update && apt-get install -y bluez bluez-tools bluetooth
RUN pip install -r requirements.txt
ENV PYTHONUNBUFFERED 1
ENTRYPOINT sh entrypoint.sh
```
hbldh commented 2 years ago

@rbuffat Could you run the same code outside of a Docker container, on the same system, and confirm that the problem exists there as well?

devbis commented 2 years ago

I periodically restart bluetoothd with cron several times a day. It helps me keep the CPU usage at an acceptable level. But the real solution is fixing dbus-next unmarshalling (e.g. rewriting that file in pure C) or writing another backend. (I tried bluepy as a custom backend and it worked for passive scanning, consuming 1-5% CPU on my hardware, but I don't like its approach.)
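
For reference, the cron-based workaround could look something like this (a sketch assuming a systemd-based system where the service is named `bluetooth`; adjust the unit name and schedule for your init system):

```
# /etc/cron.d/restart-bluetoothd -- restart bluetoothd every 6 hours to
# flush the BlueZ device cache. This is a workaround, not a fix.
0 */6 * * * root systemctl restart bluetooth
```

Note that restarting bluetoothd drops any active connections and discards discovered devices, so it only suits setups that scan in a loop anyway.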

dlech commented 1 year ago

We finally have the changes in place to fix this. Can you please try #902?