ukBaz / python-bluezero

A simple Python interface to Bluez
MIT License
397 stars 115 forks source link

Race condition? #312

Closed Yatekii closed 2 years ago

Yatekii commented 3 years ago

Hi!

I have been playing some more with bluezero and it works quite nice so far. Dbus and especially GTK is a PITA tho. Currently I am using bluezero in async mode:

    def on_device_found(self, device):
        self.zigfred = device
        self.zigfred.on_message = self.on_message
        self.zigfred.connect()
        if self.zigfred.connected:
            def idle_z(zigfred):
                message = protocol.zigfred_pb2.Message()
                message.code = protocol.zigfred_pb2.Message.MessageCode.COMMAND
                message.command.id = 1
                message.command.code = protocol.zigfred_pb2.Command.CommandCode.STATUS

                c = message.SerializeToString()
                zigfred.message(c)
                zigfred.open_relay()
                zigfred.close_relay()
                zigfred.set_dimmer(0)
                zigfred.set_dimmer(5)

            GLib.timeout_add(300, idle_z, self.zigfred)
        else:
            print("We failed to connect!")

        print("Status done", flush=True)

What throws me off is the requirement for GLib.timeout_add(300, idle_z, self.zigfred). If I just run the code in idle_z, it sends the messages to my device properly but misses the on_message event when it gets a message back. Is there any event I have to wait for to ensure that I can catch a message?

    def connect(self):
        """
        Connect to the specified zigfred for this instance
        """
        self._zigfred.connect()

        # Resolving GATT can take a few tries, so do this spinning.
        while not self._zigfred_rx.resolve_gatt():
            sleep(0.1)
        while not self._zigfred_tx.resolve_gatt():
            sleep(0.1)

        # We register a callback for when we receive data and enable notifications for when data comes in.
        self._zigfred_tx.add_characteristic_cb(self._uart_read)
        self._zigfred_tx.start_notify()

Above code is what I do to connect to my device in addition to just the device connect.

Additionally, if I sleep for 300 ms instead of a timeout_add, it fails to catch the message too, indicating that it is doing something on the eventloop. I cannot figure what tho ... Also, tracing Dbus is kinda difficult as it spams around 300k lines just for my simple script here ...

Do you have any hints what I could try to make this more reliable/predictable?

ukBaz commented 3 years ago

I don't have much time to look at this right now and I'm struggling to understand the context in which the code above is being called.

There are a couple of things I can comment on straight away.

  1. If you want to get notifications back from a device, then you need to running with the event loop. I am not aware of any way to put the notifications in a queue that you can poll.
  2. I can't tell from your code if _zigfred is Device or Central. After you connect it takes a while for the GATT database to finish building. So typically you wait for services_resolved to go to True before loading the GATT characteristics. Maybe something more like:
        self.rmt_device.connect()
        while not self.rmt_device.services_resolved:
            sleep(0.5)
        self.load_gatt()

If you have your code posted somewhere so I can better understand the context then I'm happy to take a look.

Yatekii commented 3 years ago

Thanks for the quick answer!

Yes I am aware that I have to run the loop, which I do ofc. Sadly GLib is very suboptimal but it seems to work.

_zigfred is a Device, not a Central.

I can post the full code in a sec.

Yatekii commented 3 years ago

I have assembled a gist with my complete code:

https://gist.github.com/Yatekii/f6a762b48bec9570af34c9b91fa3860a

Please let me know if you need any further info.

ukBaz commented 3 years ago

@Yatekii I took a look at this last night and their are a few things I'm not sure about so I will hopefully find some time the weekend to see if I can create a minimal example that will run with micro:bits or something similar I have access to.

The functionality you appear to be implementing:

  1. If the event loop is idle, start scanning (discovering devices)
  2. If a device of a certain name is found, then connect (no pairing required)
  3. On connection set the tx characteristic to have read callback and start notify
  4. Every 300ms send some information/commands to the connected device

The implementation does seem like it might not be optimal. A few questions about alternative implementations.

Yatekii commented 3 years ago

Thanks @ukBaz. You got the steps I am trying to do pretty much correct. 1.-3. are as you said. 4. Is not quite correct. The 300ms are just a necessary precaution because it wont catch the reply if I send a request right away. So I schedule my request 300ms into the future. I do this request just one time.

Basically this 4. is the only issue I have. Everything else works beautifully!

I am writing a test procedure to test a device. So I will perform a sequence of requests and check the responses to those basically. There is some HW tests in between too.

Thanks a lot for your help :)

ukBaz commented 3 years ago

I have had another look at this and I still don't see that the following is needed:

        # Resolving GATT can take a few tries, so do this spinning.
        while not self._zigfred_rx.resolve_gatt():
            sleep(0.1)

Could it be that you are reconnecting to the same device a second time and you already have the callback and notification enabled because you don't clear them when you disconnect?

ukBaz commented 2 years ago

This has sat here for a while without movement so I'm going to close. If it is still an issue then please re-open with more information on how to reproduce.