ikalchev / HAP-python

A python implementation of the HomeKit Accessory Protocol (HAP)

RuntimeError in accessory.py in Python 3.5 (Linux) #147

Open rcarmo opened 6 years ago

rcarmo commented 6 years ago

Hi there,

I'm trying to deploy HAP-python (git master) on Python 3.5 and coming up against a strange error that seems to be related to coroutine handling.

I've trimmed my code down bit by bit, first removing all the asyncio bits, until I got a simple bridge with a passive accessory that still reproduces the problem for me:

import logging
import signal

from pyhap.accessory import Accessory, Bridge
from pyhap.accessory_driver import AccessoryDriver
from pyhap.const import CATEGORY_SWITCH
from config import PERSISTENCE_FILE, PORT, log

class TV(Accessory):
    category = CATEGORY_SWITCH

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        serv_switch = self.add_preload_service('Switch')
        self.char_on = serv_switch.configure_char('On', setter_callback=self.on_power)

    @Accessory.run_at_interval(10)
    def run(self):
        pass

    def on_power(self, value):
        log.warn(value)

def get_bridge(driver):
    """Call this method to get a Bridge instead of a standalone accessory."""
    bridge = Bridge(driver, 'Python Bridge')
    bridge.add_accessory(TV(driver, 'Python TV'))
    return bridge

driver = AccessoryDriver(port=PORT, persist_file=PERSISTENCE_FILE)
driver.add_accessory(accessory=get_bridge(driver))
signal.signal(signal.SIGTERM, driver.signal_handler)
driver.start()

When running it with python -u main.py, using uWSGI solely as a supervisor (not loading modules, just executing a subprocess), I get this:

bridge.1 | [uwsgi-daemons] spawning "python -u main.py" (uid: 1001 gid: 33)
bridge.1 | date=2018-08-24T10:28:47.011 pid=1162 tid=-1225506048 level=INFO filename=accessory_driver.py:329:add_accessory msg="Loading Accessory state from `/home/piku/.pyhomekit-bridge`"
bridge.1 | date=2018-08-24T10:28:47.019 pid=1162 tid=-1225506048 level=INFO filename=accessory_driver.py:190:start msg="Starting the event loop"
bridge.1 | date=2018-08-24T10:28:47.021 pid=1162 tid=-1265634192 level=INFO filename=accessory_driver.py:218:_do_start msg="Starting accessory Python Bridge on address 192.168.1.98, port 55314."
bridge.1 | date=2018-08-24T10:28:47.516 pid=1162 tid=-1285557136 level=INFO filename=hap_server.py:794:get_request msg="Got connection with ('192.168.1.108', 49962)."
bridge.1 | date=2018-08-24T10:28:47.532 pid=1162 tid=-1293945744 level=INFO filename=hap_server.py:144:log_message msg="192.168.1.108 - "POST /pair-verify HTTP/1.1" 200 -"
bridge.1 | date=2018-08-24T10:28:47.580 pid=1162 tid=-1293945744 level=INFO filename=hap_server.py:144:log_message msg="192.168.1.108 - "POST /pair-verify HTTP/1.1" 200 -"
bridge.1 | date=2018-08-24T10:28:47.588 pid=1162 tid=-1293945744 level=INFO filename=hap_server.py:144:log_message msg="192.168.1.108 - "GET /accessories HTTP/1.1" 200 -"
bridge.1 | date=2018-08-24T10:28:47.741 pid=1162 tid=-1293945744 level=INFO filename=hap_server.py:144:log_message msg="192.168.1.108 - "PUT /characteristics HTTP/1.1" 207 -"
bridge.1 | date=2018-08-24T10:28:47.837 pid=1162 tid=-1225506048 level=ERROR filename=base_events.py:1148:default_exception_handler msg="Task exception was never retrieved
bridge.1 | future: <Task finished coro=<_wrapper() done, defined at /home/piku/.piku/envs/homekit/src/hap-python/pyhap/accessory.py:266> exception=RuntimeError('Task <Task pending coro=<_wrapper() running at /home/piku/.piku/envs/homekit/src/hap-python/pyhap/accessory.py:270>> got Future <Future pending> attached to a different loop',)>"
bridge.1 | Traceback (most recent call last):
bridge.1 |   File "/usr/lib/python3.5/asyncio/tasks.py", line 241, in _step
bridge.1 |     result = coro.throw(exc)
bridge.1 |   File "/home/piku/.piku/envs/homekit/src/hap-python/pyhap/accessory.py", line 270, in _wrapper
bridge.1 |     self.driver.aio_stop_event, seconds):
bridge.1 |   File "/home/piku/.piku/envs/homekit/src/hap-python/pyhap/util.py", line 133, in event_wait
bridge.1 |     await asyncio.wait_for(event.wait(), timeout, loop=loop)
bridge.1 |   File "/usr/lib/python3.5/asyncio/tasks.py", line 385, in wait_for
bridge.1 |     yield from waiter
bridge.1 |   File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
bridge.1 |     yield self  # This tells Task to wait for completion.
bridge.1 | RuntimeError: Task <Task pending coro=<_wrapper() running at /home/piku/.piku/envs/homekit/src/hap-python/pyhap/accessory.py:270>> got Future <Future pending> attached to a different loop

I've tried PyPI, master, and dev, and all of them work on Python 3.6 and 3.7 on my Mac, but not on Python 3.5 on Ubuntu.

Here's my requirements.txt:

-e git+https://github.com/ikalchev/HAP-python.git@master#egg=HAP-python[QRCode]
python-dateutil

I have bunches of asyncio stuff running on this machine, so I'd say it isn't a problem with the local Python.

rcarmo commented 6 years ago

Changing the issue title, since I've now reproduced it on 3.5.2 on the Mac as well.

rcarmo commented 6 years ago

Looking into this a bit further, the _wrapper used for the decorator seems to be at fault, since there is no guarantee it will use the same event loop as the current AccessoryDriver instance.

However, since I am trying to do purely threaded code (since I have to use another library that does hard blocking on sockets to interface with my hardware) I do think that the asyncio bits need to be completely separated from the threaded model... The magic in add_job could possibly be done away with, which would make it less likely for things like this to happen.
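
For what it's worth, the underlying error is easy to reproduce outside HAP-python. Here is a minimal, self-contained sketch (hypothetical code, nothing from HAP-python itself): on 3.5 through 3.9 an asyncio.Event binds to whichever loop is current when it is constructed, so awaiting it from a task running on a different loop raises exactly this RuntimeError (on 3.10+ the Event binds lazily, so the example merely times out):

import asyncio

loop_a = asyncio.new_event_loop()
loop_b = asyncio.new_event_loop()

asyncio.set_event_loop(loop_a)
event = asyncio.Event()  # on 3.5-3.9 the Event captures loop_a right here

async def waiter():
    # the Future behind event.wait() belongs to loop_a, but this coroutine
    # runs as a task on loop_b
    await asyncio.wait_for(event.wait(), timeout=1)

try:
    loop_b.run_until_complete(waiter())
except (RuntimeError, asyncio.TimeoutError) as exc:
    print(exc)  # "... got Future ... attached to a different loop" on 3.5-3.9
finally:
    loop_a.close()
    loop_b.close()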

cdce8p commented 6 years ago

Have you tried it without the run_at_interval method?

Regarding add_job: it's not that much magic, it just selects how and where to run the corresponding methods. Scheduling coroutines and shutting them down safely is the tricky part.
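
The dispatch itself boils down to roughly this pattern (a simplified sketch of the idea only, not HAP-python's actual add_job code):

import asyncio
import inspect

def add_job(loop, executor, target, *args):
    # Simplified sketch of the "select how and where to run" idea;
    # loop is the driver's running event loop, executor e.g. a ThreadPoolExecutor.
    if inspect.iscoroutinefunction(target):
        # coroutine functions get scheduled on the driver's loop, thread-safely
        return asyncio.run_coroutine_threadsafe(target(*args), loop)
    # plain callables go to a worker thread so they cannot block the loop
    return executor.submit(target, *args)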

rcarmo commented 6 years ago

Yes, and it works in that case, but I need a recurring mechanism to update device state. I've improvised with this:

    def run(self):
        log.debug("tick")
        self.t = Thread(target=discover, args=(self,))
        self.t.start()

As to the "magic" in add_job and async_add_job, reading through the implementation and having it try to figure out what the target is feels odd and unpythonic - having a single kind of AccessoryDriver to manage both kinds of concurrency seems prone to trouble since you have to keep track of both in the same class (having an abstract class to hold non-concurrency logic and splitting them into AsyncAccessoryDriver and ThreadedAccessoryDriver, for instance, would make it easier to debug, and also more obvious how to integrate them into existing solutions).

This issue I'm having would probably have benefited from that kind of split, since right now what appears to be happening is that, even though I'm not actually using asyncio, bits of the code assume I want to, and (for some reason on 3.5) the wrong event loop is selected.
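
To make that concrete, here is a rough sketch of the shape I have in mind (hypothetical names and methods, not an existing HAP-python API):

import asyncio
import threading
from abc import ABC, abstractmethod

class BaseAccessoryDriver(ABC):
    # shared, concurrency-agnostic logic (pairing, persistence, mDNS, ...) lives here

    @abstractmethod
    def schedule(self, target, *args):
        """Run target using this driver's concurrency model."""

    @abstractmethod
    def stop(self):
        """Shut the driver down cleanly."""

class ThreadedAccessoryDriver(BaseAccessoryDriver):
    def __init__(self):
        self.stop_event = threading.Event()
        self._threads = []

    def schedule(self, target, *args):
        # every job gets its own worker thread
        thread = threading.Thread(target=target, args=args, daemon=True)
        self._threads.append(thread)
        thread.start()
        return thread

    def stop(self):
        self.stop_event.set()
        for thread in self._threads:
            thread.join()

class AsyncAccessoryDriver(BaseAccessoryDriver):
    def __init__(self, loop=None):
        self.loop = loop or asyncio.new_event_loop()

    def schedule(self, target, *args):
        # target is expected to be a coroutine function here
        return asyncio.run_coroutine_threadsafe(target(*args), self.loop)

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)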

cdce8p commented 6 years ago

I'm no expert in asyncio, but the underlying idea was/is to use async for as many program parts as possible, since threads are kind of broken and inefficient in Python. It shouldn't matter to the end user whether he/she wants async or threaded code to run, since the decision on how to execute the methods is handled automatically. In contrast, splitting the AccessoryDriver would complicate things even more.

The problem you had/have isn't really an issue, more a warning that a scheduled coroutine wasn't executed because you stopped the program beforehand. That's always a tricky part when dealing with asyncio. However, the fact that it works with 3.6 and 3.7, like you said, points towards issues other than the implementation in HAP-python, but again I'm no expert on that subject.

Btw. have you tried 3.5.6?

rcarmo commented 6 years ago

Well, "the user" in this case is a developer (this is a library, not an end-user app), and I, for one, would prefer to have better control over the threading model and not be surprised by library internals. I've been using asyncio since tulip, and mixing concurrency models has always been frowned upon...

Also, the problem wasn't just a warning. The scheduling stops, and it stops because _wrapper somehow picks up another event loop that is not the one defined in the instance (since it is a static method, I'm assuming the runtime somehow falls back to creating a new one).

I've only tested Python 3.5.1 and 3.5.2 (besides 3.6 and 3.7). I haven't been paying attention to event loop initialization changes in the latest release notes (I've only really paid attention to generator enhancements), so I can't say what is different - only that Ubuntu still ships with 3.5 on 14.04, and that I can't really move to 3.6 on that platform for a good while (even with 18.04 now shipping 3.6, I only have 14.04 on the systems I'm trying to run this).

cdce8p commented 6 years ago

Also, the problem wasn't just a warning. The scheduling stops, and it stops because _wrapper somehow picks up another event loop that is not the one defined in the instance (since it is a static method, I'm assuming the runtime somehow falls back to creating a new one).

I see. Originally I thought it was an error that occurs when stopping HAP-python while there is still a coroutine scheduled. This was a common problem during development.

Without digging much deeper, I would guess that it's an issue where cancellation doesn't work properly. Taking a look at the Python changelog supports that: https://bugs.python.org/issue27585. The issue was fixed in 3.5.4.

Please run at least the latest version of 3.5 and then test it again! On Ubuntu that shouldn't be a problem (a virtualenv would work as well).

I, for one, would prefer to have better control over the threading model and not be surprised by library internals.

Please go ahead and suggest something better. It's always a trade-off between simplicity and customizability. Although, if you're so keen on using threads, you can already:

Start the accessory driver in one thread
And run your method in another, continuously

You don't have to use the Accessory.run method. That's just one option.

rcarmo commented 6 years ago

Without digging much deeper, I would guess that it's an issue where cancellation doesn't work properly. Taking a look at the Python changelog supports that: https://bugs.python.org/issue27585. The issue was fixed in 3.5.4.

Please run at least the latest version of 3.5 and then test it again! On Ubuntu that shouldn't be a problem (a virtualenv would work as well).

I’ll have to set up pyenv on one of those machines, since the version in apt hasn’t budged. Still, I won’t be able to deploy the app (it has to run with the base system Python).

I, for one, would prefer to have better control over the threading model and not be surprised by library internals.

Please go ahead and suggest something better. It's always a trade-off between simplicity and customizability.

I already did, actually. Make AccessoryDriver an abstract class and have async and threaded subclasses which could perform their own cleanups and concurrency management in a predictable way.

It would also simplify each subclass of AccessoryDriver and make them much easier to test.

Although, if you're so keen on using threads, you can already:

Start the accessory driver in one thread
And run your method in another, continuously

You don't have to use the Accessory.run method. That's just one option.

I’m not very keen, but I have to, because the SSDP code I’m using requires a hard socket timeout on a multicast endpoint, and reimplementing it under asyncio is full of little pitfalls (even in 3.7.0 that isn’t straightforward).

ikalchev commented 6 years ago

Sorry I am jumping in so late - I agree that mixing both models can produce surprising effects. And I also think that HAP-python won't be fully async any time soon (as much as I want it to be) - both mDNS and the HTTP server are huge tasks. For one, I have looked at aiohttp for replacing the server, but the custom crypto really raises the bar. An improvement we can make is to allow only sync or only async accessories in a driver, so that there are fewer surprises like this (e.g. I have an issue on macOS with Python 3.7, but it's only in the tests for now).

Anyway, @rcarmo The AccessoryDriver defines a threading.Event object that is set whenever the driver should stop - stop_event.

So you can define your Accessory.run method to start a new Thread that does e.g.

while not self.driver.stop_event.wait(timeout=300):
    ...  # do stuff

Then in the Accessory.stop you can have self.the_thread.join(). Do you think you could work with that until the Python upgrade, or until we manage to fix/work around the wrapper?
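
Put together, that might look roughly like this (a sketch under the assumptions above; poll_hardware is a hypothetical placeholder for the blocking work, and the synchronous run override is assumed to be dispatched to a thread by the driver):

import threading

from pyhap.accessory import Accessory

class PollingAccessory(Accessory):

    def run(self):
        # start the blocking work in its own thread, leaving the event loop alone
        self.the_thread = threading.Thread(target=self._poll_loop, daemon=True)
        self.the_thread.start()

    def _poll_loop(self):
        # stop_event is the driver's threading.Event; wait() returns True once
        # it is set, so the loop exits promptly on shutdown
        while not self.driver.stop_event.wait(timeout=300):
            self.poll_hardware()

    def poll_hardware(self):
        # hypothetical placeholder for the blocking call (e.g. SSDP discovery)
        pass

    def stop(self):
        self.the_thread.join()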

rcarmo commented 6 years ago

Thanks for coming back to me on this. I’ve been avoiding mixing things with the Accessory instances themselves, and have threads that know about the Accessories they need to toggle and then invoke the required methods. It’s not as neat, but seems to work.

I’ll have a look at adopting stop_event, but right now I am doing something else that will also need re-thinking with regards to threading later on, which is handling MQTT messages:

class ContactSensor(Accessory):
    category = CATEGORY_SENSOR

    def __init__(self, *args, **kwargs):
        self.topic = kwargs['topic']
        del kwargs['topic']
        super().__init__(*args, **kwargs)

        serv_switch = self.add_preload_service('ContactSensor')
        self.char_contact = serv_switch.configure_char('ContactSensorState')
        # TODO: add battery info
        self.set_info_service(firmware_revision='0.1', manufacturer='Xiaomi', model='MCCGQ11LM', serial_number='0')
        self.client = mqtt.Client()
        self.client.connect(MQTT_BROKER, 1883, 10)
        self.client.subscribe(self.topic)
        self.client.on_message = self.on_message
        self.client.loop_start()

    def on_message(self, client, userdata, message):
        try:
            msg = loads(message.payload.decode("utf-8"))
        except Exception as e:
            log.error(e)
            return  # msg is undefined if decoding failed
        log.debug(msg)
        if 'contact' in msg:
            now = msg['contact']
            if now == self.char_contact.value:
                self.char_contact.value = not now
                self.char_contact.notify()

That loop_start does, in fact, start a new thread. So the plot thickens…
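
Since loop_start spawns paho-mqtt's own network thread, the matching teardown probably belongs in the accessory's stop as well; a minimal sketch of that idea, assuming the ContactSensor above:

    def stop(self):
        # tear down paho-mqtt's network thread when the driver shuts down
        self.client.disconnect()
        self.client.loop_stop()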

ikalchev commented 6 years ago

Well, something must be tagged with run_at_interval if you see _wrapper somewhere in your stack.

The "stock" Accessory run method is defined as async, so this definitely get's submitted to the event loop.