petretiandrea / plugp100

Work in progress implementation of tapo protocol in python.
GNU General Public License v3.0
72 stars 27 forks source link

Resiliency issues and a fix for 'NoneType' object has no attribute 'invalidate' etc #152

Closed pcollinson closed 11 months ago

pcollinson commented 1 year ago

I have plugp100 embedded in my homebuilt plug manager driving P100 plugs, it work well, but can fail in some circumstances. Actually, I think it's the only code around that understands the latest firmware updates, that seems to stop other code from functioning. So a HUGE THANKS!!!!

Running your sample code (or my code see below) against a plug that is turned off will show these messages (amongst others):

'NoneType' object has no attribute 'invalidate'
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f5212a5f7d0>

Tracking things down shows that the problem is in klap_protocol.py. The close code at line 93 reads:

   async def close(self):
        self._klap_session.invalidate()
        await self._http_session.close()

if the http fails to connect, self._klap_session is None, and Python can bail out calling invalidate, then other errors cascade. Why am I testing against a plug that's off? The problem can happen if you have a complete power failure, and it takes time for things to reboot - especially the local network. I've had one occurrence of my controller just not functioning mysteriously after a power fail.

There is of course, a simple fix:

   async def close(self):
        if self._klap_session is not None:
            self._klap_session.invalidate()
        await self._http_session.close()

which I've inserted and works with no apparent bad effects. The code has a number of places that assumes that _klap_session is defined and can be called, but I've not chased these, they may not be a problem.

I've gone on to look at other resiliency issues. It's possible that these may not be a problem in the environment that plugp100 is aimed at, so these are possibly features and not bugs. I include my test script at the end of this posting, edit in your details and it will show the problem above if pointed at a plug that's not turned on.

The idea of the script is to make a connection to a live plug, then sleep so the plug can be turned off with the code stack configured and then the test given in 'stoptest' is run.

My initial findings were that none of the primitive operations (getinfo, on or off) would 'die' with an exception when the plug is turned off before the sleep finishes. So test code like:

         elif stoptest == "on":
            try:
                await plug.on()
                print("Plug on")
            except Exception as e:
                print(f"On Failed {str(e)}")
                await client.close()
                return

would not fail when the plug was off. The program appears to sit there, working away in the code somewhere. I'd like to know that the plug has gone away, so I can retry a full connection at some later time. I do use 'ping' to see if I can see the plug before attempting a plugp100 connection. So I am mostly worried about knowing that a plug has disappeared. Again this can happen in use, networks can connect and disconnect so the plug goes away briefly, but the downtime happens at an inconvenient time.

My script now uses a timeout that bails out after 5 secs (which may be too long) and will fail with a TimeoutError exception. So the code calling the plugp100 can know when the plug has gone away after it's opened it and attempts an operation. As I said, this may not matter in your use case.

Anyway here's my test script, starting life as your sample:

""" Resiliance testbed """

import asyncio
import os
import time

from plugp100.api.tapo_client import TapoClient, TapoProtocolType
from plugp100.api.plug_device import PlugDevice
from plugp100.common.credentials import AuthCredential

TAPOEMAIL = "YOUR LOGIN EMAIL"
TAPOPASSWD = "YOUR PASSWORD"
IP = "192.168.XX.XX"

# Select the command to test
stoptest = None

# Default timeouts
ATIMEOUT = 5

async def main():
    # create generic tapo api
    username = TAPOEMAIL
    password = TAPOPASSWD

    credentials = AuthCredential(username, password)
    client = TapoClient(credentials, IP, 80, None, TapoProtocolType.KLAP)
    await client.initialize()

    try:
        print(await client.get_device_info())
    except Exception as e:
        print(f"Get device info failed {str(e)}")
        await client.close()
        return

    if stoptest is not None:
        # Testing turnoff after handshake
        # The idea here is to keep the plug on
        # until this is output, sort of faking
        # a connection failure
        print("About to sleep 10 - turn plug off now");
        time.sleep(10)

        plug = PlugDevice(client)

        if stoptest == "getinfo":
            try:
                async with asyncio.timeout(ATIMEOUT):
                    print(await client.get_device_info())
            except asyncio.TimeoutError:
                print("Timeout error for 'get_device_info'")
                await client.close()
                return
            except Exception as e:
                print(f"Getinfo failed  {str(e)}")
                await client.close()
                return

        elif stoptest == "on":
            try:
                async with asyncio.timeout(ATIMEOUT):
                    await plug.on()
                    print("Plug on")
            except asyncio.TimeoutError:
                print("Timeout error for 'on'")
                await client.close()
                return
            except Exception as e:
                print(f"On Failed {str(e)}")
                await client.close()
                return

        elif stoptest == "off":
            try:
                plug = PlugDevice(client)
                async with asyncio.timeout(ATIMEOUT):
                    await plug.off()
                    print("Plug off")
            except asyncio.TimeoutError:
                print("Timeout error for 'off'")
                await client.close()
            except Exception as e:
                print(f"Off Failed {str(e)}")
                await client.close()

    try:
        await client.close()
    except Exception as e:
        print(f"Close failed {str(e)}")

if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main())
    loop.run_until_complete(asyncio.sleep(0.1))
    loop.close()

Finally I have a 'synchronous' interface to the plugp100 code that my code which was designed to run with P100 uses, if this is of interest please let me know.