Making your client more resilient to changes by Genius Hub

benwainwright commented 1 year ago

Hi there!

I have been getting a very similar failure to the one found here: https://github.com/manzanotti/geniushub-client/issues/74 only my issue complains because it can't find 'type' in the dict its found. The stack trace fails in exactly the same place however.

With a bit of investigation, I established your library gets 'friendly device names' and so forth from data that is hardcoded in const.py - this of course means that if Genius Hub release new devices, or even update the firmware in such a way that device ids change, things break.

With a bit more investigation, I've established that you are getting your mapping data structures from the Genius Hub frontend client. A data object that appears to be completely identical in shape to the one that is causing the problem for me can be found in https://www.geniushub.co.uk/app/js/bower.js.

To make your library more resilient, I think you need to find some way of automatically fetching that data, otherwise as the maintainer you are faced with the task of continually updating the package to match any updates the GH team do. To do it at runtime would be slow, but it might not be a bad idea to do this either at install or publish time. With that in mind, I've written a python script this evening that essentially does just this: downloads the bower.js file, parses it, locates the dict containing the AST for the 'devicesModel', generates a python AST from it, then writes that to disk as a python source file.

It's not perfect (and parsing that JS file is particularly slow), but I'd like to work this into something that you can use in your package to make sure that its always up to date. If this works, would you accept it as a PR? Script is as follows:

import aiohttp
import json
import asyncio
import ast
import astor
from pyjsparser import parse

def find_symbol_with_name(name, my_dict, parent):
    """
    Recursively iterate through the generated JS AST
    until you find an object with a 'name' that matches
    the passed in name argument. Once found, return its parent
    """
    if (isinstance(my_dict, dict) and "name" in my_dict and my_dict['name'] == name):
        return parent

    if (isinstance(my_dict, dict)):
        for next_key in my_dict:
            if (type(my_dict[next_key]) == list):
                for item in my_dict[next_key]:
                    result = find_symbol_with_name(name, item, my_dict)
                    if (result != None):
                        return result

            if (isinstance(my_dict[next_key], dict)):
                result = find_symbol_with_name(name, my_dict[next_key], my_dict)
                if (result != None):
                    return result

    return None

def transform_value(value):
    if value['type'] == 'Literal':
        return ast.Constant(value['value'])

    if value['type'] == 'ArrayExpression':
        values = []
        for element in value['elements']:
            values.append(transform_value(element))

        return ast.List(elts=values)

def process_devices_model(data):
    """
    Generate a python AST from the found JS AST
    """
    elements = data['value']['elements']

    keys = []
    values = []

    for element in elements:
        for property in element['properties']:
            keys.append(ast.Constant(value=property['key']['value']))
            values.append(transform_value(property['value']))

    ast_elements = ast.Dict(
        keys=keys,
        values=values
    )

    return ast.Module(body=[
        ast.Assign(targets=[ast.Name(id="DEVICES_MODEL")],
                   value=ast.List(
            elts=[ast_elements]
        )
        )
    ])

async def main():
    async with aiohttp.ClientSession() as session:
        print('Downloading bower.js')
        async with session.get('https://www.geniushub.co.uk/app/js/bower.js') as response:
            javascript = await response.text()

            print('Parsing JavaScript')
            ast = parse(javascript)

            print('Extracting devicesModel')
            devices = find_symbol_with_name('devicesModel', ast, None)

            print('Generating source')
            devices_ast = process_devices_model(devices)
            source = astor.to_source(devices_ast)

            print('Writing to disk')
            with open('output.py', 'w') as file:
                file.write(source)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

rbubley commented 1 year ago

Would it make sense for something like this to be run as a Github Action on the repo, rather than for everyone's copy to run it? (It could then hook into tests, etc, before output was committed.)

benwainwright commented 1 year ago

So that's definitely one option. Here is the pros and cons as I see it:

Runtime on every 'update'
- probably way too slow to be feasible
Runtime 'init' method designed to be only executed once
- Home assistant would register the integration and call this once at startup (which would take longer than before), but actual 'updates' would still be fast
Before you publish the package to pypi
- No impact on the user experience, but the maintainer of this package is still going to have to trigger a new publish every time the genius hub team have made changes.

Options one and two would mean no intervention from the maintainer is required at all if the GH team introduce changes. Since I also suspect the main purpose of this package is to support the genius hub 'Home Assistant' integration, option three is going to mean the maintainer would also then need to handle changes to the home assistant core when those updates are made.

Also worth noting: My solution here isn't perfect. The GH team could refactor their code in ways that break my script - for example, I'm looking for a declaration called 'devicesModel' - all it would take would be for the GH team to rename that symbol for the above to stop working.

zxdavb commented 1 year ago

FYI: Because the v3 API was undocumented, there was a conscious decision to fail for unknown device types.

I do admit that perhaps a better exception would have been appropriate.

GeoffAtHome commented 1 year ago

I am seeing the the following:

Source: components/geniushub/__init__.py:124
First occurred: 03:38:00 (1 occurrences)
Last logged: 03:38:00

Error during setup of component geniushub
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/setup.py", line 288, in _async_setup_component
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/geniushub/__init__.py", line 124, in async_setup
    await client.update()
  File "/usr/local/lib/python3.11/site-packages/geniushubclient/__init__.py", line 260, in update
    super().update()  # now parse all the JSON
    ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/geniushubclient/__init__.py", line 201, in update
    self.issues = [convert_issue(raw_json) for raw_json in self._issues]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/geniushubclient/__init__.py", line 201, in <listcomp>
    self.issues = [convert_issue(raw_json) for raw_json in self._issues]
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/geniushubclient/__init__.py", line 161, in convert_issue
    device_type = self.device_by_id[raw_json["data"]["nodeID"]].data["type"]
                  ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '59'

From debugging I see the cause of the problem is the nodeID as invalid. That is, in the list of devices this address no longer exist.

To make the code more resilient, the issue is only an issue if the device actually exist. This aligns with what is the display in the Genius App.

I'll create a PR for this.

manzanotti commented 11 months ago

HI all, sorry for the delay in responding.

My inclination is to go more down the route of #79, where the code doesn't blow up when it encounters something it doesn't know about. For Issues, that means outputting Unknown Device rather than failing.

For the rest, I'm thinking that if we identify the areas of code that would break with unknown types or ids, and then create issues logging that, rather than breaking.

How does that sound as a potential solution?

GeoffAtHome commented 11 months ago

Travelling at the moment so not a full answer.

Either works for me. Handling exceptions that are throw is all that's needed for situations that are unforeseen.

On Sat, 30 Sept 2023, 12:03 Paul Manzotti, @.***> wrote:

HI all, sorry for the delay in responding.

My inclination is to go more down the route of #79 https://github.com/manzanotti/geniushub-client/pull/79, where the code doesn't blow up when it encounters something it doesn't know about. For Issues, that means outputting Unknown Device rather than failing.

For the rest, I'm thinking that if we identify the areas of code that would break with unknown types or ids, and then create issues logging that, rather than breaking.

How does that sound as a potential solution?

— Reply to this email directly, view it on GitHub https://github.com/manzanotti/geniushub-client/issues/76#issuecomment-1741741236, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABVW2WGRHIOGTHH6BCACQHTX474ATANCNFSM6AAAAAAVTARX4Y . You are receiving this because you commented.Message ID: @.***>

GeoffAtHome commented 11 months ago

I prefer non-break changes of reporting 'unknown device' rather than failing is an improvement and should not introduce other breaking changes.

manzanotti / geniushub-client

Making your client more resilient to changes by Genius Hub #76