kennedyshead / aioasuswrt

MIT License
24 stars 24 forks source link

Index out of range error in async_get_rx #62

Closed RogerSelwyn closed 3 years ago

RogerSelwyn commented 3 years ago

I'm getting the following error occasionally on my router. I'm guessing that data is occasionally coming back as completely empty in some form, but since I'm not sure what is triggering that response it's difficult to test.

Maybe worth adding ahead of the existing return:

if not data return None

2020-11-30 23:54:58 ERROR (MainThread) [homeassistant.helpers.entity] Update for sensor.asuswrt_upload_speed fails
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 278, in async_update_ha_state
await self.async_device_update()
File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 482, in async_device_update
await task
File "/usr/src/homeassistant/homeassistant/components/asuswrt/sensor.py", line 136, in async_update
await super().async_update()
File "/usr/src/homeassistant/homeassistant/components/asuswrt/sensor.py", line 69, in async_update
self._speed = await self._api.async_get_current_transfer_rates()
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 401, in async_get_current_transfer_rates
data = await self.async_get_bytes_total(use_cache)
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 380, in async_get_bytes_total
rx = await self.async_get_rx()
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 389, in async_get_rx
return float(data[0]) if data[0] != "" else None
IndexError: list index out of range
kennedyshead commented 3 years ago

Could you please give a stacktrace?

RogerSelwyn commented 3 years ago

Added it, I realised I missed it and added it after you posted :-(

Tmin10 commented 3 years ago

In some cases I had different variant of stack traces:


Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 144, in async_refresh
self.data = await self._async_update_data()
File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 132, in _async_update_data
return await self.update_method()
File "/usr/src/homeassistant/homeassistant/components/asuswrt/sensor.py", line 119, in update_data
speed = await self._api.async_get_current_transfer_rates()
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 401, in async_get_current_transfer_rates
data = await self.async_get_bytes_total(use_cache)
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 380, in async_get_bytes_total
rx = await self.async_get_rx()
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 389, in async_get_rx
return float(data[0]) if data[0] != "" else None
ValueError: could not convert string to float: '? (192.168.1.xx) at xx:xx:xx:xx:xx:xx [ether] on br0\r'```
Tmin10 commented 3 years ago

Or I got this today:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 144, in async_refresh
    self.data = await self._async_update_data()
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 132, in _async_update_data
    return await self.update_method()
  File "/usr/src/homeassistant/homeassistant/components/asuswrt/sensor.py", line 119, in update_data
    speed = await self._api.async_get_current_transfer_rates()
  File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 401, in async_get_current_transfer_rates
    data = await self.async_get_bytes_total(use_cache)
  File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 380, in async_get_bytes_total
    rx = await self.async_get_rx()
  File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 389, in async_get_rx
    return float(data[0]) if data[0] != "" else None
ValueError: could not convert string to float: 'PATH=$PATH:/bin:/usr/sbin:/sbin && cat /sys/class/net/ppp0/statistics/rx_bytes\r'

Also I wrote test code with direct calling for three functions. It was working all day but after a while I got same errors

import logging
import sys
from asyncio import sleep

from aioasuswrt.asuswrt import AsusWrt
import asyncio

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging.getLogger(__name__)

async def main():
    api = AsusWrt(
        host="192.168.1.1",
        port="23",
        use_telnet=True,
        username="",
        password="",
        interface="ppp0"
    )
    while True:
        devices = await api.async_get_connected_devices()
        speed = await api.async_get_current_transfer_rates()
        rates = await api.async_get_bytes_total()
        print(devices)
        print(speed)
        print(rates)
        await sleep(10)

if __name__ == '__main__':
    asyncio.run(main())
DEBUG:aioasuswrt.asuswrt:Could not parse row: -sh: wl: not found
DEBUG:aioasuswrt.asuswrt:Could not parse row: -sh: wl: not found
{'xx:xx:xx:xx:xx:xx': Device(mac='xx:xx:xx:xx:xx:xx', ip='192.168.1.xx', name='xxx'), ... }
(23013, 3141)
(2719833793.0, 201333070.0)
DEBUG:aioasuswrt.asuswrt:Could not parse row: -sh: wl: not found
DEBUG:aioasuswrt.asuswrt:Could not parse row: -sh: wl: not found
{'xx:xx:xx:xx:xx:xx': Device(mac='xx:xx:xx:xx:xx:xx', ip='192.168.1.xx', name='xxx'), ... }
(23013, 3141)
(2720174522.0, 201396091.0)
ERROR:aioasuswrt.connection:Host timeout.

DEBUG:aioasuswrt.asuswrt:Could not parse row: , 18)}'; else wl -i $dev assoclist; fi; done
DEBUG:aioasuswrt.asuswrt:Could not parse row: -sh: wl: not found
DEBUG:aioasuswrt.asuswrt:Could not parse row: PATH=$PATH:/bin:/usr/sbin:/sbin && arp -n
DEBUG:aioasuswrt.asuswrt:Could not parse row: -sh: wl: not found
DEBUG:aioasuswrt.asuswrt:Could not parse row: ? (192.168.1.xx) at xx:xx:xx:xx:xx:xx [ether]  on br0
...
DEBUG:aioasuswrt.asuswrt:Could not parse row: ? (192.168.1.xxx) at xx:xx:xx:xx:xx:xx [ether]  on br0
DEBUG:aioasuswrt.asuswrt:Could not parse row: 192.168.1.xx dev br0 lladdr xx:xx:xx:xx:xx:xx REACHABLE
...
DEBUG:aioasuswrt.asuswrt:Could not parse row: 192.168.1.xxx dev br0 lladdr xx:xx:xx:xx:xx:xx REACHABLE
Traceback (most recent call last):
  File "C:\Users\Tmin10\PycharmProjects\test\asuswrt_test.py", line 33, in <module>
    asyncio.run(main())
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 642, in run_until_complete
    return future.result()
  File "C:\Users\Tmin10\PycharmProjects\test\asuswrt_test.py", line 24, in main
    speed = await api.async_get_current_transfer_rates()
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\site-packages\aioasuswrt\asuswrt.py", line 401, in async_get_current_transfer_rates
    data = await self.async_get_bytes_total(use_cache)
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\site-packages\aioasuswrt\asuswrt.py", line 380, in async_get_bytes_total
    rx = await self.async_get_rx()
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\site-packages\aioasuswrt\asuswrt.py", line 389, in async_get_rx
    return float(data[0]) if data[0] != "" else None
ValueError: could not convert string to float: 'PATH=$PATH:/bin:/usr/sbin:/sbin && cat /sys/class/net/ppp0/statistics/rx_bytes\r'
Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x0000013F9E19F550>
Traceback (most recent call last):
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\asyncio\proactor_events.py", line 116, in __del__
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\asyncio\proactor_events.py", line 108, in close
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 746, in call_soon
  File "C:\Users\Tmin10\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 510, in _check_closed
RuntimeError: Event loop is closed

Process finished with exit code 1

My router model: Asus RT-AC51U

JJdeVries commented 3 years ago

Interesting to see that it happens after a TimeoutError. If I think about it, it might actually be better to close the current connection if this Timeout happens. Currently the library simply uses the same reader and writer object. So it might be that the return values of the next call, could get polluted by some return of the previous call?

I'm not really sure how the default implementation of the reader and writer handle such a Timeout error.

Anyway in my opinion we might be a lot more robuust if we call a self.disconnect() after handling the two Timeout errors in the TelnetConnection implementation. Would you be able to add these 2 lines and run this test again?

Tmin10 commented 3 years ago

Not sure that I got the whole idea but I added

self.disconnect()
_LOGGER.info("Telnet disconnected.")

to the async_run_command method of TelnetConnection class after _LOGGER.error("Host timeout.") message and will test with it. This error isn't so regular but happens once per day and completely blocks next api calls.

JJdeVries commented 3 years ago

@Tmin10 Did you manage to trigger the error, and see any different behaviour?

Tmin10 commented 3 years ago

I tried it on my local machine but can't get any errors. I would like to patch my home assistant setup with this fix and check for a few days.

Tmin10 commented 3 years ago

Ok, updated aioasuswrt in HA setup. Let's wait for some time: image

Tmin10 commented 3 years ago

Wops, typo. I fixed it, let's wait for an issue again... image

Tmin10 commented 3 years ago

Much better. It's failed on timeout exception but next call was fine:

2021-02-07 00:26:50 ERROR (MainThread) [aioasuswrt.connection] Host timeout.
2021-02-07 00:26:50 WARNING (MainThread) [aioasuswrt.connection] Telnet disconnected.
2021-02-07 00:26:50 ERROR (MainThread) [homeassistant.components.asuswrt.sensor] Unexpected error fetching sensor data: 'NoneType' object has no attribute 'write'
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 144, in async_refresh
self.data = await self._async_update_data()
File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 132, in _async_update_data
return await self.update_method()
File "/usr/src/homeassistant/homeassistant/components/asuswrt/sensor.py", line 114, in update_data
devices = await self._api.async_get_connected_devices()
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 351, in async_get_connected_devices
dev = await self.async_get_wl()
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/asuswrt.py", line 274, in async_get_wl
lines = await self.connection.async_run_command(_WL_CMD)
File "/usr/local/lib/python3.8/site-packages/aioasuswrt/connection.py", line 106, in async_run_command
self._writer.write((full_cmd + "\n").encode("ascii"))
AttributeError: 'NoneType' object has no attribute 'write'
JJdeVries commented 3 years ago

Okay, atleast we're on the right track. So after the AttributeError you get calls that succeed again?

The AttributeError is caused because the async_get_wl call already checked the connection, after which it's waiting on the acquire of the io_lock. It's in any way probably better to do the self.disconnect calls while the io_lock is still acquired (seeing as these calls change the reader and writer objects, which are protected by the io_lock). And we'll probably have to find a way to check the connection AFTER acquiring the lock.

I'm guessing that the simplest way would be to split the current self.connect method, and put everything after the lock acquiring in a separate private method. The async_run_command method should than check the connection after acquiring the lock, and if no connection is established, call the private connection function (which does not try to acquire the io_lock again).

Would you be able to put these changes in a pull request? Otherwise I'll try to do that somewhere this week.

Tmin10 commented 3 years ago

@JJdeVries I created PR with this changes. May you take a look? #65 If this is fine I can test it on my HA server for a while.

Tmin10 commented 3 years ago

@JJdeVries updated PR after you code review. Sorry for delay.

Tmin10 commented 3 years ago

It should be fixed in 1.3.3. Maybe we could close this issue then.

JJdeVries commented 3 years ago

agreed