numat / tripplite

Python USB HID interface to Tripplite UPS battery backups.
GNU General Public License v2.0
31 stars 12 forks source link

Help with handling USB Device ID changes in Long Running Processes #8

Closed cooperlees closed 3 years ago

cooperlees commented 3 years ago

I run the prometheus exporter 24/7 and it has been running flawlessly until a recent hardware change. On my Protectli FW1 hardware I had no issues with how code is, but I recently upgraded to a Supermicro 1 RU server and the USB device ID seems to periodically change. That is, when I run lsusb the Device int changes:

Bus 003 Device 098: ID 09ae:2012 Tripp Lite Tripp Lite UPS

This can take sometimes minutes, sometimes hours after starting the collector.

1) Should this change if no USB hardware is changed (it could be virtual hardware maybe?)?

I've made some attempts on a fork to handle this, but my hardware / USB n00b-ness has resulted in a lot of guessing. My attempt can be seen here: https://github.com/cooperlees/tripplite/commit/master

Any tips would be appreciated. The reason I think this is doable is as soon as I systemctl restart tripplite_exporter everything works again for a random amount of time.

patrickfuller commented 3 years ago

@cooperlees I'm also a USB n00b but I'll give you what help I can!

I know that USB assigns different path addresses on each connection, and each call to hid.enumerate. Your new server may be occasionally reconnecting the USB devices.

What I'd try first would be to functionalize the battery_paths logic. This should be easy enough with one battery. If the device no longer reads, then disconnect, get/refresh_battery_paths(), and then use the first element in that list again.

I have absolutely no idea if/how this would expand to support multiple batteries. Because of that, I'd be okay accepting a PR (assuming this approach works) if we add specific error handling, e.g.

try:
    read()
catch IOError: # guessing IOError
    if len(battery_paths) == 1:
        close()
        global battery_paths
        battery_paths = get_battery_paths()
        open()
    else:
        raise

Something like that could fix your use case without breaking multi-battery systems.

Let me know how it goes!

cooperlees commented 3 years ago

I don't think the hid.enumerate() library gets the new device ID so I'm fighting a loosing battle here. Going to giveup and just write a watchdog to bounce it when it dies the 1 time every day or so.