numat / tripplite

Python USB HID interface to Tripplite UPS battery backups.
GNU General Public License v2.0

Collection crashes randomly #9

Open duhruh opened 3 years ago

duhruh commented 3 years ago

Sorry, I'm not sure if this project is still maintained. Maybe you could give me some pointers and I could patch it myself. I'm running into this error pretty frequently. Restarting clears it for a while, but then it happens again.

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/exposition.py", line 123, in prometheus_app
    status, header, output = _bake_output(registry, accept_header, params)
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/exposition.py", line 105, in _bake_output
    output = encoder(registry)
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/openmetrics/exposition.py", line 14, in generate_latest
    for metric in registry.collect():
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/registry.py", line 83, in collect
    for metric in collector.collect():
  File "/usr/local/lib/python3.9/site-packages/tripplite/prometheus.py", line 57, in collect
    ups_data = self.get_data()
  File "/usr/local/lib/python3.9/site-packages/tripplite/collectors.py", line 38, in get_data
    ups_data = self.battery.get()
  File "/usr/local/lib/python3.9/site-packages/tripplite/driver.py", line 127, in get
    output[category][subcategory] = self._read(options)
  File "/usr/local/lib/python3.9/site-packages/tripplite/driver.py", line 137, in _read
    report = self.device.get_feature_report(options['address'],
  File "hid.pyx", line 191, in hid.device.get_feature_report
ValueError: not open

patrickfuller commented 3 years ago

@duhruh I haven't really been maintaining this project, but it's been in continuous operation for me with 3 batteries for a few years now. It's a dead simple driver which I liked over other options.

I've been okay with TrippLites to date, but my experience writing drivers for vendor equipment is that some are more temperamental than others for no apparent reason.

Regarding this, are you using the long polling sample code?

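import logging
import time

from tripplite import Battery

state = None

def read_batteries(check_period=5):
    """Read battery and reopen on error. Use for long polling."""
    global state  # publish the latest reading to the module-level variable
    battery = Battery()
    battery.open()
    while True:
        time.sleep(check_period)
        try:
            state = battery.get()
        except (OSError, ValueError):
            # The traceback above ends in ValueError ("not open"), so it is
            # caught alongside OSError. Whether reopening recovers from that
            # state on a given device is untested here.
            logging.exception(f"Could not read battery {battery}.")
            battery.close()
            battery.open()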

This should be able to handle standard disconnects if the device isn't too finicky.

EDIT: Looks like you're running the Prometheus exporter. If so, you can try copying this reconnect logic into collectors.py. Sorry I can't help more!
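
For anyone trying this, here is a minimal sketch of what that reconnect logic could look like as a wrapper around the object the collector reads from. `ReconnectingReader` is a hypothetical helper, not part of tripplite. It assumes only that the wrapped object has `open()`, `close()`, and `get()`, like `Battery` in the sample above. It treats `ValueError` as a reconnect trigger because the traceback ends in `ValueError: not open`. That choice is an assumption, not something confirmed against the driver.

```python
import logging


class ReconnectingReader:
    """Hypothetical wrapper: retry a failed read once after reopening the device."""

    def __init__(self, device, errors=(OSError, ValueError)):
        # `device` is anything exposing open(), close() and get().
        self.device = device
        self.errors = errors

    def get(self):
        try:
            return self.device.get()
        except self.errors:
            logging.exception("Read failed; reopening device.")
            self._reopen()
            # A second failure propagates to the caller unchanged.
            return self.device.get()

    def _reopen(self):
        try:
            self.device.close()
        except self.errors:
            # Closing an already-closed handle may fail; ignore and reopen.
            pass
        self.device.open()
```

The collector would then call `get()` on the wrapper instead of on the battery directly. If the UPS has re-enumerated on the USB bus, reopening the same object may not be enough and a fresh `Battery()` might be needed. This sketch does not cover that case.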

cooperlees commented 1 year ago

FWIW, I hit this too and made a watchdog that just restarts the service. It seems the device's USB ID sometimes changes, which confuses the Python side, or something along those lines. I've never been able to truly root-cause and fix it. My feeling is that something is being cached, and because the Prometheus exporter is long-running, the cached data goes stale. I haven't debugged it in a long time (years), but I remember a proper fix being difficult for multiple reasons.
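
A rough sketch of the watchdog idea, not the script described above: restart the exporter after a few consecutive failed health checks. The `probe` and `restart` callables are hypothetical hooks supplied by the caller. The thresholds are arbitrary, and the `sleep` and `cycles` parameters exist only so the loop can be exercised without waiting.

```python
import time


def watchdog(probe, restart, interval=30.0, max_failures=3,
             sleep=time.sleep, cycles=None):
    """Call restart() after max_failures consecutive failed probes.

    probe() should return True when the exporter looks healthy.
    cycles limits the number of iterations; None means run forever.
    Returns the number of restarts performed.
    """
    failures = 0
    restarts = 0
    done = 0
    while cycles is None or done < cycles:
        done += 1
        try:
            healthy = bool(probe())
        except Exception:
            # A probe that raises counts as an unhealthy result.
            healthy = False
        failures = 0 if healthy else failures + 1
        if failures >= max_failures:
            restart()
            restarts += 1
            failures = 0
        sleep(interval)
    return restarts
```

In practice `probe` could fetch the exporter's metrics endpoint and check for a successful response, and `restart` could shell out to whatever manages the service. Both depend on the deployment, so they are left to the caller.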