networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
1.92k stars 345 forks source link

usbhid-ups becomes unavailable every day #1983

Open martin3000 opened 1 year ago

martin3000 commented 1 year ago

I have a powerwalker Vi 850 SH with the usbhid-ups driver on ubuntu, nut 2.7.4. Every day, "upsc ups1" hangs, because the ups driver is unavailable. I can fix it with "sudo upsdrvctl start", but the next day it is unavailable again. I don't see anything useful in the syslog.

It seems that the driver crashes or hangs. How can I test this?

driver.version: 2.7.4 driver.version.data: MGE HID 1.40 driver.version.internal: 0.41

jimklimov commented 1 year ago

Hello, for investigation, you would need to bump debug verbosity for the driver (maybe even to level 6 if your storage permits) to see how it ticks and the last words when it does not. On older NUT releases with systemd services it is a bit more complicated than with init-scripts or with NUT 2.8.0+, see https://github.com/networkupstools/nut/wiki/Changing-NUT-daemon-debug-verbosity for details.

In my older experience there was a device that went AWOL randomly - sometimes after a week, sometimes after 20 minutes. The libusb layer just hung talking to it, I've never reliably tracked down why it fails and had to craft https://github.com/networkupstools/nut/blob/master/scripts/Solaris/reset-ups-usb-solaris.sh.sample to soft-replug the device and restart the drivers. A somewhat similarly harsh approach is documented at https://github.com/networkupstools/nut/tree/master/scripts/usb_resetter for Linux.

Drivers in NUT 2.8.0+ should be a bit more reliable at detecting their own hangs or commanding their older self to die (and so release the UPS device filesystem node) when starting a new instance.

martin3000 commented 1 year ago

Thank you for looking into it. From the latest ubuntu packages, I extracted usbhid-ups 2.8 and I am using it now with nut 2.7.4 But the problem remains the same.

"journalctl -u nut-driver" shows:

Jul 10 13:04:26 nuc usbhid-ups-2.8[1261]: nut_libusb_get_report: Input/Output Error
Jul 10 13:04:26 nuc usbhid-ups-2.8[1261]: nut_libusb_get_report: Input/Output Error
Jul 10 13:04:26 nuc usbhid-ups-2.8[1261]: nut_libusb_get_report: Input/Output Error
Jul 10 13:04:26 nuc usbhid-ups-2.8[1261]: nut_libusb_get_report: Input/Output Error
Jul 10 13:04:28 nuc usbhid-ups-2.8[1261]: nut_libusb_get_interrupt: Input/Output Error
Jul 10 13:04:29 nuc usbhid-ups-2.8[1261]: libusb1: Could not open any HID devices: insufficient permissions on everything

Doing "sudo /lib/nut/usbhid-ups-2.8 -a ups1 -DDDDD" I get:

   0.000000 [D1] debug level is '5'
   0.003820 [D5] send_to_all: SETINFO device.type "ups"
   0.003842 [D2] Initializing an USB-connected UPS with library libusb-1.0.25 (API: 0x1000109) (NUT subdriver name='USB communication driver (libusb 1.0)' ver='0.43')
   0.003855 [D1] upsdrv_initups (non-SHUT)...
   0.016038 [D2] Checking device 1 of 11 (152D/0579)
   0.016080 [D1] Failed to open device (152D/0579), skipping: Access denied (insufficient permissions)
   0.016101 [D2] Checking device 2 of 11 (2109/0813)
   0.016128 [D1] Failed to open device (2109/0813), skipping: Access denied (insufficient permissions)
   0.016143 [D2] Checking device 3 of 11 (1D6B/0003)
   0.016166 [D1] Failed to open device (1D6B/0003), skipping: Access denied (insufficient permissions)
   0.016179 [D2] Checking device 4 of 11 (8087/0AAA)
   0.016206 [D1] Failed to open device (8087/0AAA), skipping: Access denied (insufficient permissions)
   0.016223 [D2] Checking device 5 of 11 (046D/C517)
   0.016252 [D1] Failed to open device (046D/C517), skipping: Access denied (insufficient permissions)
   0.016268 [D2] Checking device 6 of 11 (06DA/FFFF)
   0.016961 [D1] nut_libusb_open get iManufacturer failed, retrying...
   0.017525 [D1] nut_libusb_open get iManufacturer failed, retrying...
   0.018028 [D1] nut_libusb_open get iManufacturer failed, retrying...
   0.018524 [D1] nut_libusb_open get iProduct failed, retrying...
   0.019145 [D1] nut_libusb_open get iProduct failed, retrying...
   0.019707 [D1] nut_libusb_open get iProduct failed, retrying...
   0.020187 [D1] nut_libusb_open get iSerialNumber failed, retrying...
   0.020756 [D1] nut_libusb_open get iSerialNumber failed, retrying...
   0.021228 [D1] nut_libusb_open get iSerialNumber failed, retrying...
   0.021242 [D2] - VendorID: 06da
   0.021252 [D2] - ProductID: ffff
   0.021266 [D2] - Manufacturer: unknown
   0.021277 [D2] - Product: unknown
   0.021289 [D2] - Serial Number: unknown
   0.021302 [D2] - Bus: 001
   0.021311 [D2] - Device: unknown
   0.021329 [D2] - Device release number: 0003
   0.021339 [D2] Trying to match device
   0.021351 [D2] match_function_subdriver (non-SHUT mode): matching a device...
   0.021372 [D3] match_function_regex: matching a device...
   0.021393 [D2] Device matches
   0.021412 [D2] Reading first configuration descriptor
   0.021443 [D3] libusb_kernel_driver_active() returned 0
   0.021484 [D2] Claimed interface 0 successfully
   0.021505 [D3] nut_usb_set_altinterface: skipped libusb_set_interface_alt_setting(udev, 0, 0)
   0.021987 [D2] Unable to get HID descriptor (Input/Output Error)
   0.022008 [D3] HID descriptor length (method 1) -1
   0.022028 [D4] i=0, extra[i]=09, extra[i+1]=21
   0.022044 [D3] HID descriptor, method 2: (9 bytes) => 09 21 00 01 00 01 22 ef 01
   0.022075 [D3] HID descriptor length (method 2) 495
   0.022093 [D2] HID descriptor length 495
   0.022726 [D2] Unable to get Report descriptor: Resource temporarily unavailable
   0.022774 [D2] Checking device 7 of 11 (0403/6015)
   0.022811 [D1] Failed to open device (0403/6015), skipping: Access denied (insufficient permissions)
   0.022823 [D2] Checking device 8 of 11 (10C4/EA60)
   0.022853 [D1] Failed to open device (10C4/EA60), skipping: Access denied (insufficient permissions)
   0.022874 [D2] Checking device 9 of 11 (0C45/51F4)
   0.022905 [D1] Failed to open device (0C45/51F4), skipping: Access denied (insufficient permissions)
   0.022922 [D2] Checking device 10 of 11 (2109/2813)
   0.022947 [D1] Failed to open device (2109/2813), skipping: Access denied (insufficient permissions)
   0.022964 [D2] Checking device 11 of 11 (1D6B/0002)
   0.022991 [D1] Failed to open device (1D6B/0002), skipping: Access denied (insufficient permissions)
   0.023005 [D2] libusb1: No appropriate HID device found
   0.023023 libusb1: Could not open any HID devices: insufficient permissions on everything
   0.023043 No matching HID UPS found

So the problem is "Unable to get Report descriptor: Resource temporarily unavailable"

martin3000 commented 1 year ago

After unplugging the UPS:

   0.000000 [D1] debug level is '5'
   0.003315 [D5] send_to_all: SETINFO device.type "ups"
   0.003335 [D2] Initializing an USB-connected UPS with library libusb-1.0.25 (API: 0x1000109) (NUT subdriver name='USB communication driver (libusb 1.0)' ver='0.43')
   0.003346 [D1] upsdrv_initups (non-SHUT)...
   0.013412 [D2] Checking device 1 of 11 (152D/0579)
   0.013445 [D1] Failed to open device (152D/0579), skipping: Access denied (insufficient permissions)
   0.013459 [D2] Checking device 2 of 11 (2109/0813)
   0.013480 [D1] Failed to open device (2109/0813), skipping: Access denied (insufficient permissions)
   0.013492 [D2] Checking device 3 of 11 (1D6B/0003)
   0.013511 [D1] Failed to open device (1D6B/0003), skipping: Access denied (insufficient permissions)
   0.013521 [D2] Checking device 4 of 11 (8087/0AAA)
   0.013544 [D1] Failed to open device (8087/0AAA), skipping: Access denied (insufficient permissions)
   0.013556 [D2] Checking device 5 of 11 (046D/C517)
   0.013577 [D1] Failed to open device (046D/C517), skipping: Access denied (insufficient permissions)
   0.013607 [D2] Checking device 6 of 11 (06DA/FFFF)
   0.018950 [D2] - VendorID: 06da
   0.018987 [D2] - ProductID: ffff
   0.018997 [D2] - Manufacturer: PPC
   0.019006 [D2] - Product: Offline UPS
   0.019018 [D2] - Serial Number: 000000000   
   0.019030 [D2] - Bus: 001
   0.019040 [D2] - Device: unknown
   0.019054 [D2] - Device release number: 0003
   0.019067 [D2] Trying to match device
   0.019081 [D2] match_function_subdriver (non-SHUT mode): matching a device...
   0.019100 [D3] match_function_regex: matching a device...
   0.019114 [D2] Device matches
   0.019125 [D2] Reading first configuration descriptor
   0.019153 [D3] libusb_kernel_driver_active() returned 1 (driver active)
   0.019166 [D2] successfully set kernel driver auto-detach flag
   0.019836 [D2] Claimed interface 0 successfully
   0.019850 [D3] nut_usb_set_altinterface: skipped libusb_set_interface_alt_setting(udev, 0, 0)
   0.020571 [D3] HID descriptor, method 1: (9 bytes) => 09 21 00 01 00 01 22 ef 01
   0.020586 [D3] HID descriptor length (method 1) 495
   0.020597 [D4] i=0, extra[i]=09, extra[i+1]=21
   0.020624 [D3] HID descriptor, method 2: (9 bytes) => 09 21 00 01 00 01 22 ef 01
   0.020748 [D3] HID descriptor length (method 2) 495
   0.020781 [D2] HID descriptor length 495
   0.036958 [D2] Report Descriptor size = 495
jimklimov commented 1 year ago

Just in case: while it was "Unable to get Report descriptor: Resource temporarily unavailable" - was the other driver instance surely stopped?

Also, a bit surprised that it worked for you as /lib/nut/usbhid-ups-2.8 -- I think many drivers check the base filename to make sure the driver=... from config matches. Wondering now if it looks at just the beginning of the string, not all of it (e.g. to ignore possible extensions), and if it should...

martin3000 commented 1 year ago

I renamed the 2.8 version to usbhid-ups-2.8. And I changed the .conf file. Currently it is running with debug output going to a file. It seems that the problem with the 2.8 driver is not the same as with the 2.7.4 driver.

jimklimov commented 1 year ago

It seems that the problem with the 2.8 driver is not the same as with the 2.7.4 driver.

Do you mean the "Resource unavailable"? Probably it is the root-cause of problem with 2.7.4 as well. In 2.8.0 if something like this happens during run-time, the driver should be able to decide that it wants to close and re-open the device (almost like restarting the driver), which may succeed or not; the older versions would have tried talking to the effectively absent device.

It is however up to the device itself (and the OS) whether it goes AWOL (e.g. some CPS-related UPS chips allegedly tend to go into power-saving in the middle of monitoring) and how it recovers (does udev or similar system change ownership to nut:nut so the driver may access it again)...

martin3000 commented 1 year ago

With 2.7.4 I could do a "sudo upsdrvctl start" and it was working again. This did not work with 2.8

jimklimov commented 1 year ago

Does it report any issues when "not working" this way?

By the way, note that is your NUT package is set up to use systemd for the daemons (including drivers), the manually driven upsdrvctl may be problematic:

jimklimov commented 1 year ago

Also note that the 2.8.0+ version of the driver should support a debug_min setting in ups.conf (maybe the 2.7.4 upsd would dislike it though...) If it works, setting up boosted debugging would be easier. The older upsdrvctl would hide most of it though when forking the driver.

martin3000 commented 1 year ago

note that is your NUT package is set up to use systemd for the daemons (including drivers), the manually driven upsdrvctl may be problematic

So the right way would be a "sudo systemctl restart nut-driver.service"? I don't think that systemd tries to restart the service. Currently the service is in a failed state and I'm running the driver stand-alone for debugging purpose.

jimklimov commented 1 year ago

Normally yes; with NUT 2.8.0 it is even better fine-grained with a service unit instance separate for each driver like nut-driver@UPSNAME.service (caveats apply, maybe not exactly UPSNAME). With current master it also taps into systemd-notify mechanism (sort of a software-watchdog) so a frozen driver is detected as unhealthy and restarted.

If in your case the service is stopped, then command-line tests should be OK in terms of (non-)conflicts.

Next question: do I get it right that with your usbhid-ups-2.8, there is some problem using upsdrvctl start - but no problem running the driver binary directly (as in earlier screenshots)? Did you try renaming away the packaged binary and using the newer 2.8 one as usbhid-ups verbatim - does any behavior change then?

martin3000 commented 1 year ago

With the 2.7.4 driver, "upsdrvctl start" always worked, with the 2.8 driver it did not work. But give me some more days to test it because it normally runs nearly a day before it fails.

martin3000 commented 1 year ago

For debugging purposes, I started the driver with "sudo /lib/nut/usbhid-ups-2.8 -a ups1 -DDDDD" and it is now running fine for 3 days. "nut-driver.service" ist not running, so currently I'm not using upsdrvctl. I wait 2 more days and then I'm trying it to run normally thru the nut-driver.service. Maybe there is a difference.

jimklimov commented 1 year ago

Thanks for the update!

martin3000 commented 1 year ago

So, now it failed:

275883.763669   [D4] ==================================================
275883.763683   [D4] = device has been disconnected, try to reconnect =
275883.763698   [D4] ==================================================
275883.763732   [D2] Checking device 1 of 11 (152D/0579)
275883.763777   [D1] Failed to open device (152D/0579), skipping: Access denied (insufficient permissions)
275883.763793   [D2] Checking device 2 of 11 (2109/0813)
275883.763816   [D1] Failed to open device (2109/0813), skipping: Access denied (insufficient permissions)
275883.763831   [D2] Checking device 3 of 11 (1D6B/0003)
275883.763856   [D1] Failed to open device (1D6B/0003), skipping: Access denied (insufficient permissions)
275883.763871   [D2] Checking device 4 of 11 (8087/0AAA)
275883.763897   [D1] Failed to open device (8087/0AAA), skipping: Access denied (insufficient permissions)
275883.763912   [D2] Checking device 5 of 11 (046D/C517)
275883.763935   [D1] Failed to open device (046D/C517), skipping: Access denied (insufficient permissions)
275883.763950   [D2] Checking device 6 of 11 (06DA/FFFF)
275883.764626   [D1] nut_libusb_open get iManufacturer failed, retrying...
275883.765099   [D1] nut_libusb_open get iManufacturer failed, retrying...
275883.765707   [D1] nut_libusb_open get iManufacturer failed, retrying...
275883.766173   [D1] nut_libusb_open get iProduct failed, retrying...
275883.766707   [D1] nut_libusb_open get iProduct failed, retrying...
275883.767173   [D1] nut_libusb_open get iProduct failed, retrying...
275883.767747   [D1] nut_libusb_open get iSerialNumber failed, retrying...
275883.768234   [D1] nut_libusb_open get iSerialNumber failed, retrying...
275883.768824   [D1] nut_libusb_open get iSerialNumber failed, retrying...
275883.768841   [D2] - VendorID: 06da
275883.768856   [D2] - ProductID: ffff
275883.768872   [D2] - Manufacturer: unknown
275883.768887   [D2] - Product: unknown
275883.768903   [D2] - Serial Number: unknown
275883.768918   [D2] - Bus: 001
275883.768933   [D2] - Device: unknown
275883.768949   [D2] - Device release number: 0003
275883.768964   [D2] Trying to match device
275883.768980   [D2] match_function_subdriver (non-SHUT mode): matching a device...
275883.769002   [D3] match_function_regex: matching a device...
275883.769020   [D3] match_function_exact: matching a device...
275883.769037   [D2] match_function_exact: failed match of Vendor: (null) != PPC
275883.769052   [D2] Device does not match - skipping
275883.769080   [D2] Checking device 7 of 11 (0403/6015)
275883.769111   [D1] Failed to open device (0403/6015), skipping: Access denied (insufficient permissions)
275883.769127   [D2] Checking device 8 of 11 (10C4/EA60)
275883.769152   [D1] Failed to open device (10C4/EA60), skipping: Access denied (insufficient permissions)
275883.769167   [D2] Checking device 9 of 11 (0C45/51F4)
275883.769190   [D1] Failed to open device (0C45/51F4), skipping: Access denied (insufficient permissions)
275883.769205   [D2] Checking device 10 of 11 (2109/2813)
275883.769227   [D1] Failed to open device (2109/2813), skipping: Access denied (insufficient permissions)
275883.769243   [D2] Checking device 11 of 11 (1D6B/0002)
275883.769266   [D1] Failed to open device (1D6B/0002), skipping: Access denied (insufficient permissions)
275883.769281   [D2] libusb1: No appropriate HID device found
275883.769296   libusb1: Could not open any HID devices: insufficient permissions on everything

A disconnect already happend yesterday, but a reconnect was possible. Now the reconnect does not work and the problem seems to be that nut_libusb_open cannot get the device properties. lsusb still shows the USB device.

martin3000 commented 1 year ago

lsusb -v for the UPS while usbhid-ups is not working:

Bus 001 Device 013: ID 06da:ffff Phoenixtec Power Co., Ltd Offline UPS
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               1.10
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0         8
  idVendor           0x06da Phoenixtec Power Co., Ltd
  idProduct          0xffff 
  bcdDevice            0.03
  iManufacturer           1 PPC
  iProduct                2 Offline UPS
  iSerial                 4 000000000   
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0022
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0xa0
      (Bus Powered)
      Remote Wakeup
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.00
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength     495
         Report Descriptors: 
           ** UNAVAILABLE **
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval             100

It seems possible to retrieve iManufacturer and iProduct.

martin3000 commented 1 year ago

"sudo usbhid-dump -s 1:13" gives

001:013:000:Failed to retrieve report descriptor: Input/Output Error

After plugging the UPS USB out and in it works.

jimklimov commented 1 year ago

Curious... Haven't used that tool myself TBH, but it seems to imply the problem is systemic rather than NUT or its interaction with libusb. Maybe the USB chip on UPS side hangs and something like usb_resetter mentioned above is in order.

martin3000 commented 1 year ago

I'm testing again with the 2.7.4 usbhid-ups driver. I have running it standalone in debug mode. When it stopped working, it was still running (but hanging). Nothing special in the logs. I just cancelled it and started it again. I did not have to unplug the usb. So that is not the same as with the 2.8 driver.

jimklimov commented 1 year ago

Out of curiosity: could it be that 2.8.0 and 2.7.4 somehow differently initialize or release the device?

How would it behave when 2.7.4 was running, stopped working, and then you tried starting 2.8.0? And vice versa?

martin3000 commented 1 year ago

The 2.7.4 driver stops working nearly every day, but this can be fixed with "sudo upsdrvctl start". I never had to unplug the usb cable (which was necessary with the 2.8 driver). So the 2.7.4 driver does not work because it hangs, and the 2.8 driver does not work because after a few days, the usb port stops working.

martin3000 commented 1 year ago

After usbhid-ups 2.7.4 stopped every day, I switched again to version 2.8. After it stopped, the usb port was somehow broken and a restart of the driver was not possible. But usb_resetter.py --reset-device -device 06da:ffff did help. The usb port was repaired and I could run usbhid-ups 2.8 again.

jimklimov commented 1 year ago

Thanks for the confirmation, that's relatively good news :}