networkupstools / nut

The Network UPS Tools repository. UPS management protocol Informational RFC 9271 published by IETF at https://www.rfc-editor.org/info/rfc9271 Please star NUT on GitHub, this helps with sponsorships!
https://networkupstools.org/
Other
1.94k stars 346 forks source link

Poll UPS failed - Data stale #1022

Open HyP3r- opened 3 years ago

HyP3r- commented 3 years ago

Hello I have a Bluewalker UPS Powerwalker VI 2200 LCD UPS. Unfortunately, always after some time comes the following error messages:

usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy usbhid-ups[2387]: libusb_get_report: error sending control message: No such device usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy usbhid-ups[2387]: libusb_get_report: error sending control message: No such device usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy usbhid-ups[2387]: libusb_get_report: error sending control message: No such device usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device systemd[1]: /lib/systemd/system/nut-monitor.service:6: PIDFile= references path below legacy directory /var/run/, updating /var/run/nut/upsmon.pid → /run/nut/upsmon.pid; pleas systemd[1]: nut-monitor.service: Supervising process 2399 which is not our child. We'll most likely not notice when it exits. systemd[1]: /lib/systemd/system/nut-monitor.service:6: PIDFile= references path below legacy directory /var/run/, updating /var/run/nut/upsmon.pid → /run/nut/upsmon.pid; pleas systemd[1]: nut-monitor.service: Supervising process 2399 which is not our child. We'll most likely not notice when it exits. usbhid-ups[2387]: libusb_get_interrupt: error sending control message: Value too large for defined data type usbhid-ups[2387]: libusb_get_report: error sending control message: No such device usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device upsd[2391]: Data for UPS [myups] is stale - check driver upsmon[2399]: Poll UPS [myups@localhost] failed - Data stale upsmon[2399]: Communications with UPS myups@localhost lost upsmon[2399]: Poll UPS [myups@localhost] failed - Data stale

In this case the server has been running for 55 days. So after 55 days of operation now these error messages come. What could be the reason for this?

lsusb:

Bus 001 Device 050: ID 06da:ffff Phoenixtec Power Co., Ltd.

cat ups.conf:

[myups]
    driver = usbhid-ups
    port = auto
    vendorid = 06da
    desc = "myups"
    pollinterval = 15

upsd -v:

Network UPS Tools upsd 2.7.4
jimklimov commented 3 years ago

For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset).

Guessing by your device name, containing "vm", I might add potentially some delays or flaws in virtualization/passthrough layer as a possible problem source.

On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago.

In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means retries succeeded to find (maybe reinitialize) the USB device except once.

For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 would behave better?

HyP3r- commented 3 years ago

For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset).

This is a Debian 10 which runs on a HP ProLiant DL360. I mention this because I can't imagine that such a server has energy saving mechanisms for the USB system. How can I find out if something like this happened at this moment?

Guessing by your device name, containing "vm", I might add potentially some delays or flaws in virtualization/passthrough layer as a possible problem source.

It is not a virtual machine but the hypervisor. As platform I use Debian 10 with Proxmox VE.

On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago.

As already described, the system actually runs a hypervisor. And the system is also relatively busy. But somehow I can't imagine that this is enough to hang up so much.

In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means retries succeeded to find (maybe reinitialize) the USB device except once.

It seems so yes: the last few months the connection is repeatedly interrupted, only unfortunately today it did not work to restore the connection.

For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 would behave better?

I just tried a few times to compile the branches "libusb-1.0+0.1", "libusb-1.0" and "libusb-compat-1.0" but with no full success: The branches "libusb-1.0+0.1" and "libusb-1.0" I could compile but were unable to use them. I always get the following error message:

upsdrvctl[44892]: Failed to read pid from /var/state/ups/usbhid-ups1-myups.pid
upsdrvctl[44892]: Network UPS Tools - Generic HID driver 0.42 (2.7.4-420-g2999c95f)
upsdrvctl[44892]: USB communication driver (libusb 1.0) 0.02
upsdrvctl[44892]: No matching HID UPS found
upsdrvctl[44892]: Driver failed to start (exit status=1)
upsmon[18049]: Poll UPS [myups@localhost] failed - Driver not connected
jimklimov commented 3 years ago

That failure to start probably means that it probed available USB devices and did not find a supported vendor/product ID. Might be that as of those branches the IDs for your UPS were not known yet, which is unlikely at least if you were running last release before that, or that the device remains not seen on the OS side.

Try starting the driver with -DDDDD argument for debug with higher verbosity; it would mention IDs it saw in particular.

On Thu, Apr 22, 2021, 00:20 HyP3r @.***> wrote:

For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset). This is a Debian 10 which runs on a HP ProLiant DL360. I mention this because I can't imagine that such a server has energy saving mechanisms for the USB system. How can I find out if something like this happened at this moment?

Guessing by your device name, containing "vm", I might add potentially some delays or flaws in virtualization/passthrough layer as a possible problem source. It is not a virtual machine but the hypervisor. As platform I use Debian 10 with Proxmox VE.

On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago. As already described, the system actually runs a hypervisor. And the system is also relatively busy. But somehow I can't imagine that this is enough to hang up so much.

In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means retries succeeded to find (maybe reinitialize) the USB device except once. It seems so yes: the last few months the connection is repeatedly interrupted, only unfortunately today it did not work to restore the connection.

For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 https://github.com/networkupstools/nut/issues/300 would behave better? I just tried a few times to compile the branches "libusb-1.0+0.1", "libusb-1.0" and "libusb-compat-1.0" but with no full success: The branches "libusb-1.0+0.1" and "libusb-1.0" I could compile but were unable to use them. I always get the following error message:

upsdrvctl[44892]: Failed to read pid from /var/state/ups/usbhid-ups1-myups.pid upsdrvctl[44892]: Network UPS Tools - Generic HID driver 0.42 (2.7.4-420-g2999c95f) upsdrvctl[44892]: USB communication driver (libusb 1.0) 0.02 upsdrvctl[44892]: No matching HID UPS found upsdrvctl[44892]: Driver failed to start (exit status=1) upsmon[18049]: Poll UPS @.*** failed - Driver not connected

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/networkupstools/nut/issues/1022#issuecomment-824396176, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMPTFBGJEM4EYPAMYYYDFTTJ5FT3ANCNFSM43LALQWA .

HyP3r- commented 3 years ago

After I called configure Debian conform the library could be compiled and used. I will now observe the whole thing for a quarter. If it doesn't hang anymore, the ticket can be closed.

I will inform you in this issue.

jimklimov commented 3 years ago

thanks!

On Fri, Apr 23, 2021, 20:00 HyP3r @.***> wrote:

After I called configure Debian conform the library could be compiled and used. I will now observe the whole thing for a quarter. If it doesn't hang anymore, the ticket can be closed.

I will inform you in this issue.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/networkupstools/nut/issues/1022#issuecomment-825825532, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMPTFHCF5A2V3WEJGDDEHTTKGYTVANCNFSM43LALQWA .

graemev commented 6 months ago

OK, I hit this issue, a reboot "fixed" it but I gathered some info before the reboot , might help somebody, esp if the real fix is to raise a kernel defect. (BTW this happened about 10 days after a real power cut...that would have been annoying if it had failed):

The messages:

Feb 25 14:06:57 ybox usbhid-ups[727]: libusb_get_interrupt: could not claim interface 0: Device or resource busy                                                                                                              
Feb 25 14:07:07 ybox usbhid-ups[727]: libusb_get_interrupt: error sending control message: Protocol error                                                                                                                     
Feb 25 14:07:17 ybox usbhid-ups[727]: libusb_get_interrupt: error sending control message: Protocol error                                                                                                                     
... 33K of similar messages  

Stop and restart nut-server (debian bullseye)


root@ybox:/var/log# systemctl start  nut-server                                                                                                                                                                               
root@ybox:/var/log# systemctl status nut-server                                                                                                                                                                               
● nut-server.service - Network UPS Tools - power devices information server                                                                                                                                                   
     Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled)                                                                                                                                 
     Active: active (running) since Thu 2024-02-29 10:50:45 GMT; 38s ago                                                                                                                                                      
    Process: 1066411 ExecStart=/sbin/upsd (code=exited, status=0/SUCCESS)                                                                                                                                                     
   Main PID: 1066412 (upsd)                                                                                                                                                                                                   
      Tasks: 1 (limit: 4476)                                                                                                                                                                                                  
     Memory: 772.0K                                                                                                                                                                                                           
        CPU: 33ms                                                                                                                                                                                                             
     CGroup: /system.slice/nut-server.service                                                                                                                                                                                 
             └─1066412 /lib/nut/upsd                                      

Feb 29 10:50:45 ybox upsd[1066411]: listening on 127.0.0.1 port 3493                                                                                                                                                          
Feb 29 10:50:45 ybox upsd[1066411]: listening on 10.117.0.152 port 3493                                                                                                                                                       
Feb 29 10:50:45 ybox upsd[1066411]: listening on 127.0.0.1 port 3493                                                                                                                                                          
Feb 29 10:50:45 ybox upsd[1066411]: Can't connect to UPS [qnapups] (usbhid-ups-qnapups): No such file or directory                                                                                                            
Feb 29 10:50:45 ybox upsd[1066411]: Can't connect to UPS [qnapups] (usbhid-ups-qnapups): No such file or directory                                                                                                            
Feb 29 10:50:45 ybox upsd[1066411]: /etc/nut/upsd.users is world readable                                                                                                                                                     
Feb 29 10:50:45 ybox upsd[1066411]: /etc/nut/upsd.users is world readable                                                                                                                                                     
Feb 29 10:50:45 ybox upsd[1066412]: Startup successful                                                                                                                                                                        
Feb 29 10:50:45 ybox systemd[1]: Started Network UPS Tools - power devices information server.                                                                                                                                
Feb 29 10:50:50 ybox upsd[1066412]: User admin@127.0.0.1 logged into UPS [qnapups]  

The driver failed to start:

root@ybox:/var/log# /lib/nut/usbhid-ups -DDD -a qnapups -u nut   
 ....
0.032911        Checking device (051D/0002) (001/002)                                                                                                                                                                         
   0.034470     - VendorID: 051d                                                                                                                                                                                              
   0.034502     - ProductID: 0002                                                                                                                                                                                             
   0.034516     - Manufacturer: unknown                                                                                                                                                                                       
   0.034529     - Product: unknown                                                                                                                                                                                            
   0.034542     - Serial Number: unknown                                                                                                                                                                                      
   0.034554     - Bus: 001                                                                                                                                                                                                    
   0.034566     - Device release number: 0106                                                                                                                                                                                 
   0.034586     Trying to match device                                                                                                                                                                                        
   0.034605     device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround                                                                                                     
   0.034641     Device matches                                                                                                                                                                                                
   0.034703     nut_usb_set_altinterface: skipped usb_set_altinterface(udev, 0)                                                                                                                                               
   0.035360     Unable to get HID descriptor (error sending control message: Protocol error)                                                                                                                                  
   0.035391     HID descriptor length (method 1) -1                                                                                                                                                                           
   0.035419     HID descriptor, method 2: (9 bytes) => 09 21 10 01 21 01 22 05 04                                                                                                                                             
   0.035440     HID descriptor length (method 2) 1029                                                                                                                                                                         
   0.035502     HID descriptor length 1029                                                                                                                                                                                    
   0.039396     Unable to get Report descriptor: Protocol error                                                                                                                                                               
...

So it thinks Product is NULL ...

root@ybox:/var/log# lsusb -d 051d:                                                                                                                                                                                            
Bus 001 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply                                                                                                                                       

Bus 001 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply                                                                                                                                       
Device Descriptor:                                                                                                                                                                                                            
  bLength                18                                                                                                                                                                                                   
  bDescriptorType         1                                                                                                                                                                                                   
  bcdUSB               1.10                                                                                                                                                                                                   
  bDeviceClass            0                                                                                                                                                                                                   
  bDeviceSubClass         0                                                                                                                                                                                                   
  bDeviceProtocol         0                                                                                                                                                                                                   
  bMaxPacketSize0         8                                                                                                                                                                                                   
  idVendor           0x051d American Power Conversion                                                                                                                                                                         
  idProduct          0x0002 Uninterruptible Power Supply                                                                                                                                                                      
  bcdDevice            1.06                                                                                                                                                                                                   
  iManufacturer           3 American Power Conversion                                                                                                                                                                         
  iProduct                1 Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5                                                                                                                                                         
  iSerial                 2 3B2041X34361                                                                                                                                                                                      
  bNumConfigurations      1                                                                                                                                                                                                   
  Configuration Descriptor:                                                                                                                                                                                                   
    bLength                 9                                                                                                                                                                                                 
    bDescriptorType         2                                                                                                                                                                                                 
    wTotalLength       0x0022                                                                                                                                                                                                 
    bNumInterfaces          1                                                                                                                                                                                                 
    bConfigurationValue     1                                                                                                                                                                                                 
    iConfiguration          0                                                                                                                                                                                                 
    bmAttributes         0xe0                                                                                                                                                                                                 
      Self Powered                                                                                                                                                                                                            
      Remote Wakeup                                                                                                                                                                                                           
    MaxPower               24mA                                                                                                                                                                                               
    Interface Descriptor:                                                                                                                                                                                                     
      bLength                 9                                                                                                                                                                                               
      bDescriptorType         4                                                                                                                                                                                               
      bInterfaceNumber        0                                                                                                                                                                                               
      bAlternateSetting       0                                                                                                                                                                                               
      bNumEndpoints           1                                                                                                                                                                                               
      bInterfaceClass         3 Human Interface Device                                                                                                                                                                        
      bInterfaceSubClass      0                                                                                                                                                                                               
      bInterfaceProtocol      0                                                                                                                                                                                               
      iInterface              0                                                                                                                                                                                               
        HID Device Descriptor:                                                                                                                                                                                                
          bLength                 9                                                                                                                                                                                           
          bDescriptorType        33                                                                                                                                                                                           
          bcdHID               1.10                                                                                                                                                                                           
          bCountryCode           33 US                                                                                                                                                                                        
          bNumDescriptors         1                                                                                                                                                                                           
          bDescriptorType        34 Report                                                                                                                                                                                    
          wDescriptorLength    1029                                                                                                                                                                                           
          Report Descriptor: (length is -1)                                                                                                                                                                                   
      Endpoint Descriptor:                                                                                                                                                                                                    
        bLength                 7                                                                                                                                                                                             
        bDescriptorType         5                                                                                                                                                                                             
        bEndpointAddress     0x81  EP 1 IN                                                                                                                                                                                    
        bmAttributes            3                                                                                                                                                                                             
          Transfer Type            Interrupt                                                                                                                                                                                  
          Synch Type               None                                                                                                                                                                                       
          Usage Type               Data                                                                                                                                                                                       
        wMaxPacketSize     0x0006  1x 6 bytes                                                                                                                                                                                 
        bInterval              10                                                                                                                                                                                             
can't get debug descriptor: Resource temporarily unavailable                                                                                                                                                                  
cannot read device status, Resource temporarily unavailable (11)   

(BTW that last line may be key)

Looking at:

root@ybox:/var/log# cat /sys/bus/usb/devices/1-1/product                                                                                                                                                                      
Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5                                                                                                                                                                                     

It appears to be there.

(also did this as user NUT also OK)

Will reboot ..

Following reboot ...all seemed to come up OK (UPS WAS NOT power cycled)

I noted the following diffs in the output of lsusb:

Before Reboot:                                                                                                                                                                                                                
          Report Descriptor: (length is -1)                                                                                                                                                                                   
          ...                                                                                                                                                                                                                 
          can't get debug descriptor: Resource temporarily unavailable                                                                                                                                                        
          cannot read device status, Resource temporarily unavailable (11)                                                                                                                                            

After Reboot:                                                                                                                                                                                                                 
         Report Descriptors:                                                                                                                                                                                                  
           UNAVAILABLE                                                                                                                                                                                                  

           can't get debug descriptor: Resource temporarily unavailable      

           **Device Status:     0x0000  **                                                                                                                                                                                       
             (Bus Powered)