Open HyP3r- opened 3 years ago
For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset).
Guessing by your device name, containing "vm", I might add potentially some delays or flaws in virtualization/passthrough layer as a possible problem source.
On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago.
In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means retries succeeded to find (maybe reinitialize) the USB device except once.
For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 would behave better?
For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset).
This is a Debian 10 which runs on a HP ProLiant DL360. I mention this because I can't imagine that such a server has energy saving mechanisms for the USB system. How can I find out if something like this happened at this moment?
Guessing by your device name, containing "vm", I might add potentially some delays or flaws in virtualization/passthrough layer as a possible problem source.
It is not a virtual machine but the hypervisor. As platform I use Debian 10 with Proxmox VE.
On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago.
As already described, the system actually runs a hypervisor. And the system is also relatively busy. But somehow I can't imagine that this is enough to hang up so much.
In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means retries succeeded to find (maybe reinitialize) the USB device except once.
It seems so yes: the last few months the connection is repeatedly interrupted, only unfortunately today it did not work to restore the connection.
For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 would behave better?
I just tried a few times to compile the branches "libusb-1.0+0.1", "libusb-1.0" and "libusb-compat-1.0" but with no full success: The branches "libusb-1.0+0.1" and "libusb-1.0" I could compile but were unable to use them. I always get the following error message:
upsdrvctl[44892]: Failed to read pid from /var/state/ups/usbhid-ups1-myups.pid
upsdrvctl[44892]: Network UPS Tools - Generic HID driver 0.42 (2.7.4-420-g2999c95f)
upsdrvctl[44892]: USB communication driver (libusb 1.0) 0.02
upsdrvctl[44892]: No matching HID UPS found
upsdrvctl[44892]: Driver failed to start (exit status=1)
upsmon[18049]: Poll UPS [myups@localhost] failed - Driver not connected
That failure to start probably means that it probed available USB devices and did not find a supported vendor/product ID. Might be that as of those branches the IDs for your UPS were not known yet, which is unlikely at least if you were running last release before that, or that the device remains not seen on the OS side.
Try starting the driver with -DDDDD
argument for debug with higher
verbosity; it would mention IDs it saw in particular.
On Thu, Apr 22, 2021, 00:20 HyP3r @.***> wrote:
For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset). This is a Debian 10 which runs on a HP ProLiant DL360. I mention this because I can't imagine that such a server has energy saving mechanisms for the USB system. How can I find out if something like this happened at this moment?
Guessing by your device name, containing "vm", I might add potentially some delays or flaws in virtualization/passthrough layer as a possible problem source. It is not a virtual machine but the hypervisor. As platform I use Debian 10 with Proxmox VE.
On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago. As already described, the system actually runs a hypervisor. And the system is also relatively busy. But somehow I can't imagine that this is enough to hang up so much.
In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means retries succeeded to find (maybe reinitialize) the USB device except once. It seems so yes: the last few months the connection is repeatedly interrupted, only unfortunately today it did not work to restore the connection.
For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 https://github.com/networkupstools/nut/issues/300 would behave better? I just tried a few times to compile the branches "libusb-1.0+0.1", "libusb-1.0" and "libusb-compat-1.0" but with no full success: The branches "libusb-1.0+0.1" and "libusb-1.0" I could compile but were unable to use them. I always get the following error message:
upsdrvctl[44892]: Failed to read pid from /var/state/ups/usbhid-ups1-myups.pid upsdrvctl[44892]: Network UPS Tools - Generic HID driver 0.42 (2.7.4-420-g2999c95f) upsdrvctl[44892]: USB communication driver (libusb 1.0) 0.02 upsdrvctl[44892]: No matching HID UPS found upsdrvctl[44892]: Driver failed to start (exit status=1) upsmon[18049]: Poll UPS @.*** failed - Driver not connected
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/networkupstools/nut/issues/1022#issuecomment-824396176, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMPTFBGJEM4EYPAMYYYDFTTJ5FT3ANCNFSM43LALQWA .
After I called configure
Debian conform the library could be compiled and used. I will now observe the whole thing for a quarter. If it doesn't hang anymore, the ticket can be closed.
I will inform you in this issue.
thanks!
On Fri, Apr 23, 2021, 20:00 HyP3r @.***> wrote:
After I called configure Debian conform the library could be compiled and used. I will now observe the whole thing for a quarter. If it doesn't hang anymore, the ticket can be closed.
I will inform you in this issue.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/networkupstools/nut/issues/1022#issuecomment-825825532, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMPTFHCF5A2V3WEJGDDEHTTKGYTVANCNFSM43LALQWA .
OK, I hit this issue, a reboot "fixed" it but I gathered some info before the reboot , might help somebody, esp if the real fix is to raise a kernel defect. (BTW this happened about 10 days after a real power cut...that would have been annoying if it had failed):
The messages:
Feb 25 14:06:57 ybox usbhid-ups[727]: libusb_get_interrupt: could not claim interface 0: Device or resource busy
Feb 25 14:07:07 ybox usbhid-ups[727]: libusb_get_interrupt: error sending control message: Protocol error
Feb 25 14:07:17 ybox usbhid-ups[727]: libusb_get_interrupt: error sending control message: Protocol error
... 33K of similar messages
Stop and restart nut-server (debian bullseye)
root@ybox:/var/log# systemctl start nut-server
root@ybox:/var/log# systemctl status nut-server
● nut-server.service - Network UPS Tools - power devices information server
Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2024-02-29 10:50:45 GMT; 38s ago
Process: 1066411 ExecStart=/sbin/upsd (code=exited, status=0/SUCCESS)
Main PID: 1066412 (upsd)
Tasks: 1 (limit: 4476)
Memory: 772.0K
CPU: 33ms
CGroup: /system.slice/nut-server.service
└─1066412 /lib/nut/upsd
Feb 29 10:50:45 ybox upsd[1066411]: listening on 127.0.0.1 port 3493
Feb 29 10:50:45 ybox upsd[1066411]: listening on 10.117.0.152 port 3493
Feb 29 10:50:45 ybox upsd[1066411]: listening on 127.0.0.1 port 3493
Feb 29 10:50:45 ybox upsd[1066411]: Can't connect to UPS [qnapups] (usbhid-ups-qnapups): No such file or directory
Feb 29 10:50:45 ybox upsd[1066411]: Can't connect to UPS [qnapups] (usbhid-ups-qnapups): No such file or directory
Feb 29 10:50:45 ybox upsd[1066411]: /etc/nut/upsd.users is world readable
Feb 29 10:50:45 ybox upsd[1066411]: /etc/nut/upsd.users is world readable
Feb 29 10:50:45 ybox upsd[1066412]: Startup successful
Feb 29 10:50:45 ybox systemd[1]: Started Network UPS Tools - power devices information server.
Feb 29 10:50:50 ybox upsd[1066412]: User admin@127.0.0.1 logged into UPS [qnapups]
The driver failed to start:
root@ybox:/var/log# /lib/nut/usbhid-ups -DDD -a qnapups -u nut
....
0.032911 Checking device (051D/0002) (001/002)
0.034470 - VendorID: 051d
0.034502 - ProductID: 0002
0.034516 - Manufacturer: unknown
0.034529 - Product: unknown
0.034542 - Serial Number: unknown
0.034554 - Bus: 001
0.034566 - Device release number: 0106
0.034586 Trying to match device
0.034605 device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
0.034641 Device matches
0.034703 nut_usb_set_altinterface: skipped usb_set_altinterface(udev, 0)
0.035360 Unable to get HID descriptor (error sending control message: Protocol error)
0.035391 HID descriptor length (method 1) -1
0.035419 HID descriptor, method 2: (9 bytes) => 09 21 10 01 21 01 22 05 04
0.035440 HID descriptor length (method 2) 1029
0.035502 HID descriptor length 1029
0.039396 Unable to get Report descriptor: Protocol error
...
So it thinks Product is NULL ...
root@ybox:/var/log# lsusb -d 051d:
Bus 001 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 001 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 8
idVendor 0x051d American Power Conversion
idProduct 0x0002 Uninterruptible Power Supply
bcdDevice 1.06
iManufacturer 3 American Power Conversion
iProduct 1 Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5
iSerial 2 3B2041X34361
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0022
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 24mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 3 Human Interface Device
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
HID Device Descriptor:
bLength 9
bDescriptorType 33
bcdHID 1.10
bCountryCode 33 US
bNumDescriptors 1
bDescriptorType 34 Report
wDescriptorLength 1029
Report Descriptor: (length is -1)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0006 1x 6 bytes
bInterval 10
can't get debug descriptor: Resource temporarily unavailable
cannot read device status, Resource temporarily unavailable (11)
(BTW that last line may be key)
Looking at:
root@ybox:/var/log# cat /sys/bus/usb/devices/1-1/product
Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5
It appears to be there.
(also did this as user NUT also OK)
Will reboot ..
Following reboot ...all seemed to come up OK (UPS WAS NOT power cycled)
I noted the following diffs in the output of lsusb:
Before Reboot:
Report Descriptor: (length is -1)
...
can't get debug descriptor: Resource temporarily unavailable
cannot read device status, Resource temporarily unavailable (11)
After Reboot:
Report Descriptors:
UNAVAILABLE
can't get debug descriptor: Resource temporarily unavailable
**Device Status: 0x0000 **
(Bus Powered)
Hello I have a Bluewalker UPS Powerwalker VI 2200 LCD UPS. Unfortunately, always after some time comes the following error messages:
In this case the server has been running for 55 days. So after 55 days of operation now these error messages come. What could be the reason for this?
lsusb
:cat ups.conf
:upsd -v
: