raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.22k stars 5.03k forks source link

Raspberry PI 3B+ and NUT (Network UPS Tools) #2977

Open Zaitchikk opened 5 years ago

Zaitchikk commented 5 years ago

Describe the bug I have a Powercom SPD-850U UPS. I try to connect this UPS with my Raspberry PI 3B + via NUT, I came to the conclusion - all works nice with Raspbian (2018-03-13), the first one that supports 3B +, but does not work with the latest version of Raspbian (2019-04-08). I think, the algorithm for working with the USB bus has changed in the latest versions of the kernel. And this is bad, because it breaks down the work of previously debugged programs.

To reproduce I use the following NUT driver settings. This works on both the Debian desktop and Raspbian(2018-03-13) with this UPS.But does not work on Raspbian (2019-04-08)

sudo nano /etc/nut/ups.conf [spider] driver = usbhid-ups port = auto desc = "Poaercom SPD-850U"

Expected behaviour The NUT driver start command must start the NUT driver

$ sudo upsdrvctl start Network UPS Tools - UPS driver controller 2.7.4 Network UPS Tools - Generic HID driver 0.41 (2.7.4) USB communication driver 0.33 Using subdriver: PowerCOM HID 0.5

Actual behaviour Attempt to launch NUT driver

sudo upsdrvctl start Network UPS Tools - UPS driver controller 2.7.4 Network UPS Tools - Generic HID driver 0.41 (2.7.4) USB communication driver 0.33 No matching HID UPS found Driver failed to start (exit status=1)

System

Raspberry PI3B+

$ cat /etc/rpi-issue Raspberry Pi reference 2019-04-08 Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, e910bf55ac92e27c6e77043e8995bcb5e0a2d742, stage5

$ vcgencmd version Mar 27 2019 15:45:53 Copyright (c) 2012 Broadcom version 2e98b31d18547962e564bdf88e57b3df7085c29b (clean) (release) (start)

$ uname -a Linux raspberrypi 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l GNU/Linux

Logs

$ dmesg [ 937.439522] usb 1-1.1.2: new low-speed USB device number 6 using dwc_otg [ 937.600259] usb 1-1.1.2: New USB device found, idVendor=0d9f, idProduct=0004 [ 937.600275] usb 1-1.1.2: New USB device strings: Mfr=3, Product=1, SerialNumber=2 [ 937.600284] usb 1-1.1.2: Product: HID UPS Battery [ 937.600292] usb 1-1.1.2: Manufacturer: POWERCOM Co.,LTD [ 937.600301] usb 1-1.1.2: SerialNumber: 004-0D9F-000 [ 937.808194] hid-generic 0003:0D9F:0004.0002: hiddev96,hidraw0: USB HID v1.00 Device [POWERCOM Co.,LTD HID UPS Battery] on usb-3f980000.usb-1.1.2/input0

$ sudo /lib/nut/usbhid-ups -DD -a spider Network UPS Tools - Generic HID driver 0.41 (2.7.4) USB communication driver 0.33 0.000000 debug level is '2' 0.002877 upsdrv_initups... 0.004120 Checking device (0D9F/0004) (001/006) 0.006031 - VendorID: 0d9f 0.006103 - ProductID: 0004 0.006176 - Manufacturer: unknown 0.006273 - Product: unknown 0.006366 - Serial Number: unknown 0.006461 - Bus: 001 0.006557 - Device release number: 0002 0.006650 Trying to match device 0.006765 Device matches 0.007697 Unable to get HID descriptor (error sending control message: Broken pipe) 0.007879 HID descriptor length 996 0.008555 Unable to get Report descriptor: Broken pipe

Zaitchikk commented 5 years ago

Hey anyone! Can I hope that the problem will ever be fixed?

ghost commented 5 years ago

Please contact whoever maintains NUT. They are responsible for this working or not in the first instance. If they come back and say it is a kernel bug then feel free to file a bug here, but as things stand this is not a kernel bug so this is not the correct place to report it.

Zaitchikk commented 5 years ago

but as things stand this is not a kernel

If in the earlier version of Linux for Raspberry, the program worked normally with USB ports, and after updating this operating system, the program stops working, most likely, there have been some changes in the kernel of the operating system. And if we also take into account that the program works fine on Linux for X86 / 64 platforms, we can conclude that the Linux kernel for Raspberry does not work with USB ports in a standard way.

6by9 commented 5 years ago

It's difficult to debug without the hardware.

The suspicion would be that it is something that has changed between the 4.9 and 4.14 kernels. It'd be interesting to know if it worked against something like Ubuntu 18.10 or 19.04 on an x86 PC which will be using a similar kernel (actually 4.18 and 5.0 respectively IIRC)

You can use rpi-update to download particular releases using hashes from https://github.com/Hexxeh/rpi-firmware/commits/master sudo rpi-update 5c80565 should get the last 4.9 release. PLEASE DO NOT DO THIS ON A CRITICAL SYSTEM. sudo apt install --reinstall raspberrypi-kernel raspberrypi-bootloader should get you back to apt's version of the kernel and firmware.

Zaitchikk commented 5 years ago

It's difficult to debug without the hardware.

You can check on any other UPS that is supported by the driver "usbhid-ups".

It'd be interesting to know if it worked against something like Ubuntu 18.10 or 19.04 on an x86 PC

I tested this UPS on the "Raspberry Pi Desktop". The system worked under "VMware® Workstation 15 Player 15.1.0 build-13591040" Everything works. Here is the version of that system.

$ uname -a
Linux raspberry 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux
Zaitchikk commented 5 years ago

It'd be interesting to know if it worked against something like Ubuntu 18.10 or 19.04 on an x86 PC

Now I tested on Debian 10 under the same Workstation 15 Player. Everything works great.

$ uname -a
Linux v 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5 (2019-06-19) x86_64 GNU/Linux
clepple commented 5 years ago

Hi, one of the NUT developers here. @Zaitchikk can confirm, but the base version of NUT in the two versions of Raspbian and Debian 10 is the same (2.7.4). As I understand it, the other differences between the working version and the non-working version of Raspbian are the kernel version, and potentially libusb-0.1, which NUT 2.7.4 uses to talk to this particular UPS. What about the versions of libusb? (dpkg -l libusb-0.1\*) If not libusb, it is likely a kernel regression.

pelwell commented 5 years ago

Hi, @clepple - good to have you here. From the usbhid-ups debug output above it seems like the tool is failing quite early. Could you (or anybody) run it under strace on working and non-working systems so we can see the system calls that lead to the error?

# On the old, working system
$ sudo strace -o strace_good.txt /lib/nut/usbhid-ups -DD -a spider
# And on the new, broken system
$ sudo strace -o strace_bad.txt /lib/nut/usbhid-ups -DD -a spider

Feel free to change any of the usbhid-ups parameters as you see fit, as long as they are the same in both cases.

Zaitchikk commented 5 years ago

What about the versions of libusb?

libusb-0.1-4:a 2:0.1.12-30 armhf userspace USB programming library

Zaitchikk commented 5 years ago

And on the new, broken system $ sudo strace -o strace_bad.txt /lib/nut/usbhid-ups -DD -a spider

strace_bad.txt

I cannot quickly send debug output for the working system (Raspbian (2018-03-13)), because I will have to install it. But I will do it now, and will send a little later.

pelwell commented 5 years ago

Thanks. There are a lot of EACCESS and EPERM errors in the trace - can you just confirm that you ran it with sudo?

Zaitchikk commented 5 years ago

can you just confirm that you ran it with sudo?

Yes

By the way, I can quickly send debug output to the working desktop Debian 10 (under VMware® Workstation 15 Player). I do not know if this is necessary.

pelwell commented 5 years ago

Yes please - the two systems ought to look about the same at this level.

Zaitchikk commented 5 years ago

Here for the desktop Debian10 strace_good.txt

pelwell commented 5 years ago

Thanks again - the output is very different, but I can see that the syscalls that return EPIPE on the broken system:

write(2, "Checking device (0D9F/0004) (001"..., 38) = 38
open("/dev/bus/usb/001/004", O_RDWR)    = 4
ioctl(4, USBDEVFS_CONTROL, 0x7e9807c4)  = -1 EPIPE (Broken pipe)
ioctl(4, USBDEVFS_CONTROL, 0x7e9807c4)  = -1 EPIPE (Broken pipe)
ioctl(4, USBDEVFS_CONTROL, 0x7e9807c4)  = -1 EPIPE (Broken pipe)

are succeeding on the Debian 10 system:

write(2, "Checking device (0D9F/0004) (002"..., 38) = 38
openat(AT_FDCWD, "/dev/bus/usb/002/004", O_RDWR) = 4
ioctl(4, USBDEVFS_CONTROL, 0x7ffc26aaaa50) = 6
ioctl(4, USBDEVFS_CONTROL, 0x7ffc26aaaa50) = 34
ioctl(4, USBDEVFS_CONTROL, 0x7ffc26aaaa50) = 6

Did you try downgrading to a 4.9 kernel as @6by9 suggested:

You can use rpi-update to download particular releases using hashes from https://github.com/Hexxeh/rpi-firmware/commits/master sudo rpi-update 5c80565 should get the last 4.9 release. PLEASE DO NOT DO THIS ON A CRITICAL SYSTEM. sudo apt install --reinstall raspberrypi-kernel raspberrypi-bootloader should get you back to apt's version of the kernel and firmware.

pelwell commented 5 years ago

A safer thing to try would be to use the upstream dwc2 driver instead of the heavily optimised dwc_otg driver that is the default on Raspberry Pi kernels. Simply add dtoverlay=dwc2 to config.txt and reboot. USB performance will be reduced, so this isn't a long-term fix, but it might help to pinpoint the cause.

Zaitchikk commented 5 years ago

Simply add dtoverlay=dwc2 to config.txt and reboot.

Yes it works !! I suspected there was a way to get 3B + to work as before. I think it should somehow better describe this method in the documentation. The problem could be solved a couple of months ago.

I bought Raspberry in order to get some SNMP card functionality for the UPS. USB speed is not important to me at all.

And by the way ... if it's still interesting. I promised. Here is the debug output for a working system (Raspbian (2018-03-13)). strace_good.txt

pelwell commented 5 years ago

That's an interesting result, as it narrows it down to a driver regression.

If I could ask for one more thing, could you do the same strace capture using the dwc2 driver? strace_dwc2.txt would be a good name for the output file.

Zaitchikk commented 5 years ago

OK! Here it is strace_dwc2.txt

pelwell commented 5 years ago

Thank you very much - that's a big help. It shows that dwc2 today behaves the same as the dwc_otg used to, which should guide our USB expert when he looks into the problem.

P33M commented 5 years ago

Can you test rpi-update firmwares in sequence to find the first failing one?

Use hashes from this repository: https://github.com/Hexxeh/rpi-firmware/commits/master

And do sudo rpi-update <hash> to install a specific version.

Zaitchikk commented 5 years ago

Can you test rpi-update firmwares in sequence to find the first failing one?

I checked only official releases downloaded from http://downloads.raspberrypi.org/raspbian/images/ Raspbian (2018-03-13) worked. uname -a Linux raspberrypi 4.9.80-v7+ #1098 SMP Fri Mar 9 19:11:42 GMT 2018 armv7l GNU/Linux The next version - Raspbian (2018-04-18) no longer worked.

I tried your way. And this version is no longer working uname -a Linux raspberrypi 4.14.27-v7+ #1100 SMP Fri Mar 16 13:51:48 GMT 2018 armv7l GNU/Linux

It occurred to me that I jumped over the version. I set the previous one. And it doesn’t work either. uname -a Linux raspberrypi 4.14.26-v7+ #1099 SMP Wed Mar 14 14:59:28 GMT 2018 armv7l GNU/Linux

Zaitchikk commented 5 years ago

Simply add dtoverlay=dwc2 to config.txt and reboot.

Not for long I was happy. It turned out that with this setting, the NUT driver, although it starts up, falls off periodically. After that, the debug output is as follows: strace_bad_dwc2.txt It is treated only by rebooting Raspberry.

P33M commented 5 years ago

That's irritating. The breakage corresponds to a kernel major version update and not anything to do with a dwc_otg driver change.

Using the dwc_otg driver, can you try capturing the USB "traffic" in both cases using with usbmon/wireshark, then upload the traces somewhere? https://discussions.flightaware.com/t/monitoring-usb-utilization-on-raspberry-pi/16421

Instead of filtering on address/endpoint, just dump the whole output from usbmon1 until after the transmission breaks (or starts normally on 4.9). Note: don't enter any passwords via keyboard when wireshark is capturing as these will appear in the packets. You may also want to disconnect the network cable while capture is ongoing to reduce the amount of spam created.

Zaitchikk commented 5 years ago

https://discussions.flightaware.com/t/monitoring-usb-utilization-on-raspberry-pi/16421

I cannot repeat this instruction. It seems to me this line of script is not working.

$ sudo mount -t debugfs none_debugs /sys/kernel/debug
mount: none_debugs is already mounted or /sys/kernel/debug busy

At least I don't see any of the instructions in the wireshark window

P33M commented 5 years ago

It appears that debugfs is already mounted so you can skip that step.

You may need to run wireshark as root (from a terminal on the desktop) in order to see the usbmon interfaces.

Zaitchikk commented 5 years ago

to run wireshark as root

Yes, it did.

Now explain in more detail what conditions should be. You want me to start debugging in the current configuration (Raspbian 2019-07-10 with the native dwc_otg driver), at the time the NUT driver starts. Did I understand correctly?

P33M commented 5 years ago

Yes, capture with the default driver configuration. Ideally, start the capture with the device unplugged, then plug the device in, then start NUT. You can stop the capture after the communication breaks in the newer version of kernel (4.14.26) and after communication starts successfully in the older version of kernel (4.9.80).

Zaitchikk commented 5 years ago

Here is the first attempt. I rebooted Raspberry with the UPS turned off, then did as you wrote. The file contains the result. If you got what you need, let me know. I will repeat the process on 4.14.26 4.9.80_successful_start.pcapng.zip

Zaitchikk commented 5 years ago

I tried to install an older system, designated as "firmware: Rebuild from correct tree" dated March 10, 2018. Its version:

$ uname -a
Linux raspberrypi 4.14.24-v7 + # 1097 SMP Mon Mar 5 16:42:05 GMT 2018 armv7l GNU / Linux

And the NUT driver did not start either. Could there be a problem in "rebuild ... ing"? Here is the wireshark debug output: 4.14.24_Failed_start.pcapng.zip

Zaitchikk commented 5 years ago

As I said, NUT start with the dwc2 driver does not work stably.

I was have a little dance with a tambourine around the fire, and I found one new and very strange circumstance. Perhaps it will prompt developers to solve this annoying problem.

I was able to run NUT with the default driver (dwc_otg) with my Powercom SPD-850U UPS. The sequence is as follows:

  1. Run Raspberry with a UPS connected via USB. It is necessary to start exactly “with a UPS connected via USB”, because if we connect the UPS after starting Raspberry, the magic will not work. After this procedure, all checks show that the NUT driver did not start, and cannot be started manually.
  2. Take out the usb connector of the UPS and connect it to another Raspberry USB port. It is necessary to connect it to another USB port, otherwise the magic will not work. If you want to connect it to the same USB port, you must wait more than 10 seconds, (But this does not always work.)
  3. Run the NUT driver start command.     sudo upsdrvctl start After that NUT works, but I'm not sure which is stable.

The following sequence is also possible:

  1. Start Raspberry with the UPS disconnected on USB.
  2. Connect the UPS via USB. We can see that the NUT driver did not start and does not start manually.
  3. Reconnect the UPS to the OTHER USB port. The NUT driver still cannot be started manually.
  4. Execute commands
          sudo / home / pi / hub-ctrl -h 1 -P 2 -p 0
          sudo / home / pi / hub-ctrl -h 1 -P 2 -p 1

    Source here https://github.com/codazoda/hub-ctrl.c https://github.com/codazoda/hub-ctrl.c/wiki/Raspberry-Pi

  5. After that, the NUT driver is started manually by the command     sudo upsdrvctl start After that NUT works, but I'm not sure which is stable.

After these procedures, the NUT can be reconnected to any port. The driver picks it up automatically ... BUT only until the next restart of Ruspberry.

Kernel version

$ uname -a
Linux raspberrypi 4.19.66-v7+ #1253 SMP Thu Aug 15 11:49:46 BST 2019 armv7l GNU/Linux
P33M commented 4 years ago

Whoops, forgot about this one - what with Pi4 release and all.

The traces you've provided are mostly identical up until the point the host requests a string descriptor for index 0x0. In the working case, a 6-byte string descriptor is returned (two LANGIDs), in the broken case you get -EPIPE (protocol error). All subsequent transfers are broken in the same manner.

Can you try with latest rpi-update firmware? It bumps kernel to 4.19 and also has a few dwc_otg fixes for small transfers such as these.

JamesH65 commented 4 years ago

@Zaitchikk Did you have chance to try with the latest firmware? I suspect that apt update apt full-upgrade would be sufficient, rather than using rpi-update.

Zaitchikk commented 4 years ago

I tried "apt full-upgrade" now. No effect. Only those magic dances with a tambourine, about which I wrote above, allow me to launch NUT.

hugazo commented 3 years ago

Sorry the late bump, but i'm having exactly this issue and since i have a raspberry pi 4, i don't know if the kernel reversal solution could work for me.