raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

kernel panic - Bug in USB driver? #491

Closed richard-tx closed 10 years ago

richard-tx commented 10 years ago

Here is my setup.

- RPi: UK-made, recently purchased
- WiFi: Realtek USB adapter
- USB hub
- mpd and mpc installed
- I2C bus enabled
- Raspbian 12-20-2013 release

What happens is that after playing an internet radio station for an extended period of time, the kernel eventually panics. This happens on two different RPis with two different SD cards, so it is fairly easy to reproduce. It just takes about 15-20 minutes of streaming audio.

The message is as follows (screen photos: dscn1031, dscn1032, dscn1033, dscn1034).

richard-tx commented 10 years ago

lsusb

Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 0bda:8174 Realtek Semiconductor Corp. RTL8192SU 802.11n WLAN Adapter
Bus 001 Device 005: ID 05e3:0608 Genesys Logic, Inc. USB-2.0 4-Port HUB
Bus 001 Device 006: ID 04d9:1203 Holtek Semiconductor, Inc. Keyboard
Bus 001 Device 007: ID 045e:0039 Microsoft Corp. IntelliMouse Optical

richard-tx commented 10 years ago

One more item. The Wifi adapter is plugged directly into the Rpi's USB port.

richard-tx commented 10 years ago

After some additional testing, it does not matter if the wifi adapter is plugged into the Rpi or the USB hub. The panic still occurs.

richard-tx commented 10 years ago

~# uname -a
Linux lee-pi-wifi 3.10.24+ #614 PREEMPT Thu Dec 19 20:38:42 GMT 2013 armv6l GNU/Linux

P33M commented 10 years ago

If you remove all other devices (keyboard, mouse), does it still crash?

richard-tx commented 10 years ago

I have been running with just the wifi adapter installed for a while and so far it has not crashed. I will let it go for a few more hours.

What are you thinking? The mouse is relatively new and I tried two different keyboards. Do you want me to get out a brand new mouse and try that?

richard-tx commented 10 years ago

There is one thing that I noticed when comparing the 9-25-13 and earlier releases with the 12-20-2013 release. On the earlier releases of Wheezy, the WiFi activity LED flashes at a near-constant rate when playing streaming audio. With the 12-20-2013 release, there are significant periods of time where the activity LED does not flash at all.

Here are a couple of audio stream URLs that I use:
http://wttw.ic.llnwd.net/stream/wttw_wfmt_livebroadcast
http://stream.srg-ssr.ch/m/rsc_de/aacp_96

P33M commented 10 years ago

The reason that there is a crash in dwc_otg_hcd_allocate_port is that a split transaction got broken somehow. The subsequent port access found it in an allocated state when it expected it to have been released.

The only devices that could cause this to fail would be the mouse and/or keyboard. It's possible that the higher interrupt loading caused by streaming on wifi/to the BCM2835 audio device triggers the bug.

richard-tx commented 10 years ago

That sounds good, but when it crashes, the mouse and keyboard are quiescent. In other words, it was just sitting there playing music.

P33M commented 10 years ago

Your keyboard and mouse aren't quiescent: they are being polled continuously for input. The bug is most likely triggered by the combination of one of these devices with whatever is odd about the wifi activity.

You could try running with dwc_otg.fiq_split_enable=0 in /boot/cmdline.txt, but this may cause missed keypresses. It would be an interesting data point to see how many keypresses are lost when music/wifi is active.
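The kernel command line in /boot/cmdline.txt must stay on a single line, so the parameter has to be appended to the existing line rather than added as a new one. A minimal sketch, assuming GNU sed as shipped with Raspbian; the function name set_cmdline_param is hypothetical:

```shell
# Hypothetical helper: set KEY=VALUE on the single-line kernel command line.
# Any existing KEY=... token is removed first, so repeated calls do not pile up
# duplicates. Assumes GNU sed (for -i).
set_cmdline_param() {
    file=$1    # e.g. /boot/cmdline.txt
    key=$2     # e.g. dwc_otg.fiq_split_enable
    value=$3   # e.g. 0
    # Escape dots so they match literally in the sed regex.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    # Drop any existing "key=..." token (and the whitespace before it) on line 1.
    sed -i "1 s/[[:space:]]*${key_re}=[^[:space:]]*//g" "$file"
    # Append the new setting to the end of that same line.
    sed -i "1 s/\$/ ${key}=${value}/" "$file"
}
```

For example, as root: `set_cmdline_param /boot/cmdline.txt dwc_otg.fiq_split_enable 0`, then reboot. Running it again with 1 replaces the setting instead of adding a second copy.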

richard-tx commented 10 years ago

I am going to take a video showing the relative WIFI traffic on the 9-25 release compared to the 12-20 release. The difference is amazing.

I believe that there is a utility that will measure maximum bandwidth of a wifi connection. That should be a reasonable stress test for the WIFI and USB port. I will let you know what I find.

P33M commented 10 years ago

It may well be that the updated kernel version has a wifi driver version that is newer (and broke something).

You can use modinfo to get the version string from the module in each case.
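A sketch of that comparison, to be run on each image. It assumes the RTL8192SU adapter is handled by the r8712u staging driver (lsmod on the Pi would confirm this), and the helper name module_version is hypothetical. Not every module declares a version field, so it falls back to the vermagic string, which at least identifies the kernel build the module was compiled for:

```shell
# Hypothetical helper: print "<module> <version>" for a kernel module.
# Falls back to vermagic when the module has no MODULE_VERSION string.
module_version() {
    mod=$1
    ver=$(modinfo -F version "$mod" 2>/dev/null)
    if [ -z "$ver" ]; then
        ver=$(modinfo -F vermagic "$mod" 2>/dev/null)
    fi
    printf '%s %s\n' "$mod" "${ver:-unknown}"
}
```

Running `module_version r8712u` on both the 9-25 and the 12-20 images would show whether the driver changed between them.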

popcornmix commented 10 years ago

@richard-tx iperf is the usual tool for measuring network bandwidth.
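By default the iperf client sends data to the server, so to load the Pi's receive path the way a radio stream does, run `iperf -s` on the Pi and start the client on another machine on the LAN. A sketch of a soak test lasting about as long as the panic took to appear; the helper name wifi_soak and the 20-minute default are assumptions:

```shell
# Hypothetical helper: run on another machine while the Pi runs "iperf -s",
# so data flows into the Pi for roughly as long as the panic took to appear.
wifi_soak() {
    server=$1          # address of the Pi running "iperf -s"
    secs=${2:-1200}    # test length in seconds (default: 20 minutes)
    # -c: client mode, -t: duration in seconds, -i: report every 60 seconds
    iperf -c "$server" -t "$secs" -i 60
}
```

For example, `wifi_soak 192.168.1.50`, where the address is a placeholder for the Pi's.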

richard-tx commented 10 years ago

Adding dwc_otg.fiq_split_enable=0 to cmdline.txt effectively disables the keyboard.

richard-tx commented 10 years ago

iperf did not cause the kernel to panic.

richard-tx commented 10 years ago

I just upgraded my wifi router to an 802.11n one. Since then the panics are more frequent. I am going back to the 9-20-13 image of Wheezy.

jt-fuw commented 10 years ago

Possibly I have problems due to the same bug, although with a different device. richard-tx, have you tried an older kernel, e.g. 3.2.0-4-rpi?

The problem I have occurs only when a USB device sends a large amount of data to the host (with 3000 bytes I never noticed any problem), forming many blocks in a relatively short time, without any data being sent from the host to the device in the meantime. Your case can be caused by the same bug if, and only if, the router can send the Pi many packets via USB without waiting for data from the host (e.g. an acknowledgement that a packet was received correctly). In such a case the driver can store received data OUTSIDE THE BUFFER.

------- More detailed information about my problem follows below -------

Kernels tried: 3.2.0-4-rpi, 3.6.11+, 3.10.27+, 3.10.30+.

The problem is with receiving data via USB from a Hantek DSO-1202BV digital scope. The data is sent in blocks of 10214 bytes (a block seems to consist of 160 packets, the last one shorter); the total data length is 307386 bytes and the USB speed is 12 Mb/s. The scope is connected directly to a Raspberry Pi (model B) USB port (I also tried it via a USB hub; the results weren't better). The scope has its own power supply; it doesn't need power from the Pi.
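At 12 Mb/s (USB full speed) the bus is divided into 1 ms frames rather than microframes, and bulk packets are at most 64 bytes. Assuming the scope's bulk endpoint uses that 64-byte maximum (its descriptor is not quoted in the thread), a 10214-byte block works out to 160 packets with a short final one, which fits the observation above. Under the USB rules a short packet ends a bulk transfer, which is what normally keeps each block in its own buffer. A small sketch of the arithmetic; the helper name is hypothetical:

```shell
# Hypothetical helper: split one bulk block into max-packet-size packets and
# print "<packet count> <length of the final packet>".
packet_layout() {
    block=$1     # bytes per block, e.g. 10214
    maxpkt=$2    # endpoint wMaxPacketSize (assumed 64 at full speed)
    full=$((block / maxpkt))
    last=$((block % maxpkt))
    if [ "$last" -eq 0 ]; then
        # Exact multiple: no short final packet, so nothing marks the end of
        # the block unless the device adds a zero-length packet.
        echo "$full 0"
    else
        # The final, short packet is what terminates the transfer.
        echo "$((full + 1)) $last"
    fi
}
```

`packet_layout 10214 64` prints `160 38`: 159 full 64-byte packets followed by a 38-byte short packet.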

On another machine, running Ubuntu 10.04, each block is stored in a separate buffer. On the Pi, running Raspbian, the next block (or part of it) is sometimes appended to the preceding one in its buffer; in some cases even 3 consecutive blocks were stored in one buffer. With 3.2.0-4-rpi the data received is usually correct, except that block boundaries are lost. With the other kernels, the amount of data put in a buffer sometimes exceeds the buffer size, which can crash the system unless the buffer is very large. Even when a larger buffer is used and the system does not crash, the data received is usually incomplete; receiving all data without an error is rare.

This was tested using libusb-1.0 and the USBDEVFS_* kernel ioctls: USBDEVFS_BULK, or USBDEVFS_SUBMITURB + USBDEVFS_REAPURBNDELAY. With libusb-1.0 the results were the worst: a system crash. With the ioctls crashes were very rare, but the data was wrong, except with kernel 3.2.0-4-rpi, where the data was usually correct, although frequently not stored one block per buffer. Instead, a buffer was filled completely and the next buffer got the remainder, e.g.: buffers 1-16 got 10214 bytes each, buffer 17 got 16384 bytes (full size, the maximum for kernel 3.2.0-4-rpi), and buffer 18 got 4044 bytes (note 16384 + 4044 = 2 * 10214, so 2 blocks were spread over 2 buffers).
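The arithmetic at the end of that comment can be checked with a small model: if two 10214-byte blocks arrive as one run of 20428 bytes with no boundary between them, and each read fills a 16384-byte buffer completely, the second read gets the remainder. The helper below (hypothetical name) reproduces only this observed symptom; it does not model what the driver does internally:

```shell
# Hypothetical helper: show how a run of merged data is divided when each
# read fills the caller's buffer completely before moving to the next one.
split_into_buffers() {
    remaining=$1   # bytes that arrived without a block boundary
    bufsize=$2     # size of each user buffer, e.g. 16384
    out=""
    while [ "$remaining" -gt 0 ]; do
        if [ "$remaining" -gt "$bufsize" ]; then
            chunk=$bufsize
        else
            chunk=$remaining
        fi
        out="$out $chunk"
        remaining=$((remaining - chunk))
    done
    # Strip the leading space before printing the list of chunk sizes.
    echo "${out# }"
}
```

`split_into_buffers $((2 * 10214)) 16384` prints `16384 4044`, matching buffers 17 and 18 in the report.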

Perhaps some of the problems are a result of the Pi's speed (it is much slower than the machine with Ubuntu 10.04) or of other hardware problems; but it is certainly wrong that received data is stored outside the assigned buffer and sometimes destroys the OS.

richard-tx commented 10 years ago

I don't know which versions you are referring to. I don't have any problems with the 9-25-13 release or the one before it. I only get a kernel panic when I run the 12-20-13 or the 01-09-2014 release.

cat /boot/issue.txt

Raspberry Pi reference 2013-09-25 (armhf) Generated using spindle, http://asbradbury.org/projects/spindle/, be1a0b3, stage 4-lxde-edu.qed

jt-fuw commented 10 years ago

...$ cat /media/boot/issue.txt
Raspberry Pi reference 2013-06-19 (armhf) Generated using spindle, http://asbradbury.org/projects/spindle/, 8cb754e, stage4-lxde-edu.qed

This command shows the originally installed version, but I have upgraded the kernel twice. The current kernel version is shown by "uname -r" (or in more detail by "uname -rv"). Unfortunately, I cannot try it right now, because I have just disassembled my Pi.

The installation I use is older than yours, and in my case the originally installed kernel had a bug, as did two newer ones. It seems that either there were some good versions and the same bug later returned, or the bug I see is a different one.

How can I install these kernels which run without problem in your case?

jt-fuw commented 10 years ago

It seems '#' as the first character of a line is interpreted as a heading marker; put a space before it.

richard-tx commented 10 years ago

http://downloads.raspberrypi.org/raspbian/images/raspbian-2013-09-27/2013-09-25-wheezy-raspbian.zip is the one that works for me.

P33M commented 10 years ago

Please test using BRANCH=next rpi-update firmware. This should fix the crash, but use of USB audio may still produce occasionally garbled packets.

richard-tx commented 10 years ago

Test is currently running. So far so good.

rich

Richard Andrews A&E Tool, LLC http://ae-tool.com http://artisans.homeunix.com:443 http://MyMachineryForum.com

richard-tx commented 10 years ago

I decided to enable the I2C buses.

Something strange is going on with I2C bus 1. I do have a device at 0x48 but there is no device at 0x3b nor was that address ever reserved in previous releases.

i2cdetect -y 1

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- UU -- -- -- --
40: -- -- -- -- -- -- -- -- 48 -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
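In i2cdetect output, UU means the address was not probed because a kernel driver has already claimed it, so the 0x3b entry points to a driver bound at that address rather than to a device that answered. One way to find that driver is through sysfs, which names I2C clients `<bus>-<four-digit hex address>` (here 1-003b). A minimal sketch; the function name i2c_owner and its optional sysfs-root argument are hypothetical:

```shell
# Hypothetical helper: report which kernel driver holds an I2C address that
# i2cdetect shows as "UU". Prints "<client name> <driver>".
i2c_owner() {
    bus=$1
    addr=$2                              # e.g. 0x3b
    root=${3:-/sys/bus/i2c/devices}      # overridable for testing
    # sysfs names I2C clients "<bus>-<4-digit hex address>", e.g. 1-003b.
    dev=$(printf '%s/%d-%04x' "$root" "$bus" "$addr")
    if [ ! -d "$dev" ]; then
        echo "no client at $dev"
        return 1
    fi
    name=$(cat "$dev/name")
    # The "driver" symlink exists only while a driver is bound to the client.
    if [ -e "$dev/driver" ]; then
        drv=$(basename "$(readlink "$dev/driver")")
    else
        drv="(unbound)"
    fi
    echo "$name $drv"
}
```

On the Pi, `i2c_owner 1 0x3b` would print the client's name and the driver bound to it.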
P33M commented 10 years ago

@richard-tx - Has the driver crashed since updating?

richard-tx commented 10 years ago

nope

P33M commented 10 years ago

Good to hear.