raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

Interrupt collision between smsc95xx and USB storage drivers under heavy load #9

Closed benosteen closed 11 years ago

benosteen commented 12 years ago

Steps to reproduce:

1) Have lots of files on a USB drive, plugged in and mounted.
2) Begin a download of a large file (100 MB+ is suggested) to that USB drive.
3) During the download, try to access large numbers of files (suggestions to follow).

This will at some indeterminate point freeze the system with kernel panics from the USB storage driver - "... not syncing: Fatal exception in interrupt" and kernel errors from the ethernet driver: "kevent may have dropped the interrupt."

Suggested means to replicate step 3)

If rootfs is on USB, installing a group of packages with apt-get install, running apt-cache search and so on are good ways to uncover this collision. Otherwise, searching or grepping through a reasonable number of files on the USB drive is enough, for example: find . | xargs grep -i "foo"
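A minimal sketch of step 3, assuming the drive is mounted at a path you pass in. The function name hammer_reads and the /mnt/usb mount point are made up for illustration and are not part of the original report. It wraps the find/grep pipeline so it can be looped while the download runs:

```shell
# Hypothetical helper: read every file under a directory once by grepping it.
# $1 = directory on the USB drive, $2 = pattern (defaults to "foo").
hammer_reads() {
    dir=$1
    pattern=${2:-foo}
    # -print0 / -0 keep filenames with spaces intact; -l prints only the
    # names of matching files. grep exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    find "$dir" -type f -print0 | xargs -0 grep -il -- "$pattern" 2>/dev/null || true
}

# Usage (assumed mount point), run while the large download is in progress:
#   while true; do hammer_reads /mnt/usb foo >/dev/null; done
```

The match results do not matter here; the point is only to force a lot of reads from the USB drive while the network is busy.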

It is hard to capture this error, as the kern.log doesn't sync the errors to disc, and the errors flash by too fast on tty to see them with any clarity.

Recreated with latest kernel + UAS built in and new modules and with kernel modules from 13/04 - with rootfs on USB and with the stock rootfs on SD. Having the rootfs on SD makes it more difficult to simulate the type of storage demand required to replicate the bug however.

leewillis77 commented 12 years ago

@mozzwald @popcornmix Can you post the patch you're using?

mozzwald commented 12 years ago

@leewillis77 The patch is at https://github.com/raspberrypi/linux/commit/30b708f9c69a161051e9ddcab7772a35ca8b7b23

abishur commented 12 years ago

I've also been using the patch and I haven't had any more panics. I did have a read/write error when using LAN Speed test but it didn't break anything. I haven't experienced this problem while doing a normal copy/paste or when watching video off the hard drive over the network

iciclethief commented 12 years ago

Using the patched kernel from arch linux I've been running heavy concurrent network and USB access for over 24 hours with no workarounds in place, and I'm pleased to say I've not experienced any further kernel panics.

However during this load test I noticed the following appearing regularly in the system logs:

DEBUG:handle_hc_chhltd_intr_dma:: XactErr with NYET/NAK/ACK

Like @asb stated earlier, there may be multiple issues at play.

mozzwald commented 12 years ago

Still having USB issues. Current setup is:

When trying to clone raspi kernel from github to SD card and writing zeros to file on the hard drive I get "kswapd0: page allocation failure" errors repeatedly and system is unusable (http://pastebin.com/EZkRz27v). Setting vm.min_free_kbytes = 12288 in /etc/sysctl.conf prevents total crash of the system but oom-killer kicks in and kills git (http://pastebin.com/n5RssAtV).
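For anyone wanting to try the same workaround, here is a rough sketch of persisting the setting. The helper name persist_sysctl is hypothetical; sysctl -p and sysctl KEY are the standard commands for reloading and checking the value. It assumes the usual "key = value" layout of /etc/sysctl.conf:

```shell
# Hypothetical helper: set KEY = VALUE in a sysctl-style config file,
# replacing an existing assignment of KEY or appending one if none exists.
# $1 = config file, $2 = key, $3 = value.
persist_sysctl() {
    conf=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Drop any existing uncommented assignment of the key. The dots in the
    # key are treated as regex wildcards, which is good enough for a sketch.
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$conf" > "$tmp" 2>/dev/null || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Usage (as root):
#   persist_sysctl /etc/sysctl.conf vm.min_free_kbytes 12288
#   sysctl -p                    # reload /etc/sysctl.conf
#   sysctl vm.min_free_kbytes    # check the live value
```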

narensankar commented 12 years ago

This is not a USB issue but an Ethernet issue. The network stack is the one that is unable to keep up with the memory allocations. What does "free" report before and while you are cloning? What firmware split are you using - i.e. how much memory does Linux have?
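One way to capture that, as a sketch: sample MemFree from /proc/meminfo once a second while the clone runs. The helper name meminfo_kb is made up for illustration; /proc/meminfo and its "Field: value kB" layout are standard on Linux.

```shell
# Hypothetical helper: print one field (in kB) from a meminfo-format file.
# $1 = field name (e.g. MemFree), $2 = file (defaults to /proc/meminfo).
meminfo_kb() {
    awk -v field="$1:" '$1 == field { print $2 }' "${2:-/proc/meminfo}"
}

# Usage: log free memory once a second while the clone is running.
#   while true; do
#       echo "$(date +%T) MemFree=$(meminfo_kb MemFree) kB"
#       sleep 1
#   done
```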

mozzwald commented 12 years ago

I'm using the 224 split. Before and during results of 'free' http://pastebin.com/3jy8MTNt

larsth commented 12 years ago

@mozzwald playing with ethernet settings could provide a temporary solution to the problem. Try, for example, using 10 Mbit/s only, and/or half-duplex.

FYI, I'm thinking about using the tools mentioned here http://www.cyberciti.biz/faq/linux-change-the-speed-and-duplex-settings-of-an-ethernet-card/

(I cannot test this, I am in the Pi queue @ RS Components)
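As an untested sketch of what that could look like with ethtool: the interface name eth0 and the helper name limit_link are assumptions, and whether the smsc95xx driver honours the forced settings on the Pi has not been checked here.

```shell
# Hypothetical helper: force an interface to 10 Mbit/s half duplex.
# $1 = interface name (eth0 assumed). Set ETHTOOL=echo for a dry run.
limit_link() {
    iface=${1:-eth0}
    # Autonegotiation is switched off so the forced speed and duplex stick.
    "${ETHTOOL:-ethtool}" -s "$iface" speed 10 duplex half autoneg off
}

# Usage (as root):
#   limit_link eth0
#   ethtool eth0     # show the resulting link settings
```

The setting is not persistent; it is lost on reboot or when the link is reconfigured.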

asb commented 12 years ago

@mozzwald it looked to me like vm.min_free_kbytes worked around the Ethernet issue (see the launchpad discussion I linked to earlier). Git can have quite high memory requirements and your second paste at first glance looks like a standard OOM. Have you tried enabling swap?

mozzwald commented 12 years ago

None of the temporary fixes seem to completely fix all the problems I have. I'm no expert, but I think there might be an issue with the USB host drivers. I compiled the kernel without the smsc ethernet driver to remove it from the equation. I connected a powered USB docking station with ethernet, serial, lpt, and USB hub. Also connected is an external self-powered hard drive. Connecting via ssh I try to wget a file from my local desktop and it crashes. Serial output is here: http://pastebin.com/L6VRRxM9

leewillis77 commented 12 years ago

I'm now running a patched kernel, and have now got uptime of 96 days - compared to a previous best of about 23 ;)

So - it definitely makes things more stable for me - thanks guys!

Note: No other changes other than running the patched kernel, ie no tweaks to vm.min_free_kbytes. I do have about 600M of swap space, although only 412k has been used.

pepedog commented 12 years ago

I think you must mean 96 hours. Everyone is doing the vm min thing, even openelec: https://github.com/OpenELEC/OpenELEC.tv/commit/bd4c37762a9a382d800396a25ac9b152f5a443fa The setting has improved things since October for me. I agree with you that it's much more stable.

leewillis77 commented 12 years ago

Erm yes - 96 hours ;)

It seems my system does have vm.min_free_kbytes = 8192 after all (Set in /etc/sysctl.conf) - I guess that must have been in the debian image itself? - pretty sure I didn't add it :)

pepedog commented 12 years ago

Yes, default in Debian, arch, openelec, and probably fedora too. It's probably my fault it's that way and not sure if it's still needed, but it did its job at the time

guisacouto commented 12 years ago

Did you manage to solve this?

I have "vm.min_free_kbytes = 8192" in my sysctl.conf, and I still get a kernel panic. Just have to start a torrent that gets downloaded into my external hd and after some minutes.. BANG!

screenshot: http://img827.imageshack.us/img827/3489/20120521231038.jpg

I'm running arch linux, so everything should be up to date

asb commented 12 years ago

@guisacouto did you try setting a larger value for min_free_kbytes? (e.g. 16384). When I was able to provoke this issue, I found 8192 insufficient.

guisacouto commented 12 years ago

No I didn't! Will try it now and let you know in a few minutes! Thank you so much for your quick response :)

guisacouto commented 12 years ago

@asb

Tried with that value and didn't work:\ kernel panic again..

edit:

I'm going to try to increase it a bit more to something like 25MB, and see how it goes.
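For reference, vm.min_free_kbytes is given in kilobytes, so a value quoted in MB has to be multiplied by 1024. A tiny sketch of the arithmetic (the helper name mb_to_kb is made up):

```shell
# vm.min_free_kbytes is expressed in kilobytes (1 MB = 1024 kB here).
mb_to_kb() {
    echo $(( $1 * 1024 ))
}

mb_to_kb 16   # prints 16384, the value suggested earlier in the thread
mb_to_kb 25   # prints 25600
```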

pepedog commented 12 years ago

I will ask for the two arch packages to be updated, 8 days old makes them really ancient.

guisacouto commented 12 years ago

What are the packages that should be updated? The current raspberrypi-firmware is a build from 19/05/2012.

(I've tried with 21MB and no success.. testing now with 25)

guisacouto commented 12 years ago

Ok this is a bit annoying.. not even with 25MB I get rid of the kernel panic..

@pepedog Are there significantly outdated packages in ArchARM (with bugfixes for this issue) ?

asb commented 12 years ago

The other potential fix is smsc95xx.turbo_mode=N in /boot/cmdline.txt. If that doesn't work, then try latest firmware. If that doesn't work, then wait for a fix.
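A sketch of adding that parameter safely, since /boot/cmdline.txt has to stay a single line. The helper name add_cmdline_param is hypothetical, and it assumes the parameter contains no characters special to sed (such as | or &):

```shell
# Hypothetical helper: add a parameter to a single-line kernel command line
# file unless it is already present. $1 = file, $2 = parameter.
add_cmdline_param() {
    file=$1 param=$2
    # -F treats the parameter as a literal string, -w matches whole words.
    if grep -qFw -- "$param" "$file"; then
        return 0
    fi
    # Append to the end of the first line so the file stays on one line.
    tmp=$(mktemp) || return 1
    sed "1s|\$| $param|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (as root, then reboot):
#   add_cmdline_param /boot/cmdline.txt smsc95xx.turbo_mode=N
```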

guisacouto commented 12 years ago

will check on that;) tks!

edit:

Seems like I'll have to wait for a fix. Neither of the two solutions worked. Tks for your patience :)

leewillis77 commented 12 years ago

Guisacouto - the only thing I needed to do to fix this was to install the patched kernel. As far as I know this isn't a "package update", but a manual re-compile at this stage. I found the guide here quite helpful:

http://elinux.org/RPi_Kernel_Compilation

As soon as I was running the new kernel, no more faults, and my pi has been up for 11 days and counting, including taking a number of large backups, and duplicating them over the network, and streaming audio flawlessly to my squeezebox. I can't tell from your earlier comments if you've actually updated your kernel?

pepedog commented 12 years ago

Arch has the latest kernel and firmware as updates now. Run pacman -Syu to update.

guisacouto commented 12 years ago

I'm not at home right now, and my external hd is not connected so even if I update now through ssh I'll not be able to test this.

Will report later !

edit:

It's working now. No more kernel panics :)

guisacouto commented 12 years ago

Torrents went just fine. Tried to use sshfs and the kernel panic just arrived.. again.

I guess the bug is still there

pepedog commented 12 years ago

I think I will supply you with a regular rootfs tar, as opposed to card image. You are comfortable with partitioning/formatting? I don't see errors with rootfs on card or hard drive.

guisacouto commented 12 years ago

Comfortable enough (what I don't know by heart I can google). But I don't see how this could be any different if my packages are updated

guisacouto commented 12 years ago

I got a new power supply from ebay with 2A, and tested mounting the RPi with sshfs in my desktop and unrar a file. This used to cause the kernel panic very fast, but now it didn't. However when I uploaded a torrent through transmission web interface I had another kernel panic with the "not syncing" message again. Not sure why, since uploading a torrent should be very fast and not a heavy task.

Rebooted again and it uploaded fine, and is now downloading fine. My min_free_kbytes is now at 30720. edit: just happened again while unraring and downloading at the same time

popcornmix commented 12 years ago

@guisacouto Can you confirm you are running with latest kernel? "uname -a" should show when it was built. Use hexxeh's firmware updater tool. (I'm not sure the Arch update always produces a recent kernel). If still a problem, can you run with kernel_debug.img and post the backtrace of the panic.
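If the full "uname -a" line is hard to read, the two relevant parts can be printed separately. A small sketch (the helper name kernel_build is made up; uname -r and uname -v are standard options):

```shell
# uname -r prints the kernel release; uname -v prints the kernel version
# string, which on Linux holds the build number and build date.
kernel_build() {
    printf 'release: %s\nbuilt:   %s\n' "$(uname -r)" "$(uname -v)"
}

kernel_build
```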

guisacouto commented 12 years ago

Currently I'm running 3.1.9-15 (I noticed now that today there is a new one, -16). Linux berry 3.1.9-15-ARCH+ #8 PREEMPT Tue May 22 01:15:53 UTC 2012 armv6l GNU/Linux

Do I need to use hexxeh's firmware update tool while using arch? I thought that tool would be useful for debian since there are fewer updates. Is it needed in arch?

Another question: for every raspberrypi-firmware update, does the kernel need to be rebuilt?

I will update now to the 3.1.9-16 that came today, and test if the problem still exists. I think it does since I didn't see any updates on solving this issue.

best regards

popcornmix commented 12 years ago

Interesting. That is not a prebuilt kernel of ours from github (we only enabled PREEMPT a couple of days ago). So we know when it was built, but not from what source, or with what .config options.

So yes, running hexxeh's updater tool to replace it with a known up-to-date kernel would be worthwhile.

guisacouto commented 12 years ago

Ok tks! Will test this one with the update and if it panics (I'm pretty sure it will), I'll run hexxeh's updater tool.

Will report in any situation; it might help someone else.

edit: already got the kernel panic. Will try hexxeh's update tool now

pepedog commented 12 years ago

First, the updater tool should be ok on arch. Arch is built with this config: https://github.com/archlinuxarm/PKGBUILDs/blob/master/core/linux-raspberrypi/config

I asked for the kernel and firmware pkgs to be rebuilt. It hasn't happened with firmware, it's 9 days old. I think he is busy; the whisper is he's setting up a build farm.

Lastly, why is this issue in firmware? Surely it belongs in linux?

guisacouto commented 12 years ago

I think it's working now!:D

the firmware+kernel update from the git repository (with rpi-update) worked!

tks!

pepedog commented 12 years ago

I should also point out with arch, kernel and modules are not bundled in with firmware. The kernel is a separate package, this site is the source but we do our own config based on the default config, trips up occasionally. Preempt was on in our package because it was recently built. The firmware pkg is going to be rebuilt tonight. Mostly just advancing release version will rebuild the packages, if a config change is needed I just hope Dom lets me know.

popcornmix commented 12 years ago

@pepedog erm, the config has changed...

guisacouto commented 12 years ago

I'm starting to think I'm doomed. I'm getting kernel panics again while downloading torrents. I was using ntfs on the external hard drive and changed it to ext4 to see if it helped. It didn't. The cpu gets to ~100% while downloading since it drains all the download bandwidth it can handle (my internet connection can download ~6MB/s; the Pi goes near 2MB/s).

If it helps, the setup is: wifi rt5370; external hd (wd 320GB); arch linux arm with kernel and firmware updated with hexxeh's rpi-update tool

I'm going to test now with the debug kernel img to see the stacktrace.

best regards

rewolff commented 12 years ago

I have a GPS receiver on a PL2303 USB serial converter. This crashes more than once a day (I leave in the morning, when I come back it's crashed). So about 300 bytes of USB traffic per second manages to crash things. I don't know if I get a kernel oops. I don't have a screen there. I tried running "netconsole" but that didn't work out: the eth driver doesn't support polling. Next option is the uart for the kernel oops output... :-)

rewolff commented 12 years ago

P.S. Not sure if it's the same issue, of course. But I thought I'd mention it because it might provide a hint as to what's wrong.

guisacouto commented 12 years ago

I don't know why but I can't run kernel_debug.img... A square with "rainbow" colors appears. I guess it's the gpu not being able to boot the kernel..

popcornmix commented 12 years ago

A bad kernel_debug.img was checked in a couple of days ago, and fixed today. Can you update?

pepedog commented 12 years ago

I just reviewed the config here, arch will have problems with compiled kernel, devtmpfs is missing. Deb will be hit too with latest udev

guisacouto commented 12 years ago

I did update 2 hours ago. Does hexxeh's tool use raspberrypi@github as source? If it does, mine should be updated. Will do it again anyway

asb commented 12 years ago

No, unfortunately it updates from his own repository so it can lag behind.

popcornmix commented 12 years ago

Hexxeh's repo is up to date now.

guisacouto commented 12 years ago

I already updated, however it doesn't boot properly I think.

Here is an image of where it stops: http://desmond.imageshack.us/Himg848/scaled.php?server=848&filename=20120531180440.jpg&res=landing

This is all really odd.. I guess a kernel panic could be ok if I were out of memory since there is no swap, but in this case while I'm downloading before it crashes I'm only using ~30MB or something, only the cpu gets crazy working at ~100% trying to use as much network bandwidth as possible... this should only make things slower, but without crashing

edit: I'm not connected with ethernet, only wireless, but I think that in kernel_debug it doesn't load the driver module

pepedog commented 12 years ago

guisacouto, can you see the line "cannot stat"? That is a symptom of no devtmpfs. New kernel and firmware for arch tomorrow. There is a way to make pacman install a pkg from an x86 arch install system.

guisacouto commented 12 years ago

@pepedog

oh ok, will see how kernel_debug goes tomorrow. I hope it gives some clues