raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

"eth0: kevent 2 may have been dropped" is still here #309

Closed wrobelda closed 10 years ago

wrobelda commented 11 years ago

As per the title, I have unfortunately been suffering from this error for a long time now. No fix makes it vanish:

I postponed reporting this until I had a chance to test another Raspberry Pi board, to make sure this was not a case of malfunctioning hardware. I also tested with a different power source, to no avail.

Also tried two days ago with recent fiq_split branch - same behaviour + extra mmc0 timeouts, but that's a different story.

Tested on:
me@rpi ~ $ uname -a
Linux rpi 3.6.11+ #456 PREEMPT Mon May 20 17:42:15 BST 2013 armv6l GNU/Linux
me@rpi ~ $ /opt/vc/bin/vcgencmd version
May 26 2013 21:47:02
Copyright (c) 2012 Broadcom
version 53261d4ede3ba2b660e4201aca9bd4544565a3ce (clean) (release)

...and also on:
root@rpi:/home/# uname -a
Linux rpi 3.6.11+ #462 PREEMPT Mon Jun 3 22:15:00 BST 2013 armv6l GNU/Linux
root@rpi:/home/# /opt/vc/bin/vcgencmd version
Jun 3 2013 22:37:15
Copyright (c) 2012 Broadcom
version 2bac1bb890aa545e180e3f0766fb67989b590d26 (clean) (release)

I am going to test it with some other USB hub. But I would be surprised if the error were the hub's fault, as the one I own is reported to be probably the most compatible: Plugable 7 Port with external power amp.

wrobelda commented 11 years ago

Something interesting has shown up when I woke up this morning and checked dmesg:

[Wed Jun 5 03:08:08 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 03:08:08 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 03:08:09 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 03:09:10 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 03:10:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds.
[Wed Jun 5 03:10:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:10:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000
[Wed Jun 5 03:10:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:10:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:10:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8)
[Wed Jun 5 03:10:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4)
[Wed Jun 5 03:10:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:10:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:10:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds.
[Wed Jun 5 03:10:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:10:35 2013] usb-storage D c0398690 0 352 2 0x00000000
[Wed Jun 5 03:10:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:10:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:10:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198)
[Wed Jun 5 03:10:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8)
[Wed Jun 5 03:10:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50)
[Wed Jun 5 03:10:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8)
[Wed Jun 5 03:10:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc)
[Wed Jun 5 03:10:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c)
[Wed Jun 5 03:10:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:10:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:11:27 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 03:11:28 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 03:12:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds.
[Wed Jun 5 03:12:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:12:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000
[Wed Jun 5 03:12:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:12:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:12:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8)
[Wed Jun 5 03:12:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4)
[Wed Jun 5 03:12:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:12:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:12:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds.
[Wed Jun 5 03:12:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:12:35 2013] usb-storage D c0398690 0 352 2 0x00000000
[Wed Jun 5 03:12:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:12:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:12:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198)
[Wed Jun 5 03:12:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8)
[Wed Jun 5 03:12:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50)
[Wed Jun 5 03:12:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8)
[Wed Jun 5 03:12:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc)
[Wed Jun 5 03:12:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c)
[Wed Jun 5 03:12:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:12:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:14:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds.
[Wed Jun 5 03:14:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:14:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000
[Wed Jun 5 03:14:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:14:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:14:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8)
[Wed Jun 5 03:14:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4)
[Wed Jun 5 03:14:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:14:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:14:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds.
[Wed Jun 5 03:14:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:14:35 2013] usb-storage D c0398690 0 352 2 0x00000000
[Wed Jun 5 03:14:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:14:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:14:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198)
[Wed Jun 5 03:14:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8)
[Wed Jun 5 03:14:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50)
[Wed Jun 5 03:14:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8)
[Wed Jun 5 03:14:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc)
[Wed Jun 5 03:14:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c)
[Wed Jun 5 03:14:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:14:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:16:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds.
[Wed Jun 5 03:16:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:16:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000
[Wed Jun 5 03:16:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:16:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:16:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8)
[Wed Jun 5 03:16:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4)
[Wed Jun 5 03:16:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:16:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:16:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds.
[Wed Jun 5 03:16:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:16:35 2013] usb-storage D c0398690 0 352 2 0x00000000
[Wed Jun 5 03:16:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:16:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:16:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198)
[Wed Jun 5 03:16:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8)
[Wed Jun 5 03:16:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50)
[Wed Jun 5 03:16:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8)
[Wed Jun 5 03:16:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc)
[Wed Jun 5 03:16:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c)
[Wed Jun 5 03:16:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:16:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:18:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds.
[Wed Jun 5 03:18:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:18:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000
[Wed Jun 5 03:18:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:18:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:18:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8)
[Wed Jun 5 03:18:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4)
[Wed Jun 5 03:18:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:18:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 03:18:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds.
[Wed Jun 5 03:18:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Wed Jun 5 03:18:35 2013] usb-storage D c0398690 0 352 2 0x00000000
[Wed Jun 5 03:18:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0)
[Wed Jun 5 03:18:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198)
[Wed Jun 5 03:18:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198)
[Wed Jun 5 03:18:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8)
[Wed Jun 5 03:18:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50)
[Wed Jun 5 03:18:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8)
[Wed Jun 5 03:18:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc)
[Wed Jun 5 03:18:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c)
[Wed Jun 5 03:18:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94)
[Wed Jun 5 03:18:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[Wed Jun 5 06:25:52 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 06:25:55 2013] ieee80211 phy0: failed to reallocate TX buffer
[Wed Jun 5 06:26:06 2013] ieee80211 phy0: failed to reallocate TX buffer
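A log like this is easier to compare across runs once the hung-task warnings are tallied per task. The following is only a minimal sketch of that idea, assuming the dmesg output has been saved as text; the function name `count_hung_tasks` is made up for illustration and is not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches the kernel's hung-task warning, for example:
# "INFO: task usb-storage:352 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def count_hung_tasks(log_text: str) -> Counter:
    """Count hung-task warnings per task name in saved dmesg text."""
    return Counter(match.group(1) for match in HUNG_TASK.finditer(log_text))


if __name__ == "__main__":
    # Three warning lines copied from the log above.
    sample = (
        "[Wed Jun 5 03:10:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds.\n"
        "[Wed Jun 5 03:10:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds.\n"
        "[Wed Jun 5 03:12:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds.\n"
    )
    for task, count in count_hung_tasks(sample).items():
        print(task, count)
```

Because the pattern is searched anywhere in the text, it also works on logs whose line breaks were lost when pasting.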

popcornmix commented 11 years ago

Can you run the same test with wired ethernet, or with a different wifi dongle? It looks a bit like a memory leak (or an inappropriately sized buffer) in the network stack that eventually brings the system down.

If you limit the download speed of the torrent application, does it run for longer? Does it run forever?

wrobelda commented 11 years ago

The wired ethernet is a WAN gateway and is connected at all times and so is wifi, which works in AP mode driven by hostapd. I am going to try with another wifi dongle, but will have to find one that supports AP well which will take some days.

The torrent application actually almost never reaches more than 10Mbps, but for the sake of tests I will limit it to e.g. 5Mbps and see.

I can't back it up with any evidence, but as a programmer myself, my intuition is that this is indeed some kind of memory leak.

popcornmix commented 11 years ago

So the Pi is acting as a wireless access point for other machines. And the torrent app is run on another machine, not the Pi?

wrobelda commented 11 years ago

Correct, Pi is wireless access point BUT the torrent app runs ON the Pi at the same time.

wrobelda commented 11 years ago

I did lower the torrent throughput to roughly 8 Mbps and it still hangs after at most one day. The only time I can enjoy long uptimes is when I disable the torrent client altogether.

This error has been here pretty much ever since the beginning of The Life of PI. Is there anything you can do about it or is this non-fixable and so I should probably go and return my PIs claiming they are not as advertised?

I am sorry to say that but this is what it looks like to me. The USB 2.0 subsystem is not usable.

popcornmix commented 11 years ago

It seems to be a memory allocation error in network stack. Have you got 256M or 512M Pi? Running the cutdown firmware (gpu_mem=16)?

I would have thought increasing vm.min_free_kbytes is the most likely solution.

What torrent client? It may be worth trying to find a lighter one. Is it better if the torrent client is run on another machine on the network (but still using the Pi as AP)?

licaon-kter commented 11 years ago

wait, this is a bug now? I get this all the time, and this Pi only has minimal network activity ( ssh connection for just one user, and sometimes some git updates ) I already have:

smsc95xx.turbo_mode=N in /boot/cmdline.txt
vm.min_free_kbytes = 8192 in /etc/sysctl.d/local.conf

wrobelda commented 11 years ago

@licaon-kter had you stressed the ethernet a bit more, it would most likely hang after a period of time

@popcornmix I have the 512MB model. I'm not sure if that's what you mean by the cutdown firmware, but my gpu_mem is set to 128MB, although nothing is really making use of it. I have tried min_free_kbytes values up to 128MB (sic!), yet it would still hang after pretty much the same period of time (hard to tell exactly, but always around a day). I tried both transmission and rtorrent. I will see if anything changes if I download on a machine inside the network.

Ferroin commented 11 years ago

I think that adjusting vm.min_free_kbytes is just putting an (ugly) bandage on the problem. If the cause is indeed a memory leak (which sounds like the most probable cause to me), the system will eventually reach an OOM condition (this may take a long time if you have a lot of swap). Raising vm.min_free_kbytes will just make the system thrash more as it approaches VM exhaustion.

nickriordan commented 11 years ago

I may be seeing this too. Have a pi running afpd and am copying around several hundred GB of files over Ethernet from my Mac to a locally attached USB WD hard drive. Everything runs smoothly for the first 50-70GB and then the Pi crashes. Only messages that I can see in any of the logs is "eth0: kevent 2 may have been dropped". Also I have disabled swap (silly idea - very slow on SD card and wears the card out quickly). 512MB Pi. Video split set to 32MB leaving around 480MB free for the Pi. Running as a server.

wrobelda commented 11 years ago

I did some more experiments. I powered the Pi from an external 1.5 A USB power source instead of from the hub, and also connected the WiFi card to the Pi directly. The second USB port is still used by the USB hub, which in turn runs only the HDD.

It seems more stable now, with an uptime of 12 hours while constantly downloading some Linux ISO images via torrent. After having set min_free_kbytes to 32M, there are also no errors in dmesg. I will do some more testing with this configuration and then see if it breaks after I switch the WiFi back to the hub. It's either crappy WiFi or the hub's unreliable power source that causes these hangups. However, given how good a reputation my hub has, I would expect that it's the WiFi that does not like the hub.

wrobelda commented 11 years ago

I am afraid this bug is still here. Raspberry still hangs after as little as an hour.

ZeBadger commented 11 years ago

I have this problem with 2013-07-26-wheezy-raspbian on a 256mb pi model b

If I run the command "git clone http://github.com/raspberrypi/linux.git kernel" it works until about 14% then it just exits with no error messages (exit code 128)

I get "kevent 2 may have been dropped" in dmesg

Changing vm.min_free_kbytes to 32768 and disabling turbo mode helps a little, but I've still not managed to complete that command. The last time, though, I was running X Windows, and when I tried to run "ls -la" I got an out of memory error.

licaon-kter commented 11 years ago

Try to update first, maybe to the next branch, as I haven't seen this for a while:
sudo BRANCH=next rpi-update
Also, GitHub has failed for me in the past as well: https://github.com/Hexxeh/rpi-firmware/issues/7

P33M commented 10 years ago

CONFIG_COMPACTION and CONFIG_SLUB have fixed this in all but pathological cases.

fedeaf commented 10 years ago

IMO, this is not fixed at all. I keep getting this error consistently on the latest firmware with smsc95xx.turbo_mode=N and vm.min_free_kbytes = 32768. This is on a headless 512MB raspi with gpu_mem = 16. @P33M Please reopen the ticket until it's definitely fixed.

P33M commented 10 years ago

Please post complete logs.

fedeaf commented 10 years ago

I've just taken the dmesg output here: https://gist.github.com/fedeaf/4f1ffef2c1f09aa422c3

ATM, deluge is downloading one torrent at ~400KBps and I was streaming through samba a video with an average of 350KB/s to my laptop.

P33M commented 10 years ago

Is there any indication that the functionality when these messages are being generated is impaired? I.e. do you see skipped video or corrupt fragment downloads?

fedeaf commented 10 years ago

Yes, both are true. Video stutter is common and downloads are usually damaged.
$ free -h
                    total   used   free   shared   buffers   cached
Mem:                 485M   468M    16M     528K      140M     234M
-/+ buffers/cache:           93M   391M
Swap:                989M     0B   989M
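For readers puzzled by 468M "used" on a 485M machine: the "-/+ buffers/cache" row is the first row with buffers and page cache moved from "used" to "free". A small sketch of that arithmetic follows, using the rounded figures from the output above; the function names are illustrative only.

```python
def used_by_applications(used_mb: int, buffers_mb: int, cached_mb: int) -> int:
    """Memory held by applications once buffers and page cache are subtracted."""
    return used_mb - buffers_mb - cached_mb


def free_if_cache_reclaimed(free_mb: int, buffers_mb: int, cached_mb: int) -> int:
    """Memory that would be free if buffers and page cache were reclaimed."""
    return free_mb + buffers_mb + cached_mb


if __name__ == "__main__":
    # Rounded values from the "Mem:" row quoted above.
    print(used_by_applications(468, 140, 234))     # close to the reported 93M
    print(free_if_cache_reclaimed(16, 140, 234))   # close to the reported 391M
```

The results differ from the reported 93M and 391M by about 1M because `free -h` rounds each column separately. Either way, most of the "used" memory here is cache rather than application memory.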

P33M commented 10 years ago

Can you post the output of cat /proc/buddyinfo (over several times when the network is active)?
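For anyone collecting the same data: each row of /proc/buddyinfo lists, per memory zone, the number of free blocks of order 0, 1, 2 and so on, where an order-n block is 2^n contiguous pages. Many low-order blocks and few high-order ones point to fragmentation. Below is a minimal parsing sketch; the 4 KiB page size and the sample row are assumptions for illustration, not data from this thread.

```python
def parse_buddyinfo(text: str) -> dict:
    """Map 'node<N>/<zone>' to the list of free block counts per order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 ... cN"
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        key = "node{}/{}".format(parts[1].rstrip(","), parts[3])
        zones[key] = [int(count) for count in parts[4:]]
    return zones


def free_kib_by_order(counts: list, page_kib: int = 4) -> list:
    """Free memory in KiB per order, assuming page_kib KiB pages."""
    return [count * (2 ** order) * page_kib for order, count in enumerate(counts)]


if __name__ == "__main__":
    # Hypothetical sample row, not taken from the reporter's system.
    sample = "Node 0, zone   Normal    120     45     10      3      0      0      0      0      0      0      0"
    for zone, counts in parse_buddyinfo(sample).items():
        print(zone, free_kib_by_order(counts))
```

On a live system the input would come from reading /proc/buddyinfo at intervals while the network is busy.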

fedeaf commented 10 years ago

Here is buddyinfo every 3 secs, while streaming a video and deluge also running: https://gist.github.com/fedeaf/84313d2a0a29d01df2b5

Just before that I ran drop_caches, which is why there was some free memory at the start.

P33M commented 10 years ago

Normal downloads may end up corrupt if there is a smsc95xx driver bug. If it doesn't correctly report broken transfers in the case of a memory allocation failure, then that's an upstream bug. As we are pretty much the only hardcore users of this hardware then it'll be up to us to diagnose and fix anyway.

Torrents use chunk hash checks to ensure the validity of data transferred. Are you seeing corrupt torrent downloads?
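To illustrate the chunk check mentioned above: in the original BitTorrent protocol, the .torrent metadata carries a SHA-1 digest for every piece, and a client discards any downloaded piece whose digest does not match. The sketch below shows only that comparison; the function name and sample data are made up, and real clients do considerably more.

```python
import hashlib


def piece_is_valid(piece: bytes, expected_sha1: bytes) -> bool:
    """Return True if the piece's SHA-1 digest matches the expected digest."""
    return hashlib.sha1(piece).digest() == expected_sha1


if __name__ == "__main__":
    # Illustrative data: the "expected" digest stands in for one from a .torrent file.
    original = b"example piece data"
    expected = hashlib.sha1(original).digest()

    corrupted = b"example piece dat\x00"
    print(piece_is_valid(original, expected))
    print(piece_is_valid(corrupted, expected))
```

This is why a finished torrent should not contain silently corrupted data even if the network driver mishandles a transfer: a bad piece fails the check and is downloaded again.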

Is the problem worse if you use smsc95xx.turbo_mode=Y?

fedeaf commented 10 years ago

I've seen corrupted torrents, but that hasn't happened for some time now, so it might have been related to something else. I should test more, but smsc95xx.turbo_mode seems to have no effect on this (at least for me).

Without any torrents downloading, I can happily stream a video through samba to my laptop and have no stutter even if I see some kevent2 on the logs. While I'm downloading something (even at low speeds ~ 300KB/s) streaming becomes totally impossible after some time. It seems to be related to writing to the USB hdd (ntfs).

Could this cause corruption when data is flushed from RAM to the USB HDD?

P33M commented 10 years ago

You are running out of CPU time.

At the least, reformat the external drive as ext4. This will reduce the CPU time on file access substantially.

If you get corrupt torrent downloads without streaming and it appears dependent on "kevent 2" messages, then it warrants further investigation.

fedeaf commented 10 years ago

No, my raspi is OCed and has plenty of CPU available while doing that. I also have kevent 2s even if there are no torrents or writes to the disk and I'm just streaming through samba. I'm pretty sure that the issue is not related to CPU load. Will try to prove it somehow and hopefully getting samba out of the way too.

JamesH65 commented 10 years ago

Whilst doing the above, run 'top' in a terminal to see the CPU usage. I think that even when overclocked those operations will use a LOT of CPU.
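If watching top interactively is inconvenient, overall CPU usage can also be estimated from two readings of the aggregate "cpu" line in /proc/stat. This is a rough sketch under stated assumptions: the function names are illustrative, the sample lines are invented, and iowait is counted as idle time here, which is a choice rather than the only reasonable one.

```python
def parse_cpu_line(line: str) -> tuple:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(value) for value in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal ...
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # Guest fields, if present, are already included in user and nice.
    total = sum(values[:8])
    return idle, total


def busy_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent non-idle between two 'cpu' line readings."""
    idle_before, total_before = parse_cpu_line(before)
    idle_after, total_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0
    return 1.0 - (idle_after - idle_before) / elapsed


if __name__ == "__main__":
    # Hypothetical readings taken a few seconds apart.
    first = "cpu 100 0 50 800 50 0 0 0"
    second = "cpu 200 0 100 850 50 0 0 0"
    print(busy_fraction(first, second))
```

On a system waiting on a slow USB disk, it may be worth looking at the iowait field separately, since time spent there is not CPU work but still delays the application.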

Tuinslak commented 8 years ago

Same is happening on an RPI2. I have 2 shares mounted as NFS (ext4) and two other servers running rdiff-backup to the Pi (stores the files via NFS).

I have plenty of CPU time left, according to htop. RPI slightly overclocked.

[Two screenshots attached: 2016-02-07 at 20:51:32 and 2016-02-07 at 20:51:55]

CRTX commented 8 years ago

Why is this closed? This is still happening in my Raspberry Pi 3 with latest firmware 4.1.20-v7+ except with "kevent 0 may have been dropped"

ZeBadger commented 8 years ago

I think the problem was resolved by switching to a different SD card.

MrColdbird commented 8 years ago

The problem is still there, but I fear this issue lies elsewhere and not with the Raspberry Pi.

The same issue occurs on regular desktop linux distributions (x86 / x86-64) whenever a network interface is hammered with a huge amount of concurrent connections.

Not only is my Raspberry Pi affected, but so is my x86-64 based homemade Linux router. This is most likely a kernel issue of some kind, but the long time that has passed since its first discovery, and the fact that there still isn't a fix out, make me assume that this issue doesn't rank high on the to-do list of the Linux kernel developers (which makes me sad).