solarpanzer / boblight

Automatically exported from code.google.com/p/boblight
GNU General Public License v3.0
0 stars 0 forks source link

boblightd seems to crash from time to time, ends up as zombie #51

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run boblightd -f
2. Use it a longer period of time
3. Client suddenly seems to loose connection to boblight (the ambilight effect 
resets to the controllers standard program as one can see with the connected 
led stripes), trying to kill -15 boblightd ends up in a zombie process
4. System seems to be crashed partially (the usb subsystem?), video/audio works 
but one can not use any keyboard/mouse (USB?!) any more
4. SSH becomes unuseable, one can connect and login but then there is no 
response, no prompt at all
5. One has to hard reset the entire system

What version of the product are you using? On what operating system?
Compiled from current git at Ubuntu 11.04

No Log output in Syslog or boblight.log .... but dmesg gave this:
[  240.600108] INFO: task boblightd:1139 blocked for more than 120 seconds.
[  240.600118] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  240.600125] boblightd       D 0000000000000001     0  1139      1 0x00000004
[  240.600137]  ffff880071c15a28 0000000000000046 ffff880071c15fd8 
ffff880071c14000
[  240.600146]  0000000000013d00 ffff880070d2df38 ffff880071c15fd8 
0000000000013d00
[  240.600156]  ffff880051af2dc0 ffff880070d2db80 0000000000000282 
ffff88007143f240
[  240.600165] Call Trace:
[  240.600184]  [<ffffffff8144043d>] usb_kill_urb+0x8d/0xd0
[  240.600195]  [<ffffffff810880e0>] ? autoremove_wake_function+0x0/0x40
[  240.600225]  [<ffffffffa010df21>] generic_cleanup+0x51/0xe0 [usbserial]
[  240.600240]  [<ffffffffa010e055>] usb_serial_generic_close+0x25/0x60 
[usbserial]
[  240.600253]  [<ffffffffa00d7afa>] ftdi_close+0x3a/0x80 [ftdi_sio]
[  240.600264]  [<ffffffffa010b02c>] serial_down+0x2c/0x30 [usbserial]
[  240.600273]  [<ffffffff8138b742>] tty_port_shutdown+0x62/0x70
[  240.600280]  [<ffffffff8138bf8d>] tty_port_close+0x2d/0x60
[  240.600292]  [<ffffffffa010bd12>] serial_close+0x42/0x80 [usbserial]
[  240.600301]  [<ffffffff81383a16>] tty_release+0x166/0x5c0
[  240.600312]  [<ffffffff811790cf>] ? __d_free+0x4f/0x70
[  240.600322]  [<ffffffff8116656e>] __fput+0xbe/0x200
[  240.600329]  [<ffffffff811666d5>] fput+0x25/0x30
[  240.600336]  [<ffffffff81162fb0>] filp_close+0x60/0x90
[  240.600346]  [<ffffffff81069e98>] put_files_struct+0x88/0xf0
[  240.600354]  [<ffffffff81069fc4>] exit_files+0x54/0x70
[  240.600362]  [<ffffffff8106a4b5>] do_exit+0x175/0x410
[  240.600371]  [<ffffffff8106a908>] do_group_exit+0x58/0xd0
[  240.600379]  [<ffffffff8107ac57>] get_signal_to_deliver+0x247/0x410
[  240.600388]  [<ffffffff8100b936>] do_signal+0x56/0x180
[  240.600396]  [<ffffffff81093011>] ? ktime_get_ts+0xb1/0xf0
[  240.600405]  [<ffffffff81177652>] ? poll_select_copy_remaining+0xf2/0x140
[  240.600413]  [<ffffffff8100bae5>] do_notify_resume+0x65/0x80
[  240.600420]  [<ffffffff8100c2d0>] int_signal+0x12/0x17
[  325.304344] usb 3-2: USB disconnect, address 2
[  360.600141] INFO: task boblightd:1139 blocked for more than 120 seconds.
[  360.600152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  360.600160] boblightd       D 0000000000000001     0  1139      1 0x00000004
[  360.600172]  ffff880071c15a28 0000000000000046 ffff880071c15fd8 
ffff880071c14000
[  360.600182]  0000000000013d00 ffff880070d2df38 ffff880071c15fd8 
0000000000013d00
[  360.600192]  ffff880051af2dc0 ffff880070d2db80 0000000000000282 
ffff88007143f240
[  360.600201] Call Trace:
[  360.600226]  [<ffffffff8144043d>] usb_kill_urb+0x8d/0xd0
[  360.600240]  [<ffffffff810880e0>] ? autoremove_wake_function+0x0/0x40
[  360.600277]  [<ffffffffa010df21>] generic_cleanup+0x51/0xe0 [usbserial]
[  360.600291]  [<ffffffffa010e055>] usb_serial_generic_close+0x25/0x60 
[usbserial]
[  360.600304]  [<ffffffffa00d7afa>] ftdi_close+0x3a/0x80 [ftdi_sio]
[  360.600316]  [<ffffffffa010b02c>] serial_down+0x2c/0x30 [usbserial]
[  360.600325]  [<ffffffff8138b742>] tty_port_shutdown+0x62/0x70
[  360.600332]  [<ffffffff8138bf8d>] tty_port_close+0x2d/0x60
[  360.600344]  [<ffffffffa010bd12>] serial_close+0x42/0x80 [usbserial]
[  360.600355]  [<ffffffff81383a16>] tty_release+0x166/0x5c0
[  360.600367]  [<ffffffff811790cf>] ? __d_free+0x4f/0x70
[  360.600378]  [<ffffffff8116656e>] __fput+0xbe/0x200
[  360.600385]  [<ffffffff811666d5>] fput+0x25/0x30
[  360.600393]  [<ffffffff81162fb0>] filp_close+0x60/0x90
[  360.600404]  [<ffffffff81069e98>] put_files_struct+0x88/0xf0
[  360.600412]  [<ffffffff81069fc4>] exit_files+0x54/0x70
[  360.600420]  [<ffffffff8106a4b5>] do_exit+0x175/0x410
[  360.600429]  [<ffffffff8106a908>] do_group_exit+0x58/0xd0
[  360.600437]  [<ffffffff8107ac57>] get_signal_to_deliver+0x247/0x410
[  360.600447]  [<ffffffff8100b936>] do_signal+0x56/0x180
[  360.600456]  [<ffffffff81093011>] ? ktime_get_ts+0xb1/0xf0
[  360.600466]  [<ffffffff81177652>] ? poll_select_copy_remaining+0xf2/0x140
[  360.600473]  [<ffffffff8100bae5>] do_notify_resume+0x65/0x80
[  360.600481]  [<ffffffff8100c2d0>] int_signal+0x12/0x17
[  360.600531] INFO: task Xorg:2166 blocked for more than 120 seconds.
[  360.600536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  360.600541] Xorg            D 0000000000000001     0  2166   2163 0x00400004
[  360.600551]  ffff880066f5bcb8 0000000000000082 ffff880066f5bfd8 
ffff880066f5a000
[  360.600561]  0000000000013d00 ffff880072613178 ffff880066f5bfd8 
0000000000013d00
[  360.600569]  ffff880073230000 ffff880072612dc0 ffff880000000000 
ffffffff81a62fc0
[  360.600578] Call Trace:
[  360.600588]  [<ffffffff815c8f57>] __mutex_lock_slowpath+0xf7/0x180
[  360.600596]  [<ffffffff815c89ab>] mutex_lock+0x2b/0x50
[  360.600605]  [<ffffffff815ca62c>] tty_lock+0x2c/0x4d
[  360.600613]  [<ffffffff8138eaed>] vt_ioctl+0x4d/0x1e10
[  360.600622]  [<ffffffff811782c4>] ? core_sys_select+0x204/0x330
[  360.600631]  [<ffffffff8138b5d3>] ? tty_buffer_flush+0x103/0x110
[  360.600640]  [<ffffffff8138480b>] tty_ioctl+0x24b/0x9f0
[  360.600650]  [<ffffffff8146cf60>] ? evdev_read+0x110/0x2a0
[  360.600658]  [<ffffffff81015701>] ? fpu_finit+0x21/0x40
[  360.600665]  [<ffffffff8101582a>] ? init_fpu+0x4a/0x150
[  360.600672]  [<ffffffff81016aa0>] ? restore_i387_xstate+0x110/0x140
[  360.600680]  [<ffffffff8117688f>] do_vfs_ioctl+0x8f/0x360
[  360.600687]  [<ffffffff8100bd06>] ? sys_rt_sigreturn+0x106/0x140
[  360.600695]  [<ffffffff81176bf1>] sys_ioctl+0x91/0xa0
[  360.600703]  [<ffffffff8100c002>] system_call_fastpath+0x16/0x1b
[  360.600760] INFO: task less:3065 blocked for more than 120 seconds.
[  360.600765] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  360.600770] less            D 0000000000000001     0  3065   2781 0x00000000
[  360.600779]  ffff8800368afb48 0000000000000082 ffff8800368affd8 
ffff8800368ae000
[  360.600788]  0000000000013d00 ffff880066ee3178 ffff8800368affd8 
0000000000013d00
[  360.600796]  ffff880073230000 ffff880066ee2dc0 ffff880077f8cc08 
ffffffff81a62fc0
[  360.600805] Call Trace:
[  360.600812]  [<ffffffff815c8f57>] __mutex_lock_slowpath+0xf7/0x180
[  360.600820]  [<ffffffff815c89ab>] mutex_lock+0x2b/0x50
[  360.600827]  [<ffffffff815ca62c>] tty_lock+0x2c/0x4d
[  360.600834]  [<ffffffff81383ecc>] tty_open+0x5c/0x4c0
[  360.600844]  [<ffffffff8116898d>] ? cdev_get+0x2d/0xb0
[  360.600853]  [<ffffffff8116930a>] chrdev_open+0xda/0x1f0
[  360.600861]  [<ffffffff81169230>] ? chrdev_open+0x0/0x1f0
[  360.600868]  [<ffffffff811630ae>] __dentry_open+0xce/0x2f0
[  360.600877]  [<ffffffff8116f2f3>] ? generic_permission+0x23/0xc0
[  360.600885]  [<ffffffff811645a1>] nameidata_to_filp+0x71/0x80
[  360.600893]  [<ffffffff811737c8>] finish_open+0xc8/0x1b0
[  360.600901]  [<ffffffff811729d7>] ? do_path_lookup+0x87/0x160
[  360.600908]  [<ffffffff81173f88>] do_filp_open+0x2c8/0x7c0
[  360.600919]  [<ffffffff81134be8>] ? unmap_region+0x158/0x170
[  360.600930]  [<ffffffff812e74b7>] ? __strncpy_from_user+0x27/0x60
[  360.600940]  [<ffffffff811812b7>] ? alloc_fd+0xf7/0x150
[  360.600948]  [<ffffffff8116461a>] do_sys_open+0x6a/0x150
[  360.600955]  [<ffffffff81164720>] sys_open+0x20/0x30
[  360.600962]  [<ffffffff8100c002>] system_call_fastpath+0x16/0x1b

Original issue reported on code.google.com by borin1...@googlemail.com on 11 Oct 2012 at 9:15

GoogleCodeExporter commented 8 years ago
Looks like a kernel issue to me, boblightd just opens a socket for clients and 
writes its output to the serial port, it shouldn't have this much influence on 
the rest of the system even if it would freeze.

Original comment by bob.loo...@gmail.com on 11 Oct 2012 at 10:29

GoogleCodeExporter commented 8 years ago
OK, now I tried the following:

I completely reinstalled Ubuntu 12.04.1 (minimal), which currently uses Linux 
3.2.0-32-generic #51-Ubuntu SMP 

The periodical crashes seems to be gone, but now boblightd will not exit when 
sending sigint. Here is the last debug-output:

(SignalHandler)                 caught SIGINT
(CClientsHandler::GetReadableFd)select() Interrupted system call
(main)                          signaling devices to stop
(CClientsHandler::Cleanup)      disconnecting clients
(CClientsHandler::Cleanup)      closing listening socket
(CClientsHandler::Cleanup)      clients handler stopped
(main)                          waiting for devices to stop
/dev/ttyUSB0 write: 5a b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 a5

This can now be found in Syslog:

Oct 28 16:29:28 hostname kernel: [243360.480133] INFO: task boblightd:2176 
blocked for more than 120 seconds.
Oct 28 16:29:28 hostname kernel: [243360.480142] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 16:29:28 hostname kernel: [243360.480149] boblightd       D 
ffffffff81806200     0  2176   2165 0x00000000
Oct 28 16:29:28 hostname kernel: [243360.480161]  ffff880067617c98 
0000000000000082 ffff880067617c48 ffffffff8130d547
Oct 28 16:29:28 hostname kernel: [243360.480173]  ffff880067617fd8 
ffff880067617fd8 ffff880067617fd8 00000000000137c0
Oct 28 16:29:28 hostname kernel: [243360.480184]  ffff880074331700 
ffff8800725b0000 ffff880067617ca8 ffff8800733d6600
Oct 28 16:29:28 hostname kernel: [243360.480193] Call Trace:
Oct 28 16:29:28 hostname kernel: [243360.480210]  [<ffffffff8130d547>] ? 
kobject_put+0x27/0x60
Oct 28 16:29:28 hostname kernel: [243360.480222]  [<ffffffff81658f2f>] 
schedule+0x3f/0x60
Oct 28 16:29:28 hostname kernel: [243360.480231]  [<ffffffff81494ffd>] 
usb_kill_urb.part.4+0x5d/0xa0
Oct 28 16:29:28 hostname kernel: [243360.480241]  [<ffffffff8108ab80>] ? 
add_wait_queue+0x60/0x60
Oct 28 16:29:28 hostname kernel: [243360.480248]  [<ffffffff81495071>] 
usb_kill_urb+0x31/0x40
Oct 28 16:29:28 hostname kernel: [243360.480273]  [<ffffffffa0120f20>] 
generic_cleanup+0x50/0xe0 [usbserial]
Oct 28 16:29:28 hostname kernel: [243360.480288]  [<ffffffffa0121053>] 
usb_serial_generic_close+0x23/0x50 [usbserial]
Oct 28 16:29:28 hostname kernel: [243360.480301]  [<ffffffffa013b259>] 
ftdi_close+0x39/0x70 [ftdi_sio]
Oct 28 16:29:28 hostname kernel: [243360.480313]  [<ffffffffa011e029>] 
serial_down+0x29/0x30 [usbserial]
Oct 28 16:29:28 hostname kernel: [243360.480322]  [<ffffffff813c3f02>] 
tty_port_shutdown+0x62/0x70
Oct 28 16:29:28 hostname kernel: [243360.480330]  [<ffffffff813c4a0c>] 
tty_port_close+0x2c/0x60
Oct 28 16:29:28 hostname kernel: [243360.480342]  [<ffffffffa011ecc2>] 
serial_close+0x42/0x80 [usbserial]
Oct 28 16:29:28 hostname kernel: [243360.480352]  [<ffffffff813bc016>] 
tty_release+0x166/0x5c0
Oct 28 16:29:28 hostname kernel: [243360.480361]  [<ffffffff8104df53>] ? 
__wake_up+0x53/0x70
Oct 28 16:29:28 hostname kernel: [243360.480369]  [<ffffffff81055fad>] ? 
set_next_entity+0xad/0xd0
Oct 28 16:29:28 hostname kernel: [243360.480378]  [<ffffffff811794ee>] 
__fput+0xbe/0x210
Oct 28 16:29:28 hostname kernel: [243360.480385]  [<ffffffff81179665>] 
fput+0x25/0x30
Oct 28 16:29:28 hostname kernel: [243360.480392]  [<ffffffff81176206>] 
filp_close+0x66/0x90
Oct 28 16:29:28 hostname kernel: [243360.480399]  [<ffffffff811762e2>] 
sys_close+0xb2/0x120
Oct 28 16:29:28 hostname kernel: [243360.480407]  [<ffffffff81663442>] 
system_call_fastpath+0x16/0x1b
Oct 28 16:29:28 hostname kernel: [243360.480420] INFO: task sshd:4034 blocked 
for more than 120 seconds.
Oct 28 16:29:28 hostname kernel: [243360.480425] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 16:29:28 hostname kernel: [243360.480431] sshd            D 
ffffffff81806200     0  4034    994 0x00000000
Oct 28 16:29:28 hostname kernel: [243360.480441]  ffff88002b43fb28 
0000000000000082 ffffffff8118af00 00000000000000db
Oct 28 16:29:28 hostname kernel: [243360.480451]  ffff88002b43ffd8 
ffff88002b43ffd8 ffff88002b43ffd8 00000000000137c0
Oct 28 16:29:28 hostname kernel: [243360.480461]  ffffffff81c0d020 
ffff88006017dc00 ffff880000000000 ffffffff81c726e0
Oct 28 16:29:28 hostname kernel: [243360.480469] Call Trace:
Oct 28 16:29:28 hostname kernel: [243360.480478]  [<ffffffff8118af00>] ? 
poll_freewait+0xe0/0xe0
Oct 28 16:29:28 hostname kernel: [243360.480488]  [<ffffffff81658f2f>] 
schedule+0x3f/0x60
Oct 28 16:29:28 hostname kernel: [243360.480495]  [<ffffffff81659d37>] 
__mutex_lock_slowpath+0xd7/0x150

That's all, nothing more, waited about 10 min's, nothing happened... In this 
situation I again can't login with SSH as mentioned above, no USB Device works 
any more, I have to hard reset the machine...

I use a SEDU-ambilight from here: www.sedu-board.de together with XBMC and the 
boblight-addon.

The top of the boblight.conf is this:
[global]
interface       127.0.0.1
port            19333

[device]
name            ambilight
output   /dev/ttyUSB0
channels        234
type            momo
interval        10000
prefix          5A B0
postfix         A5
rate            500000
debug           on

Original comment by borin1...@googlemail.com on 28 Oct 2012 at 3:48

GoogleCodeExporter commented 8 years ago
I'm having the same problem with gentoo (kernel 3.0.4) and with ubuntu 12.04.1 
(kernel 3.2.0-32).
Is there any news about it?

Original comment by satguber...@googlemail.com on 3 Nov 2012 at 12:38

GoogleCodeExporter commented 8 years ago
I suggest reporting this to the linux kernel developers, a process should not 
be able to hang the kernel.

Original comment by bob.loo...@gmail.com on 3 Nov 2012 at 7:04

GoogleCodeExporter commented 8 years ago
I filed a Kernel-Bug Report here:

https://bugzilla.kernel.org/show_bug.cgi?id=50121

let's see their reaction...

@satguber: What kind of hardware do you have? A SEDU Board, too?

Original comment by borin1...@googlemail.com on 7 Nov 2012 at 7:22

GoogleCodeExporter commented 8 years ago
Thank's for filling the report.
I have a sedu board too.
Let's hope the there will be a solution soon...

Original comment by satguber...@googlemail.com on 9 Nov 2012 at 3:14

GoogleCodeExporter commented 8 years ago
OK, I think I figured out what is responsible:

When I had this behaviour the SEDU-Box was connected to an USB1.1 Port ... 
making sure that it is connected to an USB2.0 Port seems to solve the 
problem....

Original comment by borin1...@googlemail.com on 15 Nov 2012 at 8:56

GoogleCodeExporter commented 8 years ago
The connection to an USB 2.0 port works for you?
On my systems the problem is still there...

Original comment by satguber...@googlemail.com on 21 Nov 2012 at 5:22

GoogleCodeExporter commented 8 years ago
yes, USB2.0 works fine for me

Original comment by borin1...@googlemail.com on 21 Nov 2012 at 5:27