Open bnewbold opened 9 years ago
Just my 2 cents: I'm getting a similar problem when trying to run OpenCL applications on Novena (PVT2-A). It's in headless mode, so I don't know how X behaves, but I can test if necessary.
After installing the Vivante SDK, even simply querying for OpenCL devices (without actually running anything) often hangs the board with similar error messages.
To reproduce: extract gpu-viv-bin-mx6q-3.10.17-1.0.0-hfp to $HOME, then
export LD_LIBRARY_PATH="$HOME/gpu-viv-bin-mx6q-3.10.17-1.0.0-hfp/usr/lib/"
sudo chmod go+rw /dev/galcore
wget 'http://graphics.stanford.edu/~yoel/notes/clInfo.c'
gcc clInfo.c -I$HOME/gpu-viv-bin-mx6q-3.10.17-1.0.0-hfp/usr/include/ -L$HOME/gpu-viv-bin-mx6q-3.10.17-1.0.0-hfp/usr/lib/ -lOpenCL -lGAL -o clInfo
./clInfo
About 75% of runs are successful, but about 25% end up hanging the board with the following messages:
[ 301.047717] fec 2188000.ethernet eth0: MDIO read timeout
[ 301.077736] fec 2188000.ethernet eth0: MDIO write timeout
[ 302.107628] fec 2188000.ethernet eth0: MDIO read timeout
[ 320.116018] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [clInfo:994]
[ 320.123266] Modules linked in: binfmt_misc mma8452 industrialio_triggered_buffer kfifo_buf industrialio ipv6
[ 320.133238]
[ 320.134753] CPU: 2 PID: 994 Comm: clInfo Not tainted 3.17.0-rc5-00217-gfd79638 #276
[ 320.142427] task: dac8f380 ti: daf7e000 task.ti: daf7e000
[ 320.147857] PC is at smp_call_function_many+0x240/0x2b8
[ 320.153099] LR is at smp_call_function_many+0x218/0x2b8
[ 320.158342] pc : [<c00a9d54>] lr : [<c00a9d2c>] psr: 200f0013
[ 320.158342] sp : daf7fcb8 ip : 00000000 fp : daf7fcec
[ 320.169836] r10: 00000004 r9 : 00000000 r8 : c051d7e4
[ 320.175074] r7 : c0a90774 r6 : dd93b0c0 r5 : dd93b0c4 r4 : c0a8fdbc
[ 320.181615] r3 : 00000001 r2 : 00000000 r1 : dd92b9b0 r0 : 00000000
[ 320.188158] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 320.195307] Control: 10c5387d Table: 2af8004a DAC: 00000015
[ 320.201071] CPU: 2 PID: 994 Comm: clInfo Not tainted 3.17.0-rc5-00217-gfd79638 #276
[ 320.208788] [<c001894c>] (unwind_backtrace) from [<c00130a8>] (show_stack+0x20/0x24)
[ 320.216568] [<c00130a8>] (show_stack) from [<c072492c>] (dump_stack+0xa0/0xd8)
[ 320.223820] [<c072492c>] (dump_stack) from [<c0010048>] (show_regs+0x30/0x34)
[ 320.230995] [<c0010048>] (show_regs) from [<c00cc6ec>] (watchdog_timer_fn+0x190/0x1ec)
[ 320.238953] [<c00cc6ec>] (watchdog_timer_fn) from [<c0094384>] (__run_hrtimer+0x78/0x294)
[ 320.247162] [<c0094384>] (__run_hrtimer) from [<c0095050>] (hrtimer_interrupt+0x138/0x2e4)
[ 320.255456] [<c0095050>] (hrtimer_interrupt) from [<c00176f0>] (twd_handler+0x40/0x50)
[ 320.263407] [<c00176f0>] (twd_handler) from [<c00860bc>] (handle_percpu_devid_irq+0x80/0x194)
[ 320.271958] [<c00860bc>] (handle_percpu_devid_irq) from [<c0081e8c>] (generic_handle_irq+0x3c/0x4c)
[ 320.281031] [<c0081e8c>] (generic_handle_irq) from [<c000f91c>] (handle_IRQ+0x50/0xa0)
[ 320.288973] [<c000f91c>] (handle_IRQ) from [<c000862c>] (gic_handle_irq+0x94/0x130)
[ 320.296641] unwind: Unknown symbol address c000862c
[ 320.301529] unwind: Index not found c000862c
[ 321.015934] INFO: rcu_preempt detected stalls on CPUs/tasks: { 0} (detected by 1, t=2102 jiffies, g=1175, c=1174, q=502)
[ 321.026905] Task dump for CPU 0:
[ 321.030147] swapper/0 R running 0 0 0 0x00000002
[ 324.141852] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:2:175]
[ 324.149530] Modules linked in: binfmt_misc mma8452 industrialio_triggered_buffer kfifo_buf industrialio ipv6
[ 324.159500]
[ 324.161015] CPU: 3 PID: 175 Comm: kworker/3:2 Tainted: G L 3.17.0-rc5-00217-gfd79638 #276
[ 324.170274] Workqueue: events od_dbs_timer
[ 324.174403] task: dc49a100 ti: dc3a8000 task.ti: dc3a8000
[ 324.179824] PC is at smp_call_function_many+0x240/0x2b8
[ 324.185066] LR is at smp_call_function_many+0x218/0x2b8
[ 324.190311] pc : [<c00a9d54>] lr : [<c00a9d2c>] psr: 20000013
[ 324.190311] sp : dc3a9be0 ip : 00000000 fp : dc3a9c14
[ 324.201803] r10: 00000004 r9 : dc3a9cec r8 : c00173a0
[ 324.207042] r7 : c0a90774 r6 : dd9440c0 r5 : dd9440c4 r4 : c0a8fdbc
[ 324.213581] r3 : 00000001 r2 : 00000000 r1 : dd92b9c0 r0 : 00000000
[ 324.220124] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 324.227447] Control: 10c5387d Table: 2b99c04a DAC: 00000015
[ 324.233212] CPU: 3 PID: 175 Comm: kworker/3:2 Tainted: G L 3.17.0-rc5-00217-gfd79638 #276
[ 324.242451] Workqueue: events od_dbs_timer
[ 324.246614] [<c001894c>] (unwind_backtrace) from [<c00130a8>] (show_stack+0x20/0x24)
[ 324.254391] [<c00130a8>] (show_stack) from [<c072492c>] (dump_stack+0xa0/0xd8)
[ 324.261643] [<c072492c>] (dump_stack) from [<c0010048>] (show_regs+0x30/0x34)
[ 324.268812] [<c0010048>] (show_regs) from [<c00cc6ec>] (watchdog_timer_fn+0x190/0x1ec)
[ 324.276763] [<c00cc6ec>] (watchdog_timer_fn) from [<c0094384>] (__run_hrtimer+0x78/0x294)
[ 324.284971] [<c0094384>] (__run_hrtimer) from [<c0095050>] (hrtimer_interrupt+0x138/0x2e4)
[ 324.293265] [<c0095050>] (hrtimer_interrupt) from [<c00176f0>] (twd_handler+0x40/0x50)
[ 324.301214] [<c00176f0>] (twd_handler) from [<c00860bc>] (handle_percpu_devid_irq+0x80/0x194)
[ 324.309764] [<c00860bc>] (handle_percpu_devid_irq) from [<c0081e8c>] (generic_handle_irq+0x3c/0x4c)
[ 324.318835] [<c0081e8c>] (generic_handle_irq) from [<c000f91c>] (handle_IRQ+0x50/0xa0)
[ 324.326776] [<c000f91c>] (handle_IRQ) from [<c000862c>] (gic_handle_irq+0x94/0x130)
[ 324.334443] unwind: Unknown symbol address c000862c
[ 324.339331] unwind: Index not found c000862c
[ 296.903913] systemd[1]: Starting Journal Service...
[ 318.985309] INFO: rcu_preempt detected stalls on CPUs/tasks: { 0} (detected by 3, t=8407 jiffies, g=1175, c=1174, q=771)
[ 318.996281] Task dump for CPU 0:
[ 318.999524] swapper/0 R running 0 0 0 0x00000002
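A log like the one above can be condensed to see which CPUs and tasks the watchdog complained about. The following is a minimal sketch, assuming the kernel messages were saved to a file (the name hang.log in the usage note is hypothetical) and that the watchdog lines have the same shape as the ones shown here. The function name lockup_summary is made up for illustration.

```shell
# Summarise soft-lockup reports found in a saved kernel log.
# Usage: lockup_summary LOGFILE
lockup_summary() {
    # Matches lines such as:
    #   NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [clInfo:994]
    # rewrites each one as "CPU<n> <task>:<pid>", then counts identical entries.
    sed -n 's/.*soft lockup - CPU#\([0-9]*\) stuck for [0-9]*s! \[\(.*\)]$/CPU\1 \2/p' "$1" |
        sort | uniq -c
}
```

With the messages captured over a serial console or with `dmesg > hang.log` before the board stops responding, `lockup_summary hang.log` prints one line per CPU/task pair, prefixed by how many times it was reported.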
The soft-lockup messages and the subsequent stack traces sometimes repeat ad infinitum, and sometimes appear only once or twice. The pointers and register values in the clInfo message are pretty consistent between retries, except for fp and Table.
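To check which values change between retries (such as fp and Table), two saved dumps can be compared once the timestamps are removed, since the timestamps differ on every run. This is a rough sketch, assuming each dump was saved to its own file. The function names and the file names in the usage note are hypothetical.

```shell
# Drop the leading "[ seconds.micros]" stamp so dumps from different runs line up.
strip_timestamps() {
    sed 's/^\[ *[0-9]*\.[0-9]*] *//' "$1"
}

# Print the lines that differ between two saved dumps.
# Exit status follows diff: 0 if identical, 1 if they differ.
compare_dumps() {
    a=$(mktemp) && b=$(mktemp) || return 2
    strip_timestamps "$1" > "$a"
    strip_timestamps "$2" > "$b"
    # Capture diff's status without tripping shells running under `set -e`.
    diff "$a" "$b" && rc=0 || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

For example, `compare_dumps run1.log run2.log` shows only the lines whose contents differ, which makes it easier to confirm that everything except a few registers stays the same.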
I tried compiling the latest kernel from this repo, but because it uses the open GPU driver, any OpenCL request segfaults due to the absent /dev/galcore.
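A small preflight check can turn that segfault into a readable error before any OpenCL program is started. This is only a sketch: the function name check_gpu_node is made up, and it only tests that the device node exists and is accessible, not that the driver behind it works.

```shell
# Return 0 only if the device node exists and is readable and writable
# by the current user; otherwise explain why on stderr and return 1.
# The node defaults to /dev/galcore, the path used in this thread.
check_gpu_node() {
    node=${1:-/dev/galcore}
    if [ ! -e "$node" ]; then
        echo "$node is missing (is a kernel with the galcore driver running?)" >&2
        return 1
    fi
    if [ ! -r "$node" ] || [ ! -w "$node" ]; then
        echo "$node exists but is not readable and writable by $(id -un)" >&2
        return 1
    fi
}
```

It could be used as `check_gpu_node && ./clInfo`, so that clInfo only runs when the node is present and the earlier chmod step has taken effect.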
Sean's still on honeymoon, but just to chime in -- he mentioned that the Vivante drivers cause a hang on shutdown, so this is probably what you're seeing. You may want to disable the acceleration for now, and also file a bug in redmine so we can get jon nettleton on it.
thanks,
-b.
(In reply to Andrey Alekseenko's report above, sent 01/12/2015 08:31 PM.)
This is definitely an issue for @linux4kix. As @bunnie mentioned, disabling hardware acceleration fixes this problem. For other reasons, we'll disable hardware acceleration until we can fix galcore.
2 years since the last activity on this problem and it's still not resolved? It should be re-closed.
This probably isn't the correct place to report such a bug, but it's convenient. I can cross-post to the redmine bug tracker if that is better.
I get the following dump on shutdown repeatably. This is with a PVT2 bare board. It's possible that this is not actually a problem and just noise? I seem to get ext4 recovery notices on boot though, so I assume that the shutdown is failing before disks have been synced properly.
@bunnie: you might know something?
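One way to test the assumption about unclean shutdowns is to count ext4 recovery notices in a saved boot log. The sketch below assumes such notices contain both "EXT4-fs" and "recovery" on one line. The exact wording depends on the kernel version, and the function name and the boot.log file name are hypothetical.

```shell
# Count ext4 journal-recovery notices in a saved kernel log.
# Usage: count_ext4_recovery LOGFILE
count_ext4_recovery() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'EXT4-fs.*recovery' "$1"
}
```

Saving `dmesg > boot.log` after each boot and running `count_ext4_recovery boot.log` gives a non-zero count whenever a filesystem had to replay its journal, which would point to the previous shutdown not having completed cleanly.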