raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

HDMI control can hang with recent firmware #1364

Closed: tghewett closed this issue 7 years ago

tghewett commented 8 years ago

My app used to be able to poll for the presence of a device on the HDMI bus, but with more recent firmware this is no longer reliable. If I then quit the app and use a related tool to view the HDMI device's EDID capabilities, that tool hangs in much the same way. Running the Broadcom tvservice utility, e.g. tvservice -s, also hangs.

The problem is only cured with a reboot.
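One way to keep a polling app or script from blocking forever on a hang like this is to run each probe under the coreutils `timeout` command and treat a timeout as "hung". The sketch below assumes bash and GNU coreutils; the helper name `probe_status` and the 5-second limit in the example are illustrative, not part of the original report. If the probe is stuck in uninterruptible sleep inside the kernel, even SIGKILL will not take effect until it wakes, so `timeout` itself may then block; the sketch only helps when the hang is interruptible.

```bash
#!/bin/bash
# Hypothetical helper (not from the original thread): run a probe command
# under coreutils `timeout` and classify the result as ok, hung, or error.
#   $1     time limit in seconds before SIGTERM is sent
#   $2...  the command to run, e.g. tvservice -s
probe_status() {
  local secs="$1"
  shift
  # -k 2: send SIGKILL if the command is still running 2 s after SIGTERM.
  timeout -k 2 "$secs" "$@" >/dev/null 2>&1
  case $? in
    0) echo ok ;;
    124 | 137) echo hung ;;  # 124: timed out; 137 (128+9): needed SIGKILL
    *) echo error ;;         # command failed, or could not be run
  esac
}

# Example (assumes tvservice is on PATH, as on Raspbian):
#   if [ "$(probe_status 5 tvservice -s)" = hung ]; then
#     echo "tvservice did not answer within 5 s" >&2
#   fi
```

A command that legitimately exits with status 124 would be misreported as hung; for tvservice that is unlikely to matter.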

The hang may or may not be related to some apparent kernel crashes/oopses; I note that the symbol rpi_firmware_property_list appears in the traces:

[ 6600.211293] INFO: task kworker/0:1:31 blocked for more than 120 seconds.
[ 6600.211307]       Not tainted 4.1.19-v7 #1
[ 6600.211312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211318] kworker/0:1     D 8056f940     0    31      2 0x00000000
[ 6600.211339] Workqueue: events od_dbs_timer
[ 6600.211359] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211370] [<8056ff70>] (schedule) from [<80572a70>] (schedule_timeout+0x15c/0x250)
[ 6600.211380] [<80572a70>] (schedule_timeout) from [<80570a24>] (wait_for_common+0xdc/0x198)
[ 6600.211390] [<80570a24>] (wait_for_common) from [<80570b00>] (wait_for_completion+0x20/0x24)
[ 6600.211401] [<80570b00>] (wait_for_completion) from [<80453824>] (rpi_firmware_property_list+0x110/0x1e8)
[ 6600.211412] [<80453824>] (rpi_firmware_property_list) from [<80453960>] (rpi_firmware_property+0x64/0x84)
[ 6600.211423] [<80453960>] (rpi_firmware_property) from [<80433840>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[ 6600.211433] [<80433840>] (bcm2835_cpufreq_clock_property.constprop.1) from [<80433898>] (bcm2835_cpufreq_driver_target_index+0x44/0xc8)
[ 6600.211444] [<80433898>] (bcm2835_cpufreq_driver_target_index) from [<8042e1fc>] (__cpufreq_driver_target+0x168/0x29c)
[ 6600.211455] [<8042e1fc>] (__cpufreq_driver_target) from [<80431b70>] (dbs_freq_increase+0x54/0x8c)
[ 6600.211466] [<80431b70>] (dbs_freq_increase) from [<80431c0c>] (od_check_cpu+0x64/0xcc)
[ 6600.211476] [<80431c0c>] (od_check_cpu) from [<80432fb4>] (dbs_check_cpu+0x1a0/0x1e8)
[ 6600.211485] [<80432fb4>] (dbs_check_cpu) from [<804320ec>] (od_dbs_timer+0xe8/0x140)
[ 6600.211498] [<804320ec>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211510] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211521] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211533] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
[ 6600.211544] INFO: task kworker/1:1:42 blocked for more than 120 seconds.
[ 6600.211549]       Not tainted 4.1.19-v7 #1
[ 6600.211553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211557] kworker/1:1     D 8056f940     0    42      2 0x00000000
[ 6600.211569] Workqueue: events od_dbs_timer
[ 6600.211579] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211588] [<8056ff70>] (schedule) from [<80570268>] (schedule_preempt_disabled+0x18/0x1c)
[ 6600.211597] [<80570268>] (schedule_preempt_disabled) from [<80571984>] (__mutex_lock_slowpath+0xac/0x164)
[ 6600.211607] [<80571984>] (__mutex_lock_slowpath) from [<80571aa0>] (mutex_lock+0x64/0x68)
[ 6600.211617] [<80571aa0>] (mutex_lock) from [<8043204c>] (od_dbs_timer+0x48/0x140)
[ 6600.211628] [<8043204c>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211638] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211648] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211657] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
[ 6600.211673] INFO: task kworker/3:1:58 blocked for more than 120 seconds.
[ 6600.211678]       Not tainted 4.1.19-v7 #1
[ 6600.211682] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211686] kworker/3:1     D 8056f940     0    58      2 0x00000000
[ 6600.211697] Workqueue: events od_dbs_timer
[ 6600.211706] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211715] [<8056ff70>] (schedule) from [<80570268>] (schedule_preempt_disabled+0x18/0x1c)
[ 6600.211725] [<80570268>] (schedule_preempt_disabled) from [<80571984>] (__mutex_lock_slowpath+0xac/0x164)
[ 6600.211735] [<80571984>] (__mutex_lock_slowpath) from [<80571aa0>] (mutex_lock+0x64/0x68)
[ 6600.211744] [<80571aa0>] (mutex_lock) from [<8043204c>] (od_dbs_timer+0x48/0x140)
[ 6600.211755] [<8043204c>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211765] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211776] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211785] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
[ 6600.211799] INFO: task kworker/2:2:506 blocked for more than 120 seconds.
[ 6600.211804]       Not tainted 4.1.19-v7 #1
[ 6600.211808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211812] kworker/2:2     D 8056f940     0   506      2 0x00000000
[ 6600.211823] Workqueue: events od_dbs_timer
[ 6600.211833] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211842] [<8056ff70>] (schedule) from [<80570268>] (schedule_preempt_disabled+0x18/0x1c)
[ 6600.211852] [<80570268>] (schedule_preempt_disabled) from [<80571984>] (__mutex_lock_slowpath+0xac/0x164)
[ 6600.211861] [<80571984>] (__mutex_lock_slowpath) from [<80571aa0>] (mutex_lock+0x64/0x68)
[ 6600.211871] [<80571aa0>] (mutex_lock) from [<8043204c>] (od_dbs_timer+0x48/0x140)
[ 6600.211882] [<8043204c>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211893] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211902] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211911] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
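In the traces above, every blocked task is a kworker running od_dbs_timer, the ondemand cpufreq governor's timer. kworker/0:1 is waiting in rpi_firmware_property_list (called via bcm2835_cpufreq) for the firmware to reply, while the other three are waiting on a mutex in od_dbs_timer. A rough way to check whether other logs show the same pattern is to scan them for hung-task reports and flag those whose backtrace passes through rpi_firmware_property_list. The awk-based helper below is a hypothetical sketch; it assumes the "INFO: task NAME blocked for more than N seconds." header format seen above.

```bash
#!/bin/bash
# Hypothetical helper: read a kernel log on stdin and print one line per
# hung-task report: the task name, then "firmware" if any line between this
# report and the next mentions rpi_firmware_property_list, else "other".
summarize_hung_tasks() {
  awk '
    /INFO: task .* blocked for more than/ {
      # Emit the previous report before starting a new one.
      if (task != "") print task, (fw ? "firmware" : "other")
      match($0, /task [^ ]+/)
      task = substr($0, RSTART + 5, RLENGTH - 5)  # drop the "task " prefix
      fw = 0
      next
    }
    task != "" && /rpi_firmware_property_list/ { fw = 1 }
    END { if (task != "") print task, (fw ? "firmware" : "other") }
  '
}

# Example: dmesg | summarize_hung_tasks
```

On the log above this would report kworker/0:1:31 as "firmware" and the other three kworkers as "other".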
tghewett commented 8 years ago

By "old firmware" I mean that the libraries in /opt are old, while /boot is newer (March 2016). With that combination it works.

popcornmix commented 8 years ago

The issue will almost certainly reside in the gpu firmware (i.e. /boot/start.elf and /boot/fixup.dat).

tghewett commented 8 years ago

> The issue will almost certainly reside in the gpu firmware (i.e. /boot/start.elf and /boot/fixup.dat).

Perhaps a change in the /opt/vc/lib software provokes a GPU problem that was already present but was not triggered by older releases, so the core problem would still be in the GPU, as you say. I find that reverting the libraries helps, whereas swapping start.elf and fixup.dat seems to make little difference; the kernel is likewise fine.

I have just run md5sum on fixup.dat and start.elf (see below); they are the ones taken from a recent Raspbian image from raspberrypi.org, released after the Raspberry Pi 3 launch.

```
[29/06/2016 23:40 raspberrypi:~] pi$ md5sum /boot/fixup.dat
92e44726dcf7d708b5cf36d3badf09ad /boot/fixup.dat
[29/06/2016 23:40 raspberrypi:~] pi$ md5sum /boot/start.elf
0fcd3fb4e731c6d164e4b4c66fdaa2a7 /boot/start.elf
[29/06/2016 23:40 raspberrypi:~] pi$
```

[EDIT: excuse my computer's automatic spelling correction, mixup.elf -> fixup.dat]
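To tell whether another Pi is running the same GPU firmware files as the ones checked in this comment, their checksums can be compared against the posted values. A minimal sketch, assuming bash and GNU coreutils md5sum; `md5_matches` is a hypothetical helper name. Matching checksums only show that start.elf and fixup.dat are identical; they say nothing about the /opt/vc/lib libraries suspected above.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if FILE is readable and its MD5 equals EXPECTED.
md5_matches() {
  local file="$1" expected="$2" actual
  [ -r "$file" ] || return 1
  # Read via stdin so md5sum prints "HASH  -" regardless of the file name.
  actual=$(md5sum < "$file" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}

# Compare local firmware against the checksums posted in this thread.
while read -r path hash; do
  if md5_matches "$path" "$hash"; then
    echo "$path: matches"
  else
    echo "$path: differs or missing"
  fi
done <<'EOF'
/boot/fixup.dat 92e44726dcf7d708b5cf36d3badf09ad
/boot/start.elf 0fcd3fb4e731c6d164e4b4c66fdaa2a7
EOF
```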

faxik commented 7 years ago

I'm experiencing the same issue.

Maybe it's related, maybe not, but the systems (running an identical image) that experience the issue also sometimes have problems booting with a display attached.

Has anyone come up with a solution?

Ruler2112 commented 7 years ago

It seems to be related to the display. Multiple Pi units with exactly the same configuration (applied by a script) work reliably on some TVs and only intermittently on others. If I move a Pi that has never had an issue to a TV that has, it starts having trouble, while the Pi I moved from that TV to a trouble-free one is instantly cured.

While I haven't been able to find the cause or fix it, I did find a workaround. In /etc/rc.local I detect the display and set the resolution based on the reported EDID. If I bypass this and set the resolution to a hard-coded value, the problem does not occur:

```
if [ -e /home/pi/resx ]; then
  _XRES=`cat /home/pi/resx`
  _YRES=`cat /home/pi/resy`
  _DEPTH=32
  printf "===> Resetting frame-buffer to %dx%dx%d...\n" "$_XRES" "$_YRES" "$_DEPTH"
  fbset --all --geometry "$_XRES" "$_YRES" "$_XRES" "$_YRES" "$_DEPTH" -left 0 -right 0 -upper 0 -lower 0
else
  # Lots of EDID reading & resolution setting code
fi
```

With this code in /etc/rc.local, I can simply create two files, resx and resy, in /home/pi, each containing only the corresponding resolution value (width in resx, height in resy). (I know I could have done this in a cleaner way, but I wanted the problem to disappear quickly, whether or not the solution is the right one.) This seems to work: the troublesome displays haven't had any trouble since I implemented it.
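The two files this workaround reads can be created by hand, for example with `echo 1920 > /home/pi/resx`. The sketch below wraps that in a hypothetical helper, `pin_resolution`, which also checks that both values are positive integers before writing them, since the rc.local snippet passes the file contents straight to fbset. The directory is a parameter so the helper can be tried elsewhere; the 1920x1080 values in the example are only an illustration.

```bash
#!/bin/bash
# Hypothetical helper: write the resx/resy files read by the rc.local
# workaround, refusing values that are not positive integers.
#   $1  directory holding the files (the workaround uses /home/pi)
#   $2  horizontal resolution   $3  vertical resolution
pin_resolution() {
  local dir="$1" x="$2" y="$3" v
  for v in "$x" "$y"; do
    case "$v" in
      # Reject empty values, non-digits, and zero or leading zeros.
      '' | *[!0-9]* | 0*) echo "invalid resolution value: '$v'" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$x" > "$dir/resx" && printf '%s\n' "$y" > "$dir/resy"
}

# Example: pin_resolution /home/pi 1920 1080
```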

Ruler2112 commented 7 years ago

There are back-ticks around the two cat commands for those who are script-impaired. :)

pelwell commented 7 years ago

There are now triple back-ticks around the whole script for those who are Markdown-impaired. ;-)

Ruler2112 commented 7 years ago

LOL... that does look better. If I cared, I'd look up what Markdown means in this context. ;)

popcornmix commented 7 years ago

@faxik, does `while : ; do tvservice -d /tmp/edid.dat; done` run forever, or does it hang after a short while? Does the problem also occur with a different display?
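The loop above runs until interrupted, so a hang shows up only as a terminal that stops responding. A bounded variant can instead report how many calls completed before the first one timed out or failed. This is a hypothetical sketch assuming bash and coreutils `timeout`; `stress_until_hang`, the iteration cap, and the 5-second per-call limit are illustrative choices, not something requested in the thread.

```bash
#!/bin/bash
# Hypothetical helper: run a command up to MAX times, each under a per-call
# time limit, and print how many calls succeeded before the first timeout or
# failure (MAX if every call succeeded).
#   $1  MAX iterations   $2  seconds allowed per call   $3...  the command
stress_until_hang() {
  local max="$1" secs="$2" i=0
  shift 2
  while [ "$i" -lt "$max" ]; do
    # Stop at the first call that times out (124/137) or exits non-zero.
    timeout -k 2 "$secs" "$@" >/dev/null 2>&1 || break
    i=$((i + 1))
  done
  echo "$i"
}

# Example (assumes tvservice is on PATH, as on Raspbian):
#   stress_until_hang 10000 5 tvservice -d /tmp/edid.dat
```

As with any `timeout` wrapper, a process stuck in uninterruptible sleep may still block the loop until it wakes.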

JamesH65 commented 7 years ago

Closing due to lack of activity. Reopen if you feel this issue is still relevant.