raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.15k stars 5k forks source link

HDMI control can hang with recent firmware #1364

Closed tghewett closed 7 years ago

tghewett commented 8 years ago

My app used to be able to poll for the presence of a device on the HDMI bus but with more recent firmware it is no longer reliable. Quitting the app and using a related tool to view the HDMI device EDID capabilities results in it hanging in much the same way. Attempts to use the Broadcom tvservice utility, e.g. tvservice -s also sees that utility hang.

The problem is only cured with a reboot.

This may or may not be related to some apparent kernel panics / crashes / oopss, I note the presence of the symbol rpi_firmware_property_list:

[ 6600.211293] INFO: task kworker/0:1:31 blocked for more than 120 seconds.
[ 6600.211307]       Not tainted 4.1.19-v7 #1
[ 6600.211312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211318] kworker/0:1     D 8056f940     0    31      2 0x00000000
[ 6600.211339] Workqueue: events od_dbs_timer
[ 6600.211359] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211370] [<8056ff70>] (schedule) from [<80572a70>] (schedule_timeout+0x15c/0x250)
[ 6600.211380] [<80572a70>] (schedule_timeout) from [<80570a24>] (wait_for_common+0xdc/0x198)
[ 6600.211390] [<80570a24>] (wait_for_common) from [<80570b00>] (wait_for_completion+0x20/0x24)
[ 6600.211401] [<80570b00>] (wait_for_completion) from [<80453824>] (rpi_firmware_property_list+0x110/0x1e8)
[ 6600.211412] [<80453824>] (rpi_firmware_property_list) from [<80453960>] (rpi_firmware_property+0x64/0x84)
[ 6600.211423] [<80453960>] (rpi_firmware_property) from [<80433840>] (bcm2835_cpufreq_clock_property.constprop.1+0x48/0x5c)
[ 6600.211433] [<80433840>] (bcm2835_cpufreq_clock_property.constprop.1) from [<80433898>] (bcm2835_cpufreq_driver_target_index+0x44/0xc8)
[ 6600.211444] [<80433898>] (bcm2835_cpufreq_driver_target_index) from [<8042e1fc>] (__cpufreq_driver_target+0x168/0x29c)
[ 6600.211455] [<8042e1fc>] (__cpufreq_driver_target) from [<80431b70>] (dbs_freq_increase+0x54/0x8c)
[ 6600.211466] [<80431b70>] (dbs_freq_increase) from [<80431c0c>] (od_check_cpu+0x64/0xcc)
[ 6600.211476] [<80431c0c>] (od_check_cpu) from [<80432fb4>] (dbs_check_cpu+0x1a0/0x1e8)
[ 6600.211485] [<80432fb4>] (dbs_check_cpu) from [<804320ec>] (od_dbs_timer+0xe8/0x140)
[ 6600.211498] [<804320ec>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211510] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211521] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211533] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
[ 6600.211544] INFO: task kworker/1:1:42 blocked for more than 120 seconds.
[ 6600.211549]       Not tainted 4.1.19-v7 #1
[ 6600.211553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211557] kworker/1:1     D 8056f940     0    42      2 0x00000000
[ 6600.211569] Workqueue: events od_dbs_timer
[ 6600.211579] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211588] [<8056ff70>] (schedule) from [<80570268>] (schedule_preempt_disabled+0x18/0x1c)
[ 6600.211597] [<80570268>] (schedule_preempt_disabled) from [<80571984>] (__mutex_lock_slowpath+0xac/0x164)
[ 6600.211607] [<80571984>] (__mutex_lock_slowpath) from [<80571aa0>] (mutex_lock+0x64/0x68)
[ 6600.211617] [<80571aa0>] (mutex_lock) from [<8043204c>] (od_dbs_timer+0x48/0x140)
[ 6600.211628] [<8043204c>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211638] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211648] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211657] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
[ 6600.211673] INFO: task kworker/3:1:58 blocked for more than 120 seconds.
[ 6600.211678]       Not tainted 4.1.19-v7 #1
[ 6600.211682] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211686] kworker/3:1     D 8056f940     0    58      2 0x00000000
[ 6600.211697] Workqueue: events od_dbs_timer
[ 6600.211706] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211715] [<8056ff70>] (schedule) from [<80570268>] (schedule_preempt_disabled+0x18/0x1c)
[ 6600.211725] [<80570268>] (schedule_preempt_disabled) from [<80571984>] (__mutex_lock_slowpath+0xac/0x164)
[ 6600.211735] [<80571984>] (__mutex_lock_slowpath) from [<80571aa0>] (mutex_lock+0x64/0x68)
[ 6600.211744] [<80571aa0>] (mutex_lock) from [<8043204c>] (od_dbs_timer+0x48/0x140)
[ 6600.211755] [<8043204c>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211765] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211776] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211785] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
[ 6600.211799] INFO: task kworker/2:2:506 blocked for more than 120 seconds.
[ 6600.211804]       Not tainted 4.1.19-v7 #1
[ 6600.211808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6600.211812] kworker/2:2     D 8056f940     0   506      2 0x00000000
[ 6600.211823] Workqueue: events od_dbs_timer
[ 6600.211833] [<8056f940>] (__schedule) from [<8056ff70>] (schedule+0x40/0xa4)
[ 6600.211842] [<8056ff70>] (schedule) from [<80570268>] (schedule_preempt_disabled+0x18/0x1c)
[ 6600.211852] [<80570268>] (schedule_preempt_disabled) from [<80571984>] (__mutex_lock_slowpath+0xac/0x164)
[ 6600.211861] [<80571984>] (__mutex_lock_slowpath) from [<80571aa0>] (mutex_lock+0x64/0x68)
[ 6600.211871] [<80571aa0>] (mutex_lock) from [<8043204c>] (od_dbs_timer+0x48/0x140)
[ 6600.211882] [<8043204c>] (od_dbs_timer) from [<8003cd84>] (process_one_work+0x15c/0x480)
[ 6600.211893] [<8003cd84>] (process_one_work) from [<8003d230>] (worker_thread+0x144/0x4d8)
[ 6600.211902] [<8003d230>] (worker_thread) from [<800428fc>] (kthread+0xe8/0x104)
[ 6600.211911] [<800428fc>] (kthread) from [<8000f938>] (ret_from_fork+0x14/0x3c)
popcornmix commented 8 years ago

Most likely the gpu has been killed before this, and you are just seeing the outcome of the gpu not responding. Can you provide an executable and/or script that provokes this issue? Something that can be run on a clean raspbian install and results in this failure would be ideal.

tghewett commented 8 years ago

The sequence of events which causes this is unknown at present, it just happens sometimes but is likely to be associated with the HDMI interface gaining and losing its connection with the amp, say because it is connected through an HDMI switch which is switched to a different source then back again.

popcornmix commented 8 years ago

I suspect it is unrelated to HDMI. I imagine at this point if you had ran vcgencmd version or any other command that talks to GPU you would have found that had failed instead.

What model of Pi do you have? Are you overclocking? Are you sure your power supply is good? Are you running any gpu code (opengles/video decode/camera/etc)?

tghewett commented 8 years ago

The problem has been witnessed on an original model B, a Pi 2, Pi 3, and model A, [EDIT] all with different PSUs [EDIT END]. No overclocking on any of them. They run a combination of Raspbian and Arch Linux, Raspbian running kernel 4.1.19 using a config derived from the standard release but with changes in the DVB configs, Arch Linux running 4.1.18+. No GPU code being run to my knowledge, assuming BCM audio is a separate thing. The GPU firmware is unmodified from that recently provided by rpi-update.

I will try vcgencmd version next time it happens, usually at least once a day.

popcornmix commented 8 years ago

What is the change in kernel DVB (dtb?) config? What app are you using for audio? Can you describe any peripherals connected to pi?

tghewett commented 8 years ago

DVB - drivers from linuxtv.org, i.e. unrelated to BCM, GPU etc. No peripherals needed to cause the problem, a bare board is enough. The app polls for the presence of an HDMI interface using vc_tv_hdmi_ddc_read(), this is where it gets stuck.

tghewett commented 8 years ago

More oddly the problem has appeared but without the hanging, yet the return from vc_tv_hdmi_ddc_read() is indicating no interface present. The HDMI port has a device connected and while an image is on the screen, the audio interface is not being detected, see the output of tvservice -a. Reading the EDID information seems to work but the output file is bad. This behaviour was cleared with a reboot.

[root@archlinux ~]# /opt/vc/bin/vcgencmd version Feb 29 2016 14:23:50 Copyright (c) 2012 Broadcom version dea971b793dd6cf89133ede5a8362eb77e4f4ade (clean) (release) [root@archlinux ~]# /opt/vc/bin/tvservice -s state 0x12000a [HDMI DMT (4) RGB full 4:3], 640x480 @ 60.00Hz, progressive [root@archlinux ~]# /opt/vc/bin/tvservice -a [root@archlinux ~]# /opt/vc/bin/tvservice -d 1.out Written 128 bytes to 1.out [root@archlinux ~]# [root@archlinux ~]# /opt/vc/bin/edidparser 1.out Enabling fuzzy format match... Parsing 1.out... HDMI: bad EDID header 20 0 0 0 0 0 0 0 edid_parser exited with code -1 [root@archlinux ~]#

After rebooting:

[root@archlinux ~]# /opt/vc/bin/tvservice -a PCM supported: Max channels: 8, Max samplerate: 192kHz, Max samplesize 24 bits. AC3 supported: Max channels: 8, Max samplerate: 48kHz, Max rate 640 kb/s. DTS supported: Max channels: 8, Max samplerate: 48kHz, Max rate 1536 kb/s. DTS_HD supported: Max channels: 8, Max samplerate: 192kHz, Max rate 8 kb/s. [root@archlinux ~]# [root@archlinux ~]# /opt/vc/bin/tvservice -d 1.out Written 256 bytes to 1.out [root@archlinux ~]# /opt/vc/bin/edidparser 1.out Enabling fuzzy format match... Parsing 1.out... HDMI:EDID version 1.3, 1 extensions, screen size 160x90 cm HDMI:EDID features - videodef 0x80 !standby !suspend !active off; colour encoding:RGB444|YCbCr422; sRGB is not default colourspace; preferred format is native; does not support GTF HDMI:EDID found monitor name descriptor tag 0xfc HDMI:EDID monitor name is AG-D200 HDMI:EDID found monitor range descriptor tag 0xfd HDMI:EDID monitor range offsets: V min=0, V max=0, H min=0, H max=0 HDMI:EDID monitor range: vertical is 48-62 Hz, horizontal is 14-68 kHz, max pixel clock is 150 MHz HDMI:EDID monitor range does not support GTF HDMI:EDID found preferred CEA detail timing format: 1920x1080p @ 60 Hz (16) HDMI:EDID found DMT detail timing format: 1360x768p @ 60 Hz (39) HDMI:EDID established timing I/II bytes are 21 08 00 HDMI:EDID found DMT format: code 4, 640x480p @ 60 Hz in established timing I/II HDMI:EDID found DMT format: code 9, 800x600p @ 60 Hz in established timing I/II HDMI:EDID found DMT format: code 16, 1024x768p @ 60 Hz in established timing I/II HDMI:EDID standard timings block x 8: 0x0101 0101 0101 0101 0101 0101 0101 0101 HDMI:EDID parsing v3 CEA extension 0 HDMI:EDID monitor support - underscan IT formats:no, basic audio:yes, yuv444:yes, yuv422:yes, #native DTD:0 HDMI:EDID found CEA detail timing format: 1920x1080i @ 50 Hz (20) HDMI:EDID found CEA detail timing format: 1920x1080i @ 60 Hz (5) HDMI:EDID found CEA detail timing format: 1280x720p @ 50 Hz (19) HDMI:EDID found CEA format: code 31, 1920x1080p @ 50Hz HDMI:EDID found CEA format: code 20, 1920x1080i @ 50Hz HDMI:EDID found CEA format: code 19, 1280x720p @ 50Hz HDMI:EDID found CEA format: code 18, 720x576p @ 50Hz HDMI:EDID found CEA format: code 17, 720x576p @ 50Hz HDMI:EDID found CEA format: code 22, 1440x576i @ 50Hz HDMI:EDID found CEA format: code 21, 1440x576i @ 50Hz HDMI:EDID found CEA format: code 16, 1920x1080p @ 60Hz HDMI:EDID found CEA format: code 5, 1920x1080i @ 60Hz HDMI:EDID found CEA format: code 4, 1280x720p @ 60Hz HDMI:EDID found CEA format: code 3, 720x480p @ 60Hz HDMI:EDID found CEA format: code 2, 720x480p @ 60Hz HDMI:EDID found CEA format: code 7, 1440x480i @ 60Hz HDMI:EDID found CEA format: code 6, 1440x480i @ 60Hz HDMI:EDID found CEA format: code 1, 640x480p @ 60Hz HDMI:EDID found CEA format: code 32, 1920x1080p @ 24Hz HDMI:EDID found CEA format: code 30, 1440x576p @ 50Hz HDMI:EDID found CEA format: code 29, 1440x576p @ 50Hz HDMI:EDID found CEA format: code 26, 2880x576i @ 50Hz HDMI:EDID found CEA format: code 25, 2880x576i @ 50Hz HDMI:EDID found CEA format: code 15, 1440x480p @ 60Hz HDMI:EDID found CEA format: code 14, 1440x480p @ 60Hz HDMI:EDID found CEA format: code 11, 2880x480i @ 60Hz HDMI:EDID found CEA format: code 10, 2880x480i @ 60Hz HDMI:EDID found CEA format: code 38, 2880x576p @ 50Hz HDMI:EDID found CEA format: code 37, 2880x576p @ 50Hz HDMI:EDID found CEA format: code 36, 2880x480p @ 60Hz HDMI:EDID found CEA format: code 35, 2880x480p @ 60Hz HDMI:EDID found audio format 2 channels PCM, sample rate: 32|44|48|88|96|176|192 kHz, sample size: 16|20|24 bits HDMI:EDID found audio format 8 channels PCM, sample rate: 32|44|48|88|96|176|192 kHz, sample size: 16|20|24 bits HDMI:EDID found audio format 8 channels AC3, sample rate: 32|44|48 kHz, bitrate: 640 kbps HDMI:EDID found audio format 8 channels DTS, sample rate: 44|48 kHz, bitrate: 1536 kbps HDMI:EDID found audio format 6 channels One Bit Audio, sample rate: 44 kHz, codec define: 0 HDMI:EDID found audio format 8 channels Dobly Digital+, sample rate: 44|48 kHz, codec define: 0 HDMI:EDID found audio format 8 channels DTS-HD, sample rate: 44|48|88|96|176|192 kHz, codec define: 1 HDMI:EDID found audio format 8 channels MLP, sample rate: 48|96|192 kHz, codec define: 0 HDMI:EDID found HDMI VSDB length 6 HDMI:EDID HDMI VSDB has physical address 1.2.0.0 HDMI:EDID HDMI VSDB supports AI:yes, dual link DVI:no HDMI:EDID HDMI VSDB deep colour support - 48-bit:no 36-bit:no 30-bit:no DC_yuv444:no HDMI:EDID HDMI VSDB has no latency information HDMI:EDID filtering formats with pixel clock > 162 MHz or h. blanking > 1023 HDMI:EDID best score mode initialised to CEA (1) 640x480p @ 60 Hz with pixel clock 25 MHz (score 0) HDMI:EDID best score mode is now CEA (1) 640x480p @ 60 Hz with pixel clock 25 MHz (score 61864) HDMI:EDID best score mode is now CEA (2) 720x480p @ 60 Hz with pixel clock 27 MHz (score 66472) HDMI:EDID CEA mode (3) 720x480p @ 60 Hz with pixel clock 27 MHz has a score of 66472 HDMI:EDID best score mode is now CEA (4) 1280x720p @ 60 Hz with pixel clock 74 MHz (score 135592) HDMI:EDID DMT mode (4) 640x480p @ 60 Hz with pixel clock 25 MHz has a score of 18432 HDMI:EDID best score mode is now CEA (5) 1920x1080i @ 60 Hz with pixel clock 74 MHz (score 3773832) HDMI:EDID CEA mode (6) 1440x480i @ 60 Hz with pixel clock 27 MHz has a score of 45736 HDMI:EDID CEA mode (7) 1440x480i @ 60 Hz with pixel clock 27 MHz has a score of 45736 HDMI:EDID DMT mode (9) 800x600p @ 60 Hz with pixel clock 40 MHz has a score of 28800 HDMI:EDID CEA mode (10) 2880x480i @ 60 Hz with pixel clock 54 MHz has a score of 45736 HDMI:EDID CEA mode (11) 2880x480i @ 60 Hz with pixel clock 54 MHz has a score of 45736 HDMI:EDID CEA mode (14) 1440x480p @ 60 Hz with pixel clock 54 MHz has a score of 66472 HDMI:EDID CEA mode (15) 1440x480p @ 60 Hz with pixel clock 54 MHz has a score of 66472 HDMI:EDID best score mode is now CEA (16) 1920x1080p @ 60 Hz with pixel clock 148 MHz (score 5398248) HDMI:EDID DMT mode (16) 1024x768p @ 60 Hz with pixel clock 65 MHz has a score of 47185 HDMI:EDID CEA mode (17) 720x576p @ 50 Hz with pixel clock 27 MHz has a score of 66472 HDMI:EDID CEA mode (18) 720x576p @ 50 Hz with pixel clock 27 MHz has a score of 66472 HDMI:EDID CEA mode (19) 1280x720p @ 50 Hz with pixel clock 74 MHz has a score of 3117160 HDMI:EDID CEA mode (20) 1920x1080i @ 50 Hz with pixel clock 74 MHz has a score of 4232360 HDMI:EDID CEA mode (21) 1440x576i @ 50 Hz with pixel clock 27 MHz has a score of 45736 HDMI:EDID CEA mode (22) 1440x576i @ 50 Hz with pixel clock 27 MHz has a score of 45736 HDMI:EDID CEA mode (25) 2880x576i @ 50 Hz with pixel clock 54 MHz has a score of 45736 HDMI:EDID CEA mode (26) 2880x576i @ 50 Hz with pixel clock 54 MHz has a score of 45736 HDMI:EDID CEA mode (29) 1440x576p @ 50 Hz with pixel clock 54 MHz has a score of 66472 HDMI:EDID CEA mode (30) 1440x576p @ 50 Hz with pixel clock 54 MHz has a score of 66472 HDMI:EDID CEA mode (31) 1920x1080p @ 50 Hz with pixel clock 148 MHz has a score of 232360 HDMI:EDID CEA mode (32) 1920x1080p @ 24 Hz with pixel clock 74 MHz has a score of 124532 HDMI:EDID CEA mode (35) 2880x480p @ 60 Hz with pixel clock 108 MHz has a score of 66472 HDMI:EDID CEA mode (36) 2880x480p @ 60 Hz with pixel clock 108 MHz has a score of 66472 HDMI:EDID CEA mode (37) 2880x576p @ 50 Hz with pixel clock 108 MHz has a score of 66472 HDMI:EDID CEA mode (38) 2880x576p @ 50 Hz with pixel clock 108 MHz has a score of 66472 HDMI:EDID DMT mode (39) 1360x768p @ 60 Hz with pixel clock 85 MHz has a score of 4562668 HDMI:EDID preferred mode remained as CEA (16) 1920x1080p @ 60 Hz with pixel clock 148 MHz HDMI:EDID has HDMI support and audio support edid_parser exited with code 0 [root@archlinux ~]#

popcornmix commented 8 years ago

HDMI: bad EDID header 20 0 0 0 0 0 0 0

This suggests we cannot read the edid reliable. This may be a physical issue. Can you try with a different HDMI cable? If that doesn't help try with a different HDMI display.

tghewett commented 8 years ago

Just to re-itereate, the problem does not happen with older firmware (including with the same hardware setup) so at present hardware issues have been excluded.

popcornmix commented 8 years ago

All older firmware are available here. You can downgrade with sudo rpi-update <hash> where hash comes from the end of the github commit URL. If you could identify when the problem started then we could identify the commit that caused the problem and presumably solve the issue.

tghewett commented 8 years ago

I imagine the older firmware is available as well for anyone else having this problem (if there is anyone of course).

Older firmware is good enough for me for now, this issue is a heads-up that things have deteriorated in this area. If there is anything else I can do to help you pinpoint the cause without having to plough through older firmware then certainly let me know, but I may not be able to get back very quickly.

Ruler2112 commented 8 years ago

I believe that this is the same problem I've been chasing for close to 2 months now. I am not good enough interpreting diagnostic hex codes to confirm this; here is what's in /var/log/messages:

Jun 24 08:00:12 TunnelS kernel: [ 10.047855] random: nonblocking pool is initialized Jun 24 08:00:15 TunnelS kernel: [ 13.570666] Bluetooth: Core ver 2.20 Jun 24 08:00:15 TunnelS kernel: [ 13.570797] NET: Registered protocol family 31 Jun 24 08:00:15 TunnelS kernel: [ 13.570814] Bluetooth: HCI device and connection manager initialized Jun 24 08:00:15 TunnelS kernel: [ 13.572001] Bluetooth: HCI socket layer initialized Jun 24 08:00:15 TunnelS kernel: [ 13.572038] Bluetooth: L2CAP socket layer initialized Jun 24 08:00:15 TunnelS kernel: [ 13.572103] Bluetooth: SCO socket layer initialized Jun 24 08:00:15 TunnelS kernel: [ 13.578789] Bluetooth: HCI UART driver ver 2.3 Jun 24 08:00:15 TunnelS kernel: [ 13.578879] Bluetooth: HCI UART protocol H4 registered Jun 24 08:00:15 TunnelS kernel: [ 13.578915] Bluetooth: HCI UART protocol Three-wire (H5) registered Jun 24 08:00:15 TunnelS kernel: [ 13.578937] Bluetooth: HCI UART protocol BCM registered Jun 24 08:00:16 TunnelS kernel: [ 13.736052] Bluetooth: BNEP (Ethernet Emulation) ver 1.3 Jun 24 08:00:16 TunnelS kernel: [ 13.736073] Bluetooth: BNEP filters: protocol multicast Jun 24 08:00:16 TunnelS kernel: [ 13.736094] Bluetooth: BNEP socket layer initialized Jun 24 08:04:24 TunnelS kernel: [ 240.140388] kworker/1:1 D 80569d90 0 42 2 0x00000000 Jun 24 08:04:24 TunnelS kernel: [ 240.140418] Workqueue: events od_dbs_timer Jun 24 08:04:24 TunnelS kernel: [ 240.140451] <80569d90> from <8056a264> Jun 24 08:04:24 TunnelS kernel: [ 240.140474] <8056a264> from <8056cf70> Jun 24 08:04:24 TunnelS kernel: [ 240.140499] <8056cf70> from <8056adfc> Jun 24 08:04:24 TunnelS kernel: [ 240.140520] <8056adfc> from <8056aee4> Jun 24 08:04:24 TunnelS kernel: [ 240.140542] <8056aee4> from <80450bf0> Jun 24 08:04:24 TunnelS kernel: [ 240.140564] <80450bf0> from <80450d58> Jun 24 08:04:24 TunnelS kernel: [ 240.140587] <80450d58> from <8043069c> Jun 24 08:04:24 TunnelS kernel: [ 240.140611] <8043069c> from <804306fc> Jun 24 08:04:24 TunnelS kernel: [ 240.140636] <804306fc> from <8042a934> Jun 24 08:04:24 TunnelS kernel: [ 240.140657] <8042a934> from <8042df94> Jun 24 08:04:24 TunnelS kernel: [ 240.140675] <8042df94> from <8042fcd8> Jun 24 08:04:24 TunnelS kernel: [ 240.140693] <8042fcd8> from <8042e1f4> Jun 24 08:04:24 TunnelS kernel: [ 240.140717] <8042e1f4> from <8003dbf8> Jun 24 08:04:24 TunnelS kernel: [ 240.140740] <8003dbf8> from <8003dec4> Jun 24 08:04:24 TunnelS kernel: [ 240.140762] <8003dec4> from <8004364c> Jun 24 08:04:24 TunnelS kernel: [ 240.140783] <8004364c> from <8000faf8> Jun 24 08:04:24 TunnelS kernel: [ 240.140828] kworker/2:1 D 80569d90 0 48 2 0x00000000 Jun 24 08:04:24 TunnelS kernel: [ 240.140849] Workqueue: events od_dbs_timer Jun 24 08:04:24 TunnelS kernel: [ 240.140871] <80569d90> from <8056a264> Jun 24 08:04:24 TunnelS kernel: [ 240.140891] <8056a264> from <8056a650> Jun 24 08:04:24 TunnelS kernel: [ 240.140912] <8056a650> from <8056bd80> Jun 24 08:04:24 TunnelS kernel: [ 240.140933] <8056bd80> from <8056be9c> Jun 24 08:04:24 TunnelS kernel: [ 240.140952] <8056be9c> from <8042e1c0> Jun 24 08:04:24 TunnelS kernel: [ 240.140973] <8042e1c0> from <8003dbf8> Jun 24 08:04:24 TunnelS kernel: [ 240.140994] <8003dbf8> from <8003dec4> Jun 24 08:04:24 TunnelS kernel: [ 240.141012] <8003dec4> from <8004364c> Jun 24 08:04:24 TunnelS kernel: [ 240.141031] <8004364c> from <8000faf8> Jun 24 08:04:24 TunnelS kernel: [ 240.141073] kworker/3:1 D 80569d90 0 56 2 0x00000000 Jun 24 08:04:24 TunnelS kernel: [ 240.141093] Workqueue: events od_dbs_timer Jun 24 08:04:24 TunnelS kernel: [ 240.141114] <80569d90> from <8056a264> Jun 24 08:04:24 TunnelS kernel: [ 240.141135] <8056a264> from <8056a650> Jun 24 08:04:24 TunnelS kernel: [ 240.141157] <8056a650> from <8056bd80> Jun 24 08:04:24 TunnelS kernel: [ 240.141177] <8056bd80> from <8056be9c> Jun 24 08:04:24 TunnelS kernel: [ 240.141196] <8056be9c> from <8042e1c0> Jun 24 08:04:24 TunnelS kernel: [ 240.141217] <8042e1c0> from <8003dbf8> Jun 24 08:04:24 TunnelS kernel: [ 240.141238] <8003dbf8> from <8003dec4> Jun 24 08:04:24 TunnelS kernel: [ 240.141257] <8003dec4> from <8004364c> Jun 24 08:04:24 TunnelS kernel: [ 240.141276] <8004364c> from <8000faf8> Jun 24 08:04:24 TunnelS kernel: [ 240.141316] kworker/0:2 D 80569d90 0 97 2 0x00000000 Jun 24 08:04:24 TunnelS kernel: [ 240.141337] Workqueue: events od_dbs_timer Jun 24 08:04:24 TunnelS kernel: [ 240.141358] <80569d90> from <8056a264> Jun 24 08:04:24 TunnelS kernel: [ 240.141378] <8056a264> from <8056a650> Jun 24 08:04:24 TunnelS kernel: [ 240.141399] <8056a650> from <8056bd80> Jun 24 08:04:24 TunnelS kernel: [ 240.141420] <8056bd80> from <8056be9c> Jun 24 08:04:24 TunnelS kernel: [ 240.141438] <8056be9c> from <8042e1c0> Jun 24 08:04:24 TunnelS kernel: [ 240.141458] <8042e1c0> from <8003dbf8> Jun 24 08:04:24 TunnelS kernel: [ 240.141481] <8003dbf8> from <8003dec4> Jun 24 08:04:24 TunnelS kernel: [ 240.141500] <8003dec4> from <8004364c> Jun 24 08:04:24 TunnelS kernel: [ 240.141518] <8004364c> from <8000faf8> Jun 24 08:06:24 TunnelS kernel: [ 360.141177] kworker/1:1 D 80569d90 0 42 2 0x00000000 Jun 24 08:06:24 TunnelS kernel: [ 360.141210] Workqueue: events od_dbs_timer Jun 24 08:06:24 TunnelS kernel: [ 360.141244] <80569d90> from <8056a264> Jun 24 08:06:24 TunnelS kernel: [ 360.141266] <8056a264> from <8056cf70> Jun 24 08:06:24 TunnelS kernel: [ 360.141289] <8056cf70> from <8056adfc> Jun 24 08:06:24 TunnelS kernel: [ 360.141310] <8056adfc> from <8056aee4> Jun 24 08:06:24 TunnelS kernel: [ 360.141333] <8056aee4> from <80450bf0> Jun 24 08:06:24 TunnelS kernel: [ 360.141356] <80450bf0> from <80450d58> Jun 24 08:06:24 TunnelS kernel: [ 360.141379] <80450d58> from <8043069c> Jun 24 08:06:24 TunnelS kernel: [ 360.141402] <8043069c> from <804306fc> Jun 24 08:06:24 TunnelS kernel: [ 360.141427] <804306fc> from <8042a934> Jun 24 08:06:24 TunnelS kernel: [ 360.141447] <8042a934> from <8042df94> Jun 24 08:06:24 TunnelS kernel: [ 360.141468] <8042df94> from <8042fcd8> Jun 24 08:06:24 TunnelS kernel: [ 360.141486] <8042fcd8> from <8042e1f4> Jun 24 08:06:24 TunnelS kernel: [ 360.141509] <8042e1f4> from <8003dbf8> Jun 24 08:06:24 TunnelS kernel: [ 360.141532] <8003dbf8> from <8003dec4> Jun 24 08:06:24 TunnelS kernel: [ 360.141552] <8003dec4> from <8004364c> Jun 24 08:06:24 TunnelS kernel: [ 360.141574] <8004364c> from <8000faf8> Jun 24 08:06:24 TunnelS kernel: [ 360.141619] kworker/2:1 D 80569d90 0 48 2 0x00000000 Jun 24 08:06:24 TunnelS kernel: [ 360.141639] Workqueue: events od_dbs_timer Jun 24 08:06:24 TunnelS kernel: [ 360.141661] <80569d90> from <8056a264> Jun 24 08:06:24 TunnelS kernel: [ 360.141682] <8056a264> from <8056a650> Jun 24 08:06:24 TunnelS kernel: [ 360.141704] <8056a650> from <8056bd80> Jun 24 08:06:24 TunnelS kernel: [ 360.141727] <8056bd80> from <8056be9c> Jun 24 08:06:24 TunnelS kernel: [ 360.141746] <8056be9c> from <8042e1c0> Jun 24 08:06:24 TunnelS kernel: [ 360.141765] <8042e1c0> from <8003dbf8> Jun 24 08:06:24 TunnelS kernel: [ 360.141787] <8003dbf8> from <8003dec4> Jun 24 08:06:24 TunnelS kernel: [ 360.141805] <8003dec4> from <8004364c> Jun 24 08:06:24 TunnelS kernel: [ 360.141824] <8004364c> from <8000faf8> Jun 24 08:06:24 TunnelS kernel: [ 360.141869] kworker/3:1 D 80569d90 0 56 2 0x00000000 Jun 24 08:06:24 TunnelS kernel: [ 360.141888] Workqueue: events od_dbs_timer Jun 24 08:06:24 TunnelS kernel: [ 360.141909] <80569d90> from <8056a264> Jun 24 08:06:24 TunnelS kernel: [ 360.141929] <8056a264> from <8056a650> Jun 24 08:06:24 TunnelS kernel: [ 360.141950] <8056a650> from <8056bd80> Jun 24 08:06:24 TunnelS kernel: [ 360.141972] <8056bd80> from <8056be9c> Jun 24 08:06:24 TunnelS kernel: [ 360.141991] <8056be9c> from <8042e1c0> Jun 24 08:06:24 TunnelS kernel: [ 360.142010] <8042e1c0> from <8003dbf8> Jun 24 08:06:24 TunnelS kernel: [ 360.142031] <8003dbf8> from <8003dec4> Jun 24 08:06:24 TunnelS kernel: [ 360.142050] <8003dec4> from <8004364c> Jun 24 08:06:24 TunnelS kernel: [ 360.142069] <8004364c> from <8000faf8> Jun 24 08:06:24 TunnelS kernel: [ 360.142111] kworker/0:2 D 80569d90 0 97 2 0x00000000 Jun 24 08:06:24 TunnelS kernel: [ 360.142130] Workqueue: events od_dbs_timer Jun 24 08:06:24 TunnelS kernel: [ 360.142151] <80569d90> from <8056a264> Jun 24 08:06:24 TunnelS kernel: [ 360.142172] <8056a264> from <8056a650> Jun 24 08:06:24 TunnelS kernel: [ 360.142193] <8056a650> from <8056bd80> Jun 24 08:06:24 TunnelS kernel: [ 360.142215] <8056bd80> from <8056be9c> Jun 24 08:06:24 TunnelS kernel: [ 360.142234] <8056be9c> from <8042e1c0> Jun 24 08:06:24 TunnelS kernel: [ 360.142254] <8042e1c0> from <8003dbf8> Jun 24 08:06:24 TunnelS kernel: [ 360.142275] <8003dbf8> from <8003dec4> Jun 24 08:06:24 TunnelS kernel: [ 360.142294] <8003dec4> from <8004364c> Jun 24 08:06:24 TunnelS kernel: [ 360.142312] <8004364c> from <8000faf8> Jun 24 08:08:24 TunnelS kernel: [ 480.141962] kworker/1:1 D 80569d90 0 42 2 0x00000000 Jun 24 08:08:24 TunnelS kernel: [ 480.141993] Workqueue: events od_dbs_timer Jun 24 08:08:24 TunnelS kernel: [ 480.142026] <80569d90> from <8056a264> Jun 24 08:08:24 TunnelS kernel: [ 480.142048] <8056a264> from <8056cf70> Jun 24 08:08:24 TunnelS kernel: [ 480.142070] <8056cf70> from <8056adfc> Jun 24 08:08:24 TunnelS kernel: [ 480.142094] <8056adfc> from <8056aee4> Jun 24 08:08:24 TunnelS kernel: [ 480.142117] <8056aee4> from <80450bf0> Jun 24 08:08:24 TunnelS kernel: [ 480.142140] <80450bf0> from <80450d58> Jun 24 08:08:24 TunnelS kernel: [ 480.142163] <80450d58> from <8043069c> Jun 24 08:08:24 TunnelS kernel: [ 480.142185] <8043069c> from <804306fc> Jun 24 08:08:24 TunnelS kernel: [ 480.142210] <804306fc> from <8042a934> Jun 24 08:08:24 TunnelS kernel: [ 480.142232] <8042a934> from <8042df94> Jun 24 08:08:24 TunnelS kernel: [ 480.142250] <8042df94> from <8042fcd8> Jun 24 08:08:24 TunnelS kernel: [ 480.142268] <8042fcd8> from <8042e1f4> Jun 24 08:08:24 TunnelS kernel: [ 480.142290] <8042e1f4> from <8003dbf8> Jun 24 08:08:24 TunnelS kernel: [ 480.142312] <8003dbf8> from <8003dec4> Jun 24 08:08:24 TunnelS kernel: [ 480.142334] <8003dec4> from <8004364c> Jun 24 08:08:24 TunnelS kernel: [ 480.142356] <8004364c> from <8000faf8> Jun 24 08:08:24 TunnelS kernel: [ 480.142400] kworker/2:1 D 80569d90 0 48 2 0x00000000 Jun 24 08:08:24 TunnelS kernel: [ 480.142421] Workqueue: events od_dbs_timer Jun 24 08:08:24 TunnelS kernel: [ 480.142442] <80569d90> from <8056a264> Jun 24 08:08:24 TunnelS kernel: [ 480.142464] <8056a264> from <8056a650> Jun 24 08:08:24 TunnelS kernel: [ 480.142485] <8056a650> from <8056bd80> Jun 24 08:08:24 TunnelS kernel: [ 480.142506] <8056bd80> from <8056be9c> Jun 24 08:08:24 TunnelS kernel: [ 480.142525] <8056be9c> from <8042e1c0> Jun 24 08:08:24 TunnelS kernel: [ 480.142544] <8042e1c0> from <8003dbf8> Jun 24 08:08:24 TunnelS kernel: [ 480.142565] <8003dbf8> from <8003dec4> Jun 24 08:08:24 TunnelS kernel: [ 480.142585] <8003dec4> from <8004364c> Jun 24 08:08:24 TunnelS kernel: [ 480.142604] <8004364c> from <8000faf8> Jun 24 08:08:24 TunnelS rsyslogd-2007: action 'action 17' suspended, next retry is Fri Jun 24 08:08:54 2016 [try http://www.rsyslog.com/e/2007 ] Jun 24 08:15:24 TunnelS rsyslogd-2007: action 'action 17' suspended, next retry is Fri Jun 24 08:15:54 2016 [try http://www.rsyslog.com/e/2007 ] Jun 24 08:17:01 TunnelS rsyslogd-2007: action 'action 17' suspended, next retry is Fri Jun 24 08:17:31 2016 [try http://www.rsyslog.com/e/2007 ]

It is with both a Pi 2 model B and a Pi 3 model B. Out of half a dozen Pi displays in use at this location, only one has the issue. If the Pi is moved to another TV or monitor, the problem does not occur. A different physical Pi also has the same issue on this display, so it appears to me that the issue is related to the display the Pi is connected to. The TV that produces the problem is a Coby brand, model LEDTV4026. This is the only TV of this brand at our location.

The TV in question displays an appointment board (displays hypertext on a local BSD server) and automatically reboots each night. (For some reason, the META tag to refresh the page stops working if you leave the page up for very long periods of time. This happens on multiple browsers and different operating systems - not just Pi/Raspian.) When the Pi starts booting, it hangs just after starting the Bluetooth service message displays on the screen. The Pi does not respond to ssh requests, so I'm unable to do anything when this happens other than reboot & look at log files. The odd thing is that this does not happen every time it reboots - maybe half the time, it will load normally. It does not seem to matter if the TV is on or off and I've been unable to discern any type of pattern to whether it boots or not.

I have tried reinstalling Raspian on a different SD card and the problem persisted. Based on information I found online, the following lines have been added to /boot/config.txt intended to bypass the automatic resolution detection of the display - these also have not had any impact:

disable_overscan=1 framebuffer_width=1900 framebuffer_height=1200 framebuffer_depth=32 framebuffer_ignore_alpha=1 hdmi_pixel_encoding=1 hdmi_group=2

Assuming this is the same issue, does anybody know what version of the firmware I should revert to in order to eliminate this problem?

popcornmix commented 8 years ago

If this was caused by a regression to firmware, you can identify the exact update which caused this. See: https://github.com/Hexxeh/rpi-firmware/commits/master

If you click on each commit the end of the url contains a git hash. Run sudo rpi-update <hash> to revert back to that version. Report the first version with the problem.

Ruler2112 commented 8 years ago

I cannot try dozens of different firmwares, trying to find one that does not have this flaw in it. (Especially since I'd have to move the Pi back & forth between the TV that has the problem and my office.) We're just going to replace the TV instead. :(

popcornmix commented 8 years ago

Well, without information on how to reproduce the issue or an indication of when the issue appeared it's not going to be fixed. There is a suggestion that an older firmware doesn't have the issue. Tell when the issue first started (with binary searcing you'll only have to test a handful of firmwares) and I suspect we can resolve it. If you don't it won't be fixed.

Ruler2112 commented 8 years ago

It appears that I was wrong. I left the Pi on my office LCD monitor over the weekend & it's locked up this morning, so it's not that specific TV with the problem as previously suspected. Another thing I noticed is that both the red & green lights are on solid during the lockup - don't know if this is pertinent or not though. With it happening on my LCD, I honestly don't have any sort of pattern or commonality to it anymore, as two different Pi 3s have this issue while others do not. All the Pis with the problem were from the same batch from Amazon.

How does one do a 'binary search' for firmwares? (I honestly do not know & assumed from the link earlier that I'd have to try each of the hundreds of revisions listed. Each would require running the Pi for a few days to make sure it does/does not have the problem as rebooting from the command line apparently doesn't produce the problem. My boss wants the appointment board fixed, not excuses from me as to why it's dragging on for months and my spending days/weeks troubleshooting. From the experience I have had coding, I understand that knowing what version it appeared in is extremely helpful, but you should try to understand the perspective of non-programmers and realize that you probably know more about firmwares than most anybody else. Personally, I didn't even know the Pis had firmware in them; thought everything was run off the SD card.) How do I check what version of firmware a given Pi is running? If there truly are only a handful of versions to try, I can put on the version that the functional Pi 3 is running & go from there...

Is there anything at all I can do with the Pi locked up to get more details of what's going on? Keyboard and mouse seem unresponsive & it does not respond to an ssh request. The Pi was set up the same as the rest of them in use, using the same version of Raspian and all software is identical versions to those in use.

popcornmix commented 8 years ago

If you have a list a firmwares, and you believe it was good, say, a year ago and is bad currently. And lets assume there have been 64 firmware releases in the last year.

Rather than testing every firmware, you know: firmware(0) = bad (most recent) firmware(63) = good (year ago)

So, you test firmware(31). Suppose that is bad Then you test firmware(47). Suppose that is good Then you test firmware(39). Suppose that is bad Then you test firmware(43). Suppose that is bad Then you test firmware(45). Suppose that is bad Then you test firmware(46). Suppose that is good

Each step you pick the mid-point between the newest bad and the oldest good. So you know that the problem appeared in firmware(45) after 6 tests. https://en.wikipedia.org/wiki/Binary_search_algorithm

Ruler2112 commented 8 years ago

How do I figure out what version a given Pi is running?

popcornmix commented 8 years ago

vcgencmd version and uname -a both report a build date. The firmware version will have a date very shortly after the later of those two dates.

Ruler2112 commented 8 years ago

Before I reboot the thing, is there any information I can gather when it's in a locked state?

popcornmix commented 8 years ago

If you enable firmware assert logging (Add start_debug=1 to config.txt and reboot), then you may get something interesting from sudo vcdbg log msg and sudo vcdbg log assert after the locked state. Post output to paste site (e.g. pastebin.org) and include a link here.

Ruler2112 commented 8 years ago

Something's hinkey... I ran both commands you suggested & they reported Feb 25, 2016. Logged into several of the other Pis I have in place & ran the same commands and they came back with the same exact date/time. Is this reporting something in the OS (which would make sense given that they were all installed from the same source) or the firmware version being run?

popcornmix commented 8 years ago

It's the firmware/kernel version being run.

Ruler2112 commented 8 years ago

So you're using the terms 'firmware' and 'kernel' interchangeably here?

popcornmix commented 8 years ago

Specifically firmware is really the start.elf file than runs on the GPU. kernel is the kernel.img file that runs on the ARM. However it is common to refer to using rpi-update to update the firmware, meaning everthing in the https://github.com/raspberrypi/firmware repo (which includes start.elf and kernel.img and /opt/vc userland libs).

Your issue is almost certainly contained within the gpu firmware (start.elf).

Ruler2112 commented 8 years ago

I added start_debug=1 to /boot/config.txt and have it in a frozen state. Tried rebooting & running the vcdbg commands you provided, but it came up with a whole ton of junk, nearly all about LED status. I'd guess this is because I have to reboot in order to do anything after it locks up. Is there any way to drop to a shell after the boot process hangs to perhaps gather more information?

popcornmix commented 8 years ago

Rebooting will clean any messages from vcdbg. In general a gpu hang doesn't stop you from interacting with the arm. You should still be able to ssh in and run commands. If you can't that suggests a different problem from a gpu hang.

What exactly happens? Can you ssh in after the hang?

Ruler2112 commented 8 years ago

As stated before, I cannot ssh to the Pi, though I am able to ping it.

Here's where it freezes when it locks up: http://tinypic.com/view.php?pic=11b1ykp&s=9

Ruler2112 commented 8 years ago

I had a thought & found something interesting. My /etc/rc.local file looks like this:

if [ -e /home/pi/resx ]; then
  echo Setting resolution to the static file-based one...
  _XRES=`cat /home/pi/resx`;
  _YRES=`cat /home/pi/resy`;
  _DEPTH=32;
  echo Screen resolution is $_XRES x $_YRES x $_DEPTH
  fbset --all --geometry $_XRES $_YRES $_XRES $_YRES $_DEPTH -left 0 -right 0 -upper 0 -lower 0;
else
  echo Detecting display...
  # Wait for the TV-screen to be turned on...
  while ! $( tvservice --dumpedid /tmp/edid | fgrep -qv 'Nothing written!' ); do
    bHadToWaitForScreen=true;
    echo Screen not connected - waiting...
    sleep 10;
  done;
  echo Screen found - extracting preferred resolution...
  _DEPTH=32;
  eval $( edidparser /tmp/edid | fgrep 'preferred mode' | tail -1 | sed -Ene 's/^.+(DMT|CEA) \(([0-9]+)\) ([0-9]+)x([0-9]+)[pi]? @.+/_GROUP=\1;_MODE=\2;_XRES=\3;_YRES=\4;/p' );
  echo Screen resolution is $_XRES x $_YRES x $_DEPTH
  tvservice --explicit="$_GROUP $_MODE"
  sleep 1;
  fbset --all --geometry $_XRES $_YRES $_XRES $_YRES $_DEPTH -left 0 -right 0 -upper 0 -lower 0;
  sleep 1;
fi
echo Disable screen blanking...
/usr/bin/setterm -blank 0
echo Starting display...
/home/pi/showdisplay
exit 0

I rebooted many, many times to get it to freeze up. (It's weird - it'll freeze 3-4 times in a row, then not again for the next 20 reboots.) The last line that shows up is Detecting display... To me, this means that it's locking up trying to dump the EDID information from the display; not failing, as it would spit out an error message, pause, and retry, which it is not doing. It also stops ssh from working, which is another oddity. Any ideas why simply trying to extract the EDID would lock up the machine like this? I've already added a 20 second sleep at the start of the file, thinking that two processes might be stepping on each other, but it had no impact.

The successfully extracted EDID for my monitor is:

Enabling fuzzy format match...
Parsing /tmp/edid...
HDMI:EDID version 1.3, 1 extensions, screen size 62x34 cm
HDMI:EDID features - videodef 0x80 !standby !suspend active off; colour encoding:RGB444|YCbCr422; sRGB is not default colourspace; preferred format is native; does not support GTF
HDMI:EDID found monitor S/N descriptor tag 0xff
HDMI:EDID found monitor range descriptor tag 0xfd
HDMI:EDID monitor range offsets: V min=0, V max=0, H min=0, H max=0
HDMI:EDID monitor range: vertical is 50-75 Hz, horizontal is 24-83 kHz, max pixel clock is 170 MHz
HDMI:EDID monitor range does not support GTF
HDMI:EDID found monitor name descriptor tag 0xfc
HDMI:EDID monitor name is ASUS_VN289
HDMI:EDID found preferred CEA detail timing format: 1920x1080p @ 60 Hz (16)
HDMI:EDID established timing I/II bytes are BF EF 00
HDMI:EDID found DMT format: code 4, 640x480p @ 60 Hz in established timing I/II
HDMI:EDID found DMT format: code 5, 640x480p @ 72 Hz in established timing I/II
HDMI:EDID found DMT format: code 6, 640x480p @ 75 Hz in established timing I/II
HDMI:EDID found DMT format: code 8, 800x600p @ 56 Hz in established timing I/II
HDMI:EDID found DMT format: code 9, 800x600p @ 60 Hz in established timing I/II
HDMI:EDID found DMT format: code 10, 800x600p @ 72 Hz in established timing I/II
HDMI:EDID found DMT format: code 11, 800x600p @ 75 Hz in established timing I/II
HDMI:EDID found DMT format: code 16, 1024x768p @ 60 Hz in established timing I/II
HDMI:EDID found DMT format: code 17, 1024x768p @ 70 Hz in established timing I/II
HDMI:EDID found DMT format: code 18, 1024x768p @ 75 Hz in established timing I/II
HDMI:EDID found DMT format: code 36, 1280x1024p @ 75 Hz in established timing I/II
HDMI:EDID standard timings block x 8: 0xD1C0 B300 9500 8180 8140 81C0 714F 0101 
HDMI:EDID found DMT format: code 82, 1920x1080p @ 60 Hz (16:9) in standard timing 0
HDMI:EDID found DMT format: code 58, 1680x1050p @ 60 Hz (16:10) in standard timing 1
HDMI:EDID found DMT format: code 47, 1440x900p @ 60 Hz (16:10) in standard timing 2
HDMI:EDID found DMT format: code 35, 1280x1024p @ 60 Hz (5:4) in standard timing 3
HDMI:EDID found DMT format: code 32, 1280x960p @ 60 Hz (4:3) in standard timing 4
HDMI:EDID found DMT format: code 85, 1280x720p @ 60 Hz (16:9) in standard timing 5
HDMI:EDID found DMT format: code 21, 1152x864p @ 75 Hz (4:3) in standard timing 6
HDMI:EDID parsing v3 CEA extension 0
HDMI:EDID monitor support - underscan IT formats:no, basic audio:yes, yuv444:yes, yuv422:yes, #native DTD:1
HDMI:EDID found CEA detail timing format: 720x480p @ 60 Hz (2)
HDMI:EDID found CEA detail timing format: 1280x720p @ 60 Hz (4)
HDMI:EDID found CEA detail timing format: 1280x720p @ 50 Hz (19)
HDMI:EDID found CEA detail timing format: 720x576p @ 50 Hz (17)
HDMI:EDID found CEA format: code 1, 640x480p @ 60Hz 
HDMI:EDID found CEA format: code 2, 720x480p @ 60Hz 
HDMI:EDID found CEA format: code 3, 720x480p @ 60Hz 
HDMI:EDID found CEA format: code 17, 720x576p @ 50Hz 
HDMI:EDID found CEA format: code 18, 720x576p @ 50Hz 
HDMI:EDID found CEA format: code 19, 1280x720p @ 50Hz 
HDMI:EDID found CEA format: code 4, 1280x720p @ 60Hz 
HDMI:EDID found CEA format: code 20, 1920x1080i @ 50Hz 
HDMI:EDID found CEA format: code 5, 1920x1080i @ 60Hz 
HDMI:EDID found CEA format: code 14, 1440x480p @ 60Hz 
HDMI:EDID found CEA format: code 15, 1440x480p @ 60Hz 
HDMI:EDID found CEA format: code 29, 1440x576p @ 50Hz 
HDMI:EDID found CEA format: code 30, 1440x576p @ 50Hz 
HDMI:EDID found CEA format: code 31, 1920x1080p @ 50Hz 
HDMI:EDID found CEA format: code 16, 1920x1080p @ 60Hz (native)
HDMI:EDID found CEA format: code 32, 1920x1080p @ 24Hz 
HDMI:EDID found CEA format: code 34, 1920x1080p @ 30Hz 
HDMI:EDID found audio format 2 channels PCM, sample rate: 32|44|48|96 kHz, sample size: 16|20|24 bits
HDMI:EDID found HDMI VSDB length 5
HDMI:EDID HDMI VSDB has physical address 1.0.0.0
HDMI:EDID HDMI VSDB has no extension fields
HDMI:EDID filtering formats with pixel clock > 162 MHz or h. blanking > 1023
HDMI:EDID best score mode initialised to CEA (1) 640x480p @ 60 Hz with pixel clock 25 MHz (score 0)
HDMI:EDID best score mode is now CEA (1) 640x480p @ 60 Hz with pixel clock 25 MHz (score 61864)
HDMI:EDID best score mode is now CEA (2) 720x480p @ 60 Hz with pixel clock 27 MHz (score 4566472)
HDMI:EDID CEA mode (3) 720x480p @ 60 Hz with pixel clock 27 MHz has a score of 66472
HDMI:EDID CEA mode (4) 1280x720p @ 60 Hz with pixel clock 74 MHz has a score of 4135592
HDMI:EDID DMT mode (4) 640x480p @ 60 Hz with pixel clock 25 MHz has a score of 18432
HDMI:EDID CEA mode (5) 1920x1080i @ 60 Hz with pixel clock 74 MHz has a score of 149416
HDMI:EDID DMT mode (5) 640x480p @ 72 Hz with pixel clock 31 MHz has a score of 22118
HDMI:EDID DMT mode (6) 640x480p @ 75 Hz with pixel clock 31 MHz has a score of 23040
HDMI:EDID DMT mode (8) 800x600p @ 56 Hz with pixel clock 36 MHz has a score of 26880
HDMI:EDID DMT mode (9) 800x600p @ 60 Hz with pixel clock 40 MHz has a score of 28800
HDMI:EDID DMT mode (10) 800x600p @ 72 Hz with pixel clock 50 MHz has a score of 34560
HDMI:EDID DMT mode (11) 800x600p @ 75 Hz with pixel clock 49 MHz has a score of 36000
HDMI:EDID CEA mode (14) 1440x480p @ 60 Hz with pixel clock 54 MHz has a score of 66472
HDMI:EDID CEA mode (15) 1440x480p @ 60 Hz with pixel clock 54 MHz has a score of 66472
HDMI:EDID best score mode is now CEA (16) 1920x1080p @ 60 Hz with pixel clock 148 MHz (score 5398248)
HDMI:EDID DMT mode (16) 1024x768p @ 60 Hz with pixel clock 65 MHz has a score of 47185
HDMI:EDID CEA mode (17) 720x576p @ 50 Hz with pixel clock 27 MHz has a score of 3066472
HDMI:EDID DMT mode (17) 1024x768p @ 70 Hz with pixel clock 75 MHz has a score of 55050
HDMI:EDID CEA mode (18) 720x576p @ 50 Hz with pixel clock 27 MHz has a score of 66472
HDMI:EDID DMT mode (18) 1024x768p @ 75 Hz with pixel clock 78 MHz has a score of 58982
HDMI:EDID CEA mode (19) 1280x720p @ 50 Hz with pixel clock 74 MHz has a score of 3617160
HDMI:EDID CEA mode (20) 1920x1080i @ 50 Hz with pixel clock 74 MHz has a score of 128680
HDMI:EDID DMT mode (21) 1152x864p @ 75 Hz with pixel clock 108 MHz has a score of 99649
HDMI:EDID CEA mode (29) 1440x576p @ 50 Hz with pixel clock 54 MHz has a score of 66472
HDMI:EDID CEA mode (30) 1440x576p @ 50 Hz with pixel clock 54 MHz has a score of 66472
HDMI:EDID CEA mode (31) 1920x1080p @ 50 Hz with pixel clock 148 MHz has a score of 232360
HDMI:EDID CEA mode (32) 1920x1080p @ 24 Hz with pixel clock 74 MHz has a score of 124532
HDMI:EDID DMT mode (32) 1280x960p @ 60 Hz with pixel clock 108 MHz has a score of 98728
HDMI:EDID CEA mode (34) 1920x1080p @ 30 Hz with pixel clock 74 MHz has a score of 149416
HDMI:EDID DMT mode (35) 1280x1024p @ 60 Hz with pixel clock 108 MHz has a score of 103643
HDMI:EDID DMT mode (36) 1280x1024p @ 75 Hz with pixel clock 135 MHz has a score of 98304
HDMI:EDID DMT mode (47) 1440x900p @ 60 Hz with pixel clock 106 MHz has a score of 102760
HDMI:EDID DMT mode (58) 1680x1050p @ 60 Hz with pixel clock 146 MHz has a score of 130840
HDMI:EDID DMT mode (82) 1920x1080p @ 60 Hz with pixel clock 148 MHz has a score of 149416
HDMI:EDID DMT mode (85) 1280x720p @ 60 Hz with pixel clock 74 MHz has a score of 80296
HDMI:EDID preferred mode remained as CEA (16) 1920x1080p @ 60 Hz with pixel clock 148 MHz
HDMI:EDID has HDMI support and audio support
edid_parser exited with code 0

I will try explicitly setting the resolution for the TV this Pi is going to be used on and reboot a few dozen times. Assuming that what I found is indeed what is causing the problem, it should work around the problem. I would much rather use the dynamic detection, as the Pi could be moved in the future, which would require that I manually detect & statically set the resolution for the new display. (The flag file code above is because some people are semi-blind & complain endlessly they cannot see if the resolution is greater than 1024x768 on some TVs; then they complain about the black bars on the sides of the wide-screen TV. I personally think this is ridiculous, and this is from somebody whose eyeglass correction is over 4 times what's considered legally blind.)

popcornmix commented 8 years ago

Okay, so it dosn't look like the firmware is locking up, but you have creates an rc.local script that is not exiting, which is preventing boot from completing. I assume that without the rc.local change it doesn't fail to boot?

I imagine it's most likely this while ! $( tvservice --dumpedid /tmp/edid | fgrep -qv 'Nothing written!' ) that is hanging. Either it is looping forever, or it is hanging in the tvservice call.

If you remove this code form rc.local, but instead run it as a script after booting, you may find that you can provoke the hang, and still be able to ssh in, or get debug information out.

Ruler2112 commented 8 years ago

I will attempt to re-create the freeze-up by manually running the tvservice command after booting with the execute bit removed from rc.local.

It has to be a hard lock by the command as if it were endlessly looping, 'Screen not connected - waiting...' would be printed to the screen for every loop iteration and it never shows up. (I suppose the display itself could be logically disconnected from the Pi somehow and it's just not being shown on the TV/monitor...)

Weirdest thing IMO is that all the other Pis (six) I have in place work fine with this same exact setup (software & version) and indeed have for months with never a freeze, on boot or otherwise. Really curious as to why the new batch has such an issue. Do you know if there were any minor hardware changes made recently?

Ruler2112 commented 8 years ago

I was able to reproduce the lock by running the tvservice EDID extraction command from a root command prompt. Once it locked, the keyboard went unresponsive and I could not switch to another session. Also, I was no longer able to ssh into the Pi, though it was working before the freeze.

Ruler2112 commented 8 years ago

I have tried 3 different HDMI cables as well.

popcornmix commented 8 years ago

I was able to reproduce the lock by running the tvservice EDID extraction command from a root command prompt.

Can you repeat this test using a different Pi (but same sdcard, TV and cables)? I doubt it is the specific Pi, but would be good to confirm.

Are the config.txt settings you listed earlier the only ones you use? I could imagine enabling something like i2c_vc may cause issues like this.

Ruler2112 commented 8 years ago

I'll do you one better. Reproduced the issue with the same sd card and different Pi, then used a card with a fresh install of Raspian from NOOBS 1.8.0 with nothing added or changed and reproduced it again.

My full /boot/config.txt on the card configured for the appointment board is:

disable_overscan=1
framebuffer_width=1900
framebuffer_height=1200
framebuffer_depth=32
framebuffer_ignore_alpha=1
hdmi_pixel_encoding=1
hdmi_group=2
start_debug=1

# For more options and information see
# http://www.raspberrypi.org/documentation/configuration/config-txt.md
# Some settings may impact device functionality. See link above for details

# uncomment if you get no picture on HDMI for a default "safe" mode
#hdmi_safe=1

# uncomment this if your display has a black border of unused pixels visible
# and your display can output without overscan
#disable_overscan=1

# uncomment the following to adjust overscan. Use positive numbers if console
# goes off screen, and negative if there is too much border
#overscan_left=16
#overscan_right=16
#overscan_top=16
#overscan_bottom=16

# uncomment to force a console size. By default it will be display's size minus
# overscan.
#framebuffer_width=1280
#framebuffer_height=720

# uncomment if hdmi display is not detected and composite is being output
hdmi_force_hotplug=1

# uncomment to force a specific HDMI mode (this will force VGA)
#hdmi_group=1
#hdmi_mode=1

# uncomment to force a HDMI mode rather than DVI. This can make audio work in
# DMT (computer monitor) modes
#hdmi_drive=2

# uncomment to increase signal to HDMI, if you have interference, blanking, or
# no display
#config_hdmi_boost=4

# uncomment for composite PAL
#sdtv_mode=2

#uncomment to overclock the arm. 700 MHz is the default.
#arm_freq=800

# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on

# Uncomment this to enable the lirc-rpi module
#dtoverlay=lirc-rpi

# Additional overlays and parameters are documented /boot/overlays/README

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

# NOOBS Auto-generated Settings:
hdmi_force_hotplug=1

The sd card with the fresh install has the default config.txt - I changed nothing.

One thing I found while doing all these tests is that it seems to lock with significantly more frequently when piping the output of the tvservice command to another process. I was able to get it to lock with just the tvservice --dumpedid /tmp/edid command, but it took significantly more tries (~75-80 versus ~15-20) than when piping the output to fgrep -qv 'Nothing written!' This could just be coincidence though - I don't know at this point - but I found it intriguing enough to mention in case it provides a clue.

Ruler2112 commented 8 years ago

Well that didn't paste very well... looked fine before I submitted it. How about:

http://pastebin.com/HKLvpqFj

Ruler2112 commented 8 years ago

Huh... submitted the followup & the original fixed itself - weird.

popcornmix commented 8 years ago

I added quotes (three backticks) around the quoted text to avoid github treating the # symbols as formatting directives which made it very hard to read.

popcornmix commented 8 years ago

I've tried using your config.txt and running tvservcie --dump-edid in a tight loop (run thousands of times so far) but I doubt that will fail. My guess is the display is not responding to I2C correctly when reading the EDID, hence the reason why you are having to read it in a loop. Now it is possible an earlier firmware behaved better in these circumstances, but it will be hard for me to debug it without being able to reproduce the failure.

If you can identify when the behaviour of the firmware changed, I can identify the commit and most likely fix this.

A workaround would be to follow this: http://kodi.wiki/view/Raspberry_Pi_FAQ#TV_is_not_detected_unless_powered_on_first and on the Pi with the problematic display use the EDID from a file rather than trying to read it every boot. I suspect that will avoid the hang.

Ruler2112 commented 8 years ago

I'm willing to test a limited number of firmwares, especially since I can now reproduce this in my office & don't have to wait days for it to fail in a public setting.

Could you point me to a brief overview of how to do this?

popcornmix commented 8 years ago

See: https://github.com/Hexxeh/rpi-firmware/commits/master

If you click on each commit the end of the url contains a git hash. Run sudo rpi-update <hash> to revert back to that version. Reboot and test for the problem.

Start with a firmware from say, a year ago. Report if this works or not. If it works, then you'll pick a firmware from about 6 months ago. If it fails try one from 2 years ago.

Ruler2112 commented 8 years ago

Ran sudo rpi-update 56569584e2a65d50408ea56d467336446af69925 to revert to the firmware from May 18 of 2015. Went through the process with no errors, but didn't restart; had the rainbow startup logo only about an inch square in the upper right corner. Power cycled it and got the same thing. Both LEDs on solid. Pi boots with another sd card.

Would I be correct in thinking that the update nuked the sd card & I have to start over installing the appointment board again? Or is there a way to recover? Regardless, how do I try older firmwares if they do stuff like this?

popcornmix commented 8 years ago

Is it a Pi3? Firmware older that the release data of the board is unlikely to work. Try using a Pi1 or Pi2.

Ruler2112 commented 8 years ago

It is a Pi 3; I have no 2s that are not in use and when I went to buy this latest batch of 3 Pis, could not get 2s. (I wanted 2s because I'd tested them thoroughly & didn't want to get into exactly this type of situation. :( )

popcornmix commented 8 years ago

You can download the firmware on (say) Windows. e.g. https://codeload.github.com/Hexxeh/rpi-firmware/legacy.zip/f58f8848c236cf3222e67ccb587d38daaa8ae036 will get you the firmware from Feb 25th which is around the launch date of Pi3.

The latest firmware would be: https://codeload.github.com/Hexxeh/rpi-firmware/legacy.zip/HEAD

If you open the zip file and copy the files (excluding the directories modules and vc) onto the sdcard (e.g. in windows) then you will probably be able to boot again.

Using a Pi2 for these tests would be easier.

Ruler2112 commented 8 years ago

Yeah - I slid the card into a reader and connected it to a linux box. Found that partition 6 contained the files from /boot, so I copied the contents of sdb6 from the clean-install card to the reverted one & it now boots as it did.

I was given The Look when I asked my boss if I could take down another display to troubleshoot this. (Really can't say I didn't expect this...)

At this point, I think I'm just going to code around the problem & deal with the negative aspects of doing so. :(

tghewett commented 8 years ago

I originally created this report.

I am possibly picking up cigarette ends regarding this conversation (I have not read it from start to end in detail) but (Ruler2112) if you are have crashing / hanging problems with a RP3 this very often is caused by power supply issues rather than anything else, especially when the CPU is loaded. I had this with a RP3 and used a more gutsy PSU which solved the problem.

Ruler2112 commented 8 years ago

Thanks for the input tghewett. I've tried three separate PSUs, all of which came with the Pi 3s. I can make it lock by running the tvservice --dumpedid command line, which is pretty clear that something in it is causing the problem. Plus, I've loaded the CPU harder than it is when simply pulling EDID information & it's been fine. (I wish it were just a PSU though!)

In your original post, you said older versions of firmware did not have this problem. Do you remember what version of the firmware you found did not have the problem and if it was on a Pi 2 or 3?

tghewett commented 8 years ago

Do you remember what version of the firmware you found did not have the problem and if it was on a Pi 2 or 3?

The firmware I was originally using was based on an Arch Linux download from raspberry.org, no longer listed there, albeit with the kernel and modules updated to support the more recent RP models. It still works fine with the newer RP0 and RP3 models so I'm not motivated to find the point at which the problem started, I don't need the newest firmware in this circumstance. The issue issue is reported just as a contribution to the RP community, it might have been easily found, maybe not. When the problem did happen it made no difference which RP model the firmware was running on.

BTW the whole computer was never hung, just BCM commands on the HDMI interface.

popcornmix commented 8 years ago

If the firmware works with Pi3 then it must be quite recent (from around February 2016).