Closed stschake closed 6 years ago
This is not a common failure - after a difficult development and some initial glitches the sdhost driver has been stable for at least a year now. There could well be bugs, but if so they must have been hiding well.
The log shows the last transfer (prior to this the interface was idle, periodically checking that the card is still present):
[13587.156458] [e97700ea] PRD< c2231240 0
[13587.162954] [e97700eb] PRD1 dea95b10 0
[13587.169461] [e97700f3] PRD2 a 0
[13587.175348] [e9770104] PRD3 cab66184 0
[13587.181863] [e9770105] PDM> c2231240 0
[13587.188327] [e9770106] REQ< c2231168 10801
[13587.195172] [e9770107] CMD< 17 50
[13587.201200] [e977010b] FCM< c2231168 c22311a4
[13587.208322] [e977010d] RSP 900 0
[13587.214378] [e977010e] CMD< 12 19c380
[13587.220771] [e9770110] CMDD 50 200
[13587.226933] [e9770110] SDMA c2231240 dea95b10
[13587.234098] [e9770112] FCM< c2231168 c22311d8
[13587.241258] [e9770113] RSP 900 0
[13587.247337] [e9770115] FCM> c2231168 0
[13587.253826] [e9770116] FCM> c2231168 0
[13587.260321] [e9770116] CMD 12 200
[13587.266457] [e9770116] REQ> c2231168 0
[13587.272942] [ea16f1e2] TIM< 0 0
It's a CMD23 (0x17 - Set block count) indicating an 80 sector transfer, followed by a CMD18 (0x12 - Read multiple blocks) starting at sector 0x19c380 (about 845MB from the start), preceded by the necessary DMA setup, but for some reason the DMA never completes so the transfer times out.
How are you powering the Pi3? Is the power supply capable of delivering 2.5A? Do you ever see the yellow lightning signal appear on the display? What does running vcgencmd get_throttled
report during typical activity?
We don't have the warning signs enabled since we use the full KMS. I have just tested with high CPU load for a good time and there was no undervoltage event observed. The power supplies are rated for 2.5A.
I'll see if I can get some more traces, unfortunately there sometimes seem to be IO related lockups that don't even trigger the timeout.
You say you've tried multiple Pi3s and multiple SD cards, so there must be something different about your environment otherwise lots of people would be reporting similar problems. Can you post your config.txt and cmdline.txt, and tell me a bit about the software you are running?
I did find one recent report of similar issues: https://patchwork.kernel.org/patch/10219145/ From what I can tell, the upstream sdhost driver mentioned there is identical to the downstream one, minus the extensive logging/tracing.
I've attached config/cmdline. The software is indeed a bit more special. While the kernel is rpi-4.9.y with a few choice cherry-picks from AOSP kernel common, in userland we run Android on a mesa3d v18 graphics stack.
That all looks reasonable, as do the two patches you referenced. I'll do something similar in the downstream version of the driver (with a hat-tip to the original poster).
I've got a patch which seems to work, although I've only been able to test it by faking a timeout. If I upload the patch somewhere will you be able to try it and test it?
Thanks, I'll test that. It is unfortunately difficult to reproduce reliably.
Any evidence of a single timeout from which it recovers would be great, but since the timeouts are basically fatal at the moment the required confidence level for merging is pretty low.
Just a note, I've started to experience this with a SD card (Samsung 16 EVO, exact same model as mentioned in the patchworks thread linked above) that was previously running wheezy on a Pi1B without any issues and re-written with latest Raspbian. Tested on both Pi2 B and an original Pi1, both fail. A pretty reliable way to reproduce is running badblocks over the card.
@stschake:
Are you able to reproduce this issue with the bcm2835.c instead of bcm2835-sdhost.c (they are more different than you think)?
@oniongarlic
Are you able to reproduce this issue with a different SD card than Samsung 16 EVO? Which parameters did you use for badblocks to reproduce the issue?
@stschake Did @pelwell patch help in any way or are you still seeing the issue?
@oniongarlic Any further comment?
Sorry, I haven't seen the issue since - but since the patch would only trigger on an actual timeout, I can't say either way.
I've seen some issues like this on Fedora, I had one specific RPi3 that would see similar style issues and went through a number of good quality (Sandisk Ultra and Samsung EVO) in the process but seems to have settled with 4.15+ (although maybe it's wishful thinking there). I wonder if the move of the MMC stack to MQ and some of the improvements there have helped this problem or maybe masked it in different ways?
@pelwell Thoughts on closing this?
OK with me. Closed issues are still searchable and can be re-opened as needed, but this been quiet for long enough.
FWIW, I have a similar error during boot on my Yocto linux image. First it gives the hardware interrupt timeout, then a similar trace of the last transaction, and periodically some error that a CPU is frozen. Since it does not boot, I'm not sure how to extract the relevant information.
What's peculiar is that the issue started occurring after adding a random Python recipe. And old Raspbian image on another SD card still works. The same Yocto image on a new SD card does not.
I could potentially upload the entire Yocto image if that would help reproducing the problem.
The preferred method is to a connected USB to serial adapter on the debug UART. But a photo of the beginning of the stacktrace would be okay.
Edit: Please open a new issue for this.
Also chiming in here - I'm seeing a similar issue during boot, but on an EspressoBin instead of a RaspberryPi. Different hardware true, but issues could be related. Here's the UART dump I get:
[ 25.495102] mmc0: Timeout waiting for hardware interrupt.
[ 25.500400] ------------[ cut here ]------------
[ 25.505381] WARNING: at drivers/mmc/host/sdhci.c:1014
[ 25.510694] Modules linked in:
[ 25.513671]
[ 25.515299] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.52-armada-17.10.4-gb3509f8 #1
[ 25.523219] Hardware name: EspressoBin, 256MB DDR3, Marvell 88E1512 PHY (DT)
[ 25.531319] task: ffffffc006900000 ti: ffffffc006908000 task.ti: ffffffc006908000
[ 25.538887] PC is at sdhci_send_command+0x6c0/0xa78
[ 25.543920] LR is at sdhci_finish_data+0xd0/0x2f0
[ 25.548957] pc : [<ffffffc0007e2110>] lr : [<ffffffc0007e2800>] pstate: 400001c5
[ 25.556602] sp : ffffffc00690bb20
[ 25.559935] x29: ffffffc00690bb20 x28: ffffffc006908000
[ 25.565517] x27: ffffffc000dae000 x26: ffffffc00bfd7bb8
[ 25.571010] x25: ffffffc000e88ea0 x24: ffffffc006908000
[ 25.576504] x23: ffffffc006183130 x22: 0000000000000002
[ 25.581998] x21: 0000000000000003 x20: ffffffc006183170
[ 25.587494] x19: ffffffc00616b500 x18: 0000000000000000
[ 25.592988] x17: 0000007f7e20f940 x16: ffffffc0001fec68
[ 25.598484] x15: 0000000000000010 x14: ffffffc080ea017f
[ 25.603979] x13: ffffffc000ea018d x12: 0000000000000006
[ 25.609473] x11: 0000000000000006 x10: 0000000005f5e0ff
[ 25.614970] x9 : 0000000000000150 x8 : 7265746e69206572
[ 25.620463] x7 : 6177647261682072 x6 : 0000000006c00000
[ 25.625958] x5 : ffffffc0000997f0 x4 : ffffffc000ebfe08
[ 25.631451] x3 : 0000000000000002 x2 : ffffffc0007e61c8
[ 25.636948] x1 : ffffffc006183130 x0 : ffffffc0061830f0
[ 25.642441]
[ 25.643972] ---[ end trace d7ad8c474e1fcc82 ]---
[ 25.648922] Call trace:
[ 25.651272] [<ffffffc0007e2110>] sdhci_send_command+0x6c0/0xa78
[ 25.657483] [<ffffffc0007e2800>] sdhci_finish_data+0xd0/0x2f0
[ 25.663514] [<ffffffc0007e2bec>] sdhci_timeout_timer+0x84/0xd8
[ 25.669379] [<ffffffc00011e148>] call_timer_fn+0x38/0x1c8
[ 25.674689] [<ffffffc00011e4dc>] run_timer_softirq+0x204/0x270
[ 25.681081] [<ffffffc0000c80d0>] __do_softirq+0x128/0x360
[ 25.686749] [<ffffffc0000c85c8>] irq_exit+0x98/0xf0
[ 25.691708] [<ffffffc00010b020>] __handle_domain_irq+0x60/0xb8
[ 25.697827] [<ffffffc0000827c4>] gic_handle_irq+0xbc/0x168
[ 25.703237] Exception stack(0xffffffc00690bde0 to 0xffffffc00690bf10)
[ 25.709898] bde0: ffffffc000daf000 0000007fffffffff ffffffc00690bf40 ffffffc000086848
[ 25.717905] be00: 0000000000000145 ffffffc000d85ad8 0000000000000000 0000000000000000
[ 25.726365] be20: 0000000000000001 0000000000000000 00000000000001c0 0100000000000000
[ 25.734475] be40: 0000000000000000 ffffffc00bfdd040 ffffffc0069007c0 ffffffc00690bed0
[ 25.742226] be60: 0000000000000760 0000000000000000 0000000000000000 0000000000091b92
[ 25.750326] be80: 0000000000000000 0000000000000000 ffffffc0001fec68 0000007f7e20f940
[ 25.758427] bea0: 00000000000007b1 ffffffc000daf000 ffffffc000dafae8 ffffffc0009bd370
[ 25.766886] bec0: 0000000000000000 0000000000000000 ffffffc000d85ad8 ffffffc000e89000
[ 25.774816] bee0: ffffffc0009bd000 ffffffc00690bf70 ffffffc006908000 ffffffc00690bf40
[ 25.783003] bf00: ffffffc000086844 ffffffc00690bf40
[ 25.787869] [<ffffffc000085700>] el1_irq+0x80/0xf8
[ 25.792732] [<ffffffc000100524>] default_idle_call+0x1c/0x38
[ 25.798669] [<ffffffc000100880>] cpu_startup_entry+0x2f0/0x340
[ 25.804970] [<ffffffc00009014c>] secondary_start_kernel+0x12c/0x150
[ 25.811367] [<00000000000829dc>] 0x829dc
[ 35.511102] mmc0: Timeout waiting for hardware interrupt.
[ 36.495096] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 36.500799] 0-...: (1 GPs behind) idle=969/140000000000001/0 softirq=2203/2208 fqs=5170
[ 36.509158] (detected by 1, t=5252 jiffies, g=537, c=536, q=672)
[ 36.515372] Task dump for CPU 0:
[ 36.518701] stats_collector R running task 0 1691 1679 0x00000002
[ 36.525993] Call trace:
[ 36.528615] [<ffffffc000086e28>] __switch_to+0x88/0xa0
[ 36.534008] [<ffffffc00010b01c>] __handle_domain_irq+0x5c/0xb8
[ 36.540217] [<ffffffc0000827c4>] gic_handle_irq+0xbc/0x168
[ 36.545895] Exception stack(0xffffffc004bafe90 to 0xffffffc004baffc0)
[ 36.552563] fe80: 0000000000000000 0000007fb2185972
[ 36.560402] fea0: ffffffffffffffff 0000007fb22033f4 0000000060000000 0000007fb2181110
[ 36.568594] fec0: 0000007fb2185972 0000007fb2181110 0000000000000000 0000000000000001
[ 36.576607] fee0: 0000000000000004 00000000004040b0 00000000000001b5 0000000000406570
[ 36.584889] ff00: 0000000000000022 000000000000001b 0101010101010101 0000007fb21f7b40
[ 36.592903] ff20: 0000007fb1c19da8 0000007fb221a9a8 0000007fb2227000 0000007fb1c07414
[ 36.600825] ff40: 0000007fb2228000 0000007fb220ff00 0000000000040434 00000000004017b8
[ 36.609101] ff60: 0000007fb2185972 0000007fccd5e098 0000000000406570 0000007fb2229190
[ 36.617290] ff80: 0000007fb2181110 00000000000001b5 0000000000000004 0000000000000000
[ 36.625303] ffa0: 0000007fccd5e094 0000007fccd5df50 0000007fb22038b0 0000007fccd5df50
[ 36.633583] [<ffffffc000085c18>] el0_irq_naked+0x1c/0x24
[ 45.527102] mmc0: Timeout waiting for hardware interrupt.
[ 45.532989] mmcblk0: error -110 sending status command, retrying
[ 55.543102] mmc0: Timeout waiting for hardware interrupt.
[ 55.548713] mmcblk0: error -110 sending status command, retrying
[ 65.559103] mmc0: Timeout waiting for hardware interrupt.
[ 65.564732] mmcblk0: error -110 sending status command, aborting
[ 65.570637] blk_update_request: I/O error, dev mmcblk0, sector 606500
[ 65.577424] blk_update_request: I/O error, dev mmcblk0, sector 606508
[ 65.584295] blk_update_request: I/O error, dev mmcblk0, sector 606516
I also have this timeout message on a Raspberry 3B with Raspbian Buster. Don't know if it is related.
Originally, I tried to find out why my bitcoind
process is terminating after some time: https://bitcoin.stackexchange.com/questions/91582/where-can-i-find-a-usefull-log-if-bitcoind-stopped-for-no-apparent-reason
Seeing simular errors with a Raspberrypi 4b and original Pi Powersupply. Pi has no SD Card, only m.2 SSD on USB3 port with Beta Firmware from 6/15/20. Maybe an eletrical problem? Missing pullup/down? Plugged a dead SDCard and the errors are gone. `` [ 5098.170186] mmc0: Timeout waiting for hardware cmd interrupt. [ 5098.170194] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5098.170202] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5098.170206] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5098.170210] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5098.170214] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5098.170218] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5098.170222] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5098.170226] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5098.170229] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5098.170233] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5098.170237] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5098.170241] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5098.170251] mmc0: sdhci: Host ctl2: 0x00000000 [ 5098.170255] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5098.170258] mmc0: sdhci: ============================================ [ 5108.410373] mmc0: Timeout waiting for hardware cmd interrupt. [ 5108.410380] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5108.410387] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5108.410391] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5108.410395] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5108.410399] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5108.410403] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5108.410407] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5108.410411] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5108.410414] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5108.410418] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5108.410422] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5108.410426] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5108.410436] mmc0: sdhci: Host ctl2: 0x00000000 [ 5108.410440] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5108.410443] mmc0: sdhci: ============================================ [ 5119.930605] mmc0: Timeout waiting for hardware cmd interrupt. [ 5119.930612] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5119.930620] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5119.930624] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5119.930628] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5119.930632] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5119.930636] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5119.930640] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5119.930643] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5119.930647] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5119.930651] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5119.930655] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5119.930659] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5119.930669] mmc0: sdhci: Host ctl2: 0x00000000 [ 5119.930673] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5119.930677] mmc0: sdhci: ============================================ [ 5131.450797] mmc0: Timeout waiting for hardware cmd interrupt. [ 5131.450803] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5131.450809] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5131.450813] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5131.450817] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5131.450821] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5131.450825] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5131.450829] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5131.450832] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5131.450836] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5131.450840] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5131.450844] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5131.450847] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5131.450857] mmc0: sdhci: Host ctl2: 0x00000000 [ 5131.450861] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5131.450864] mmc0: sdhci: ============================================ [ 5141.690999] mmc0: Timeout waiting for hardware cmd interrupt. [ 5141.691006] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5141.691013] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5141.691017] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5141.691021] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5141.691024] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5141.691028] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5141.691032] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5141.691035] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5141.691039] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5141.691043] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5141.691046] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5141.691050] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5141.691060] mmc0: sdhci: Host ctl2: 0x00000000 [ 5141.691064] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5141.691067] mmc0: sdhci: ============================================ [ 5158.331311] mmc0: Timeout waiting for hardware cmd interrupt. [ 5158.331322] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5158.331335] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5158.331344] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5158.331354] mmc0: sdhci: Argument: 0x80000c08 | Trn mode: 0x00000000 [ 5158.331363] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5158.331372] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5158.331381] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5158.331390] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5158.331399] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5158.331408] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5158.331417] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5158.331426] mmc0: sdhci: Cmd: 0x0000341a | Max curr: 0x00080008
[ 5158.331452] mmc0: sdhci: Host ctl2: 0x00000000 [ 5158.331461] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5158.331468] mmc0: sdhci: ============================================ [ 5202.492126] mmc0: Timeout waiting for hardware cmd interrupt. [ 5202.492137] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5202.492150] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5202.492160] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5202.492169] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5202.492178] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5202.492187] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5202.492197] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5202.492206] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5202.492215] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5202.492223] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5202.492232] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5202.492241] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5202.492267] mmc0: sdhci: Host ctl2: 0x00000000 [ 5202.492276] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5202.492283] mmc0: sdhci: ============================================ [ 5222.332496] mmc0: Timeout waiting for hardware cmd interrupt. [ 5222.332507] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5222.332519] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5222.332529] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5222.332538] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5222.332548] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5222.332557] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5222.332567] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5222.332576] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5222.332585] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5222.332594] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5222.332603] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5222.332612] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5222.332637] mmc0: sdhci: Host ctl2: 0x00000000 [ 5222.332647] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5222.332654] mmc0: sdhci: ============================================ [ 5262.653235] mmc0: Timeout waiting for hardware cmd interrupt. [ 5262.653245] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5262.653256] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5262.653264] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5262.653271] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5262.653279] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5262.653286] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5262.653294] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5262.653301] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5262.653308] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5262.653315] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5262.653322] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5262.653330] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5262.653350] mmc0: sdhci: Host ctl2: 0x00000000 [ 5262.653357] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5262.653363] mmc0: sdhci: ============================================ [ 5279.933559] mmc0: Timeout waiting for hardware cmd interrupt. [ 5279.933565] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5279.933571] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5279.933575] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5279.933578] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5279.933582] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5279.933586] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5279.933590] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5279.933594] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5279.933597] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5279.933601] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5279.933605] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5279.933609] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5279.933620] mmc0: sdhci: Host ctl2: 0x00000000 [ 5279.933623] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5279.933627] mmc0: sdhci: ============================================ [ 5290.173725] mmc0: Timeout waiting for hardware cmd interrupt. [ 5290.173731] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5290.173737] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5290.173741] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5290.173745] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5290.173749] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5290.173752] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5290.173756] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5290.173760] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5290.173763] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5290.173767] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5290.173771] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5290.173775] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5290.173785] mmc0: sdhci: Host ctl2: 0x00000000 [ 5290.173788] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5290.173792] mmc0: sdhci: ============================================ [ 5300.413934] mmc0: Timeout waiting for hardware cmd interrupt. [ 5300.413940] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5300.413946] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5300.413950] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5300.413953] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5300.413957] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5300.413961] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5300.413965] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5300.413968] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5300.413972] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5300.413975] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5300.413979] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5300.413983] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5300.413993] mmc0: sdhci: Host ctl2: 0x00000000 [ 5300.413997] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5300.414000] mmc0: sdhci: ============================================ [ 5311.934140] mmc0: Timeout waiting for hardware cmd interrupt. [ 5311.934146] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5311.934152] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5311.934156] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5311.934160] mmc0: sdhci: Argument: 0x80000c08 | Trn mode: 0x00000000 [ 5311.934164] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5311.934168] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5311.934172] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5311.934176] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5311.934179] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5311.934183] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5311.934187] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5311.934191] mmc0: sdhci: Cmd: 0x0000341a | Max curr: 0x00080008
[ 5311.934202] mmc0: sdhci: Host ctl2: 0x00000000 [ 5311.934206] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5311.934209] mmc0: sdhci: ============================================ [ 5322.184322] mmc0: Timeout waiting for hardware cmd interrupt. [ 5322.184329] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5322.184336] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5322.184341] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5322.184345] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5322.184349] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5322.184353] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5322.184357] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5322.184361] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5322.184364] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5322.184368] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5322.184372] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5322.184376] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5322.184386] mmc0: sdhci: Host ctl2: 0x00000000 [ 5322.184390] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5322.184393] mmc0: sdhci: ============================================ [ 5332.414494] mmc0: Timeout waiting for hardware cmd interrupt. [ 5332.414501] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5332.414507] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5332.414511] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5332.414515] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5332.414519] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5332.414523] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5332.414527] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5332.414530] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5332.414534] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5332.414538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5332.414541] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5332.414545] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5332.414555] mmc0: sdhci: Host ctl2: 0x00000000 [ 5332.414559] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5332.414562] mmc0: sdhci: ============================================ [ 5342.654688] mmc0: Timeout waiting for hardware cmd interrupt. [ 5342.654693] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5342.654699] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5342.654703] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5342.654706] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5342.654710] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5342.654714] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5342.654718] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5342.654722] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5342.654726] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5342.654729] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5342.654733] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5342.654737] mmc0: sdhci: Cmd: 0x0000371a | Max curr: 0x00080008
[ 5342.654748] mmc0: sdhci: Host ctl2: 0x00000000 [ 5342.654751] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5342.654754] mmc0: sdhci: ============================================ [ 5354.174892] mmc0: Timeout waiting for hardware cmd interrupt. [ 5354.174904] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5354.174917] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5354.174927] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5354.174936] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5354.174945] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5354.174954] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5354.174963] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5354.174972] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5354.174982] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5354.174990] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5354.175000] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5354.175009] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5354.175034] mmc0: sdhci: Host ctl2: 0x00000000 [ 5354.175044] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5354.175051] mmc0: sdhci: ============================================ [ 5390.655542] mmc0: Timeout waiting for hardware cmd interrupt. [ 5390.655553] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 5390.655566] mmc0: sdhci: Sys addr: 0x00000000 | Version: 0x00001002 [ 5390.655576] mmc0: sdhci: Blk size: 0x00000000 | Blk cnt: 0x00000000 [ 5390.655585] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000000 [ 5390.655595] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x00000001 [ 5390.655605] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080 [ 5390.655614] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000f447 [ 5390.655623] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000 [ 5390.655632] mmc0: sdhci: Int enab: 0x00ff1003 | Sig enab: 0x00ff1003 [ 5390.655641] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000 [ 5390.655651] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525 [ 5390.655660] mmc0: sdhci: Cmd: 0x00000502 | Max curr: 0x00080008
[ 5390.655685] mmc0: sdhci: Host ctl2: 0x00000000 [ 5390.655694] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x00000000 [ 5390.655701] mmc0: sdhci: ============================================ `
Without an SD card it's best to put dtparam=sd_poll_once
in config.txt - that will prevent (or minimise) the errors
Without an SD card it's best to put
dtparam=sd_poll_once
in config.txt - that will prevent (or minimise) the errors
Thanks a lot :-), that fixed it.
Just got two of them as well, 10 minutes after boot on RPi 2 v1.1 (firmware #1343
, kernel 5.4.68-v7+
) with 10 seconds between them, but not leading to any issues as far as I can tell now. While other dmesg
pastes here show different outputs, also related to non SD card being attached, my errors match pretty the OP ones (but without the final error -110
) and happen with system on SD card:
Got them again, after longer uptime:
This time I had set dtparam=sd_overclock=100
, which is why the firmware reduced it back to default in two steps, actually a nice feature. However in last post above this was NOT
overclocked, so I don't expect it as the origin of the issue, as well since the overclocked SD frequency is something that worked nicely for the 5 years I run this RPi 24/7 😉.
I think the difficulty is that this error is very unspecific, is it? Related to the fact that I never saw this on any other firmware/kernel version (see above), probably someone has an idea where it might come from and how to debug? At least I would like to stay with overclocked mmc (which increases transfer rates significantly) and the SD card definitely supports it.
I have the same issue with the eMMC on multiple compute modules. There's no SD card. For some reason the timeout log and register dumps take more than 1s in a softIRQ, running with IRQs disabled. I have included the output from the sdhost driver from dmesg here.
In my case it seems to have been solved after enabling uio_pdrv_genirq
kernel module. I was in attempt to find out which are required for what and without this it worked stable for a year until one of the Linux 5.4 releases, so I was not thinking about it anymore. But for some reason it seems to be required now. It "should" load by default, i.e. I blacklisted it before to prevent its load.
I have the same issue with the eMMC on multiple compute modules. There's no SD card. For some reason the timeout log and register dumps take more than 1s in a softIRQ, running with IRQs disabled. I have included the output from the sdhost driver from dmesg here.
My issue seems to have been a hardware issue instead of a driver issue. Our controller board had the raspberry pi compute module IO board copied on it wholesale. Instead of connecting a jumper to the connector for selecting the bootmode, we had a switch there, connected via a ~20cm wire. This wire probably picked up some noise during operation, switching the EMMC_DISABLE_N signal, causing issues with this driver. My work-around is outlined here: https://www.raspberrypi.org/forums/viewtopic.php?f=98&t=289726
Just reporting, after running for several YEARS.
Many many like: 2023-09-15 00:11:00 pi93graf kern.err kernel: [ 8518.855078] print_req_error: I/O error, dev mmcblk0, sector 42880405 2023-09-15 00:11:10 pi93graf kern.err kernel: [ 8529.092917] mmc0: timeout waiting for hardware interrupt. 2023-09-15 00:11:10 pi93graf kern.err kernel: [ 8529.095121] print_req_error: I/O error, dev mmcblk0, sector 42880406 ... 2023-09-15 00:11:21 pi93graf kern.err kernel: [ 8539.335215] print_req_error: I/O error, dev mmcblk0, sector 42880407 2023-09-15 00:11:30 pi93graf daemon.err systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Also from influxdb:
2023-09-15 08:09:23 pi93graf user.info influxd-systemd-start.sh[2501]: ts=2023-09-15T12:09:23.344486Z lvl=info msg="Compacting file" log_id=0kHjJEDG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0kIBmEWl000 op_name=tsm1_compact_group db_shard_id=2066 tsm1_index=1 tsm1_file=/varnew/lib/influxdb/data/ruuvi/autogen/2066/000000002-000000001.tsm
2023-09-15 08:09:31 pi93graf user.info influxd-systemd-start.sh[2501]: ts=2023-09-15T12:09:31.579042Z lvl=info msg="Error replacing new TSM files" log_id=0kHjJEDG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0kIBmEWl000 op_name=tsm1_compact_group db_shard_id=2066 error="cannot allocate memory"
This is happening now on my up-to-date Raspberry Pi Zero 2.
Because this has happened before and I could not see any reason, i left a couple logs open in screen.
from kernel.log: [seems normal up to here] Nov 9 20:09:19 troll systemd[1]: Removed slice User Slice of UID 0. Nov 9 20:09:46 troll kernel: [213338.858612] mmc0: timeout waiting for hardware interrupt. Nov 9 20:09:46 troll kernel: [213338.858635] [6b9d8a9c] PRD1 81785840 0 Nov 9 20:09:46 troll kernel: [213338.858653] [6b9d8a9f] PRD2 2 0
dnsmasq.log: [all normal up to here] Nov 10 01:47:28 dnsmasq-dhcp[428]: failed to write /var/lib/misc/dnsmasq.leases: Read-only file system (retry in 60 s) [repeated many times over subsequent hours; yes, it keeps serving DNS]
It's hard to troubleshoot after the fact because no commands or files can be found on the filesystem, ls won't work, etc. top continues to run and says that RAM is reasonably available, CPU mostly idle, etc.
root@troll:/var/log# ls bash: /bin/ls: Input/output error
Any suggestions welcome. I just wanted to confirm that this can still happen fairly frequently on a Pi with the current OS and updated firmware.
I will add more info if I learn anything else.
In my case I believe it to be a couple of bad locations on the SD !
Hey,
we've been seeing the above error randomly occur with Raspberry Pi 3s. SD card manufacturer or new/old doesn't seem to make any difference. I've attached the full message log for the error.
Is this a known issue that can be mitigated somehow? We can't use the alternate SD card controller since we need the WLAN connectivity.
mmc_timeout.txt