thesofproject / sof-test

BSD 3-Clause "New" or "Revised" License
12 stars 44 forks source link

[BUG] filter out ATA read error on sh-kbl-nuc-hda-01 #893

Closed plbossart closed 2 months ago

plbossart commented 2 years ago

test 11887?model=KBL_NUC_HDA_IPC4&testcase=check-suspend-resume-50

[ 4187.483800] kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
marc-hb commented 2 years ago

Here's the complete log.

We need the hard drive to work reliably, it's not optional. I don't think we should ignore a failing hard drive, we should change it.

sh-kbl-nuc-hda-01:~# journalctl --facility=kern -p 3 -g '^ata'

-- Logs begin at Tue 2021-11-09 09:03:21 UTC, end at Wed 2022-04-20 17:58:31 UTC. --
-- Reboot --
-- Reboot --
-- Reboot --
Nov 09 09:52:13 kblnuc kernel: ata3.00: Read log page 0x08 failed, Emask 0x40
-- Reboot --
-- Reboot --
-- Reboot --
...
-- Reboot --
-- Reboot --
Nov 10 08:58:42 kblnuc kernel: ata3.00: Read log page 0x08 failed, Emask 0x40
Nov 10 08:58:43 kblnuc kernel: ata3.00: exception Emask 0x0 SAct 0x3fe00 SErr 0x0 action 0x0
Nov 10 08:58:43 kblnuc kernel: ata3.00: irq_stat 0x40000008
Nov 10 08:58:43 kblnuc kernel: ata3.00: failed command: READ FPDMA QUEUED
Nov 10 08:58:43 kblnuc kernel: ata3.00: cmd 60/d8:48:28:08:00/01:00:00:00:00/40 tag 9 ncq dma 241664 in
                                        res 41/40:d8:28:08:00/00:01:00:00:00/40 Emask 0x409 (media error) <F>
Nov 10 08:58:43 kblnuc kernel: ata3.00: status: { DRDY ERR }
Nov 10 08:58:43 kblnuc kernel: ata3.00: error: { UNC }
-- Reboot --
-- Reboot --
...
-- Reboot --
-- Reboot --
Nov 15 03:16:37 kblnuc kernel: ata3.00: Read log page 0x08 failed, Emask 0x40
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
Nov 15 06:03:22 kblnuc kernel: ata3.00: Read log page 0x08 failed, Emask 0x40
Nov 15 06:03:22 kblnuc kernel: ata3.00: failed to set xfermode (err_mask=0x40)
-- Reboot --
-- Reboot --
-- Reboot --
...
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
-- Reboot --
Apr 14 21:40:42 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Reboot --
-- Reboot --
...
-- Reboot --
-- Reboot --
Apr 15 11:45:00 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3.00: exception Emask 0x0 SAct 0xe001ff SErr 0x0 action 0x0
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3.00: irq_stat 0x40000008
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3.00: cmd 60/80:00:80:08:00/01:00:00:00:00/40 tag 0 ncq dma 196608 in
                                                   res 41/40:80:80:08:00/00:01:00:00:00/40 Emask 0x409 (media error) <F>
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3.00: status: { DRDY ERR }
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3.00: error: { UNC }
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3.00: configured for UDMA/133
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: sd 2:0:0:0: [sda] tag#0 Sense Key : Medium Error [current] 
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: sd 2:0:0:0: [sda] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: sd 2:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 00 00 08 80 00 01 80 00
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: I/O error, dev sda, sector 2176 op 0x0:(READ) flags 0x80700 phys_seg 43 prio class 0
Apr 15 11:45:01 sh-kbl-nuc-hda-01 kernel: ata3: EH complete

-- Reboot --
-- Reboot --
Apr 15 13:21:30 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Reboot --
-- Reboot --
-- Reboot --
Apr 15 23:38:47 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x08 failed, Emask 0x40
-- Reboot --
-- Reboot --
-- Reboot --
Apr 17 22:30:05 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Reboot --
-- Reboot --
Apr 18 23:40:58 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Reboot --
-- Reboot --
-- Reboot --
Apr 19 23:43:21 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Reboot --
Apr 19 23:44:52 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Reboot --
Apr 19 23:45:45 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Reboot --
-- Reboot --
plbossart commented 2 years ago

I don't think anyone will be able to replace that disk short-term @marc-hb, and it doesn't really seem to have any impact on our results, does it?

marc-hb commented 2 years ago

it doesn't really seem to have any impact on our results, does it?

Hard drives never heal themselves. Actually, they do perform error correction and block re-allocation up to the point they can't anymore and then things go downhill relatively quickly. If we ignore this then we will forget about it and waste considerable time chasing other, red herring errors later. So let's keep this error as a warning NOT to trust results on this device and as a reminder to change the hard drive as soon as we can. This error is NOT turning every test red so test results are still useful even with this error. In fact it's not even happening on every test run (yet).

Also, it would be awkward to ignore this error on this device only (cause we don't want to miss hard drive errors on other systems)

marc-hb commented 2 years ago

BTW another failing hard drive on another device: https://sof-ci.01.org/sofpr/PR5705/build47/devicetest/?model=BYT_MB_NOCODEC&testcase=verify-kernel-boot-log

This one started failing systematically, I will disable it.

journalctl --facility=kern -p 3 -g '^ata'

Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/20:70:68:b7:13/00:00:00:00:00/40 tag 14 ncq dma 16384 out
                                                      res 51/04:20:68:b7:13/00:00:00:00:00/40 Emask 0x401 (device error) <F>
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/10:20:30:c1:13/00:00:00:00:00/40 tag 4 ncq dma 8192 out
                                                      res 51/04:10:30:c1:13/00:00:00:00:00/40 Emask 0x401 (device error) <F>
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
Apr 21 05:01:26 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
-- Reboot --
Apr 21 05:08:32 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
Apr 21 05:08:32 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
Apr 21 05:08:32 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: READ FPDMA QUEUED
Apr 21 05:08:32 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 60/08:e8:b8:84:48/00:00:00:00:00/40 tag 29 ncq dma 4096 in
                                                      res 51/04:08:b8:84:48/00:00:00:00:00/40 Emask 0x401 (device error) <F>
Apr 21 05:08:32 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
Apr 21 05:08:32 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
Apr 21 05:08:45 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
Apr 21 05:08:45 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008

Still failing. Only once at boot?

May 13 04:42:36 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
May 13 04:42:36 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 13 04:42:36 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 13 04:42:36 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 13 04:42:36 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/08:f8:e0:8b:91/00:00:01:00:00/40 tag 31 ncq dma 4096 out
                                                      res 51/04:08:e0:8b:91/00:01:00:00:00/40 Emask 0x401 (device error) <F>
May 13 04:42:36 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 13 04:42:36 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
-- Reboot --
May 14 04:54:39 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 04:54:39 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 04:54:39 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 14 04:54:39 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/08:90:a0:11:e4/00:00:00:00:00/40 tag 18 ncq dma 4096 out
                                                      res 51/04:08:a0:11:e4/00:00:00:00:00/40 Emask 0x401 (device error) <F>
May 14 04:54:39 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 04:54:39 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
May 14 04:54:55 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 04:54:55 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 04:54:55 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: READ FPDMA QUEUED
May 14 04:54:55 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 60/08:20:b8:84:48/00:00:00:00:00/40 tag 4 ncq dma 4096 in
                                                      res 51/04:08:b8:84:48/00:00:00:00:00/40 Emask 0x401 (device error) <F>
May 14 04:54:55 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 04:54:55 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/30:e8:c8:1f:92/00:00:01:00:00/40 tag 29 ncq dma 24576 out
                                                      res 51/04:30:c8:1f:92/00:01:00:00:00/40 Emask 0x401 (device error) <F>
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/10:e8:60:28:92/00:00:01:00:00/40 tag 29 ncq dma 8192 out
                                                      res 51/04:10:60:28:92/00:01:00:00:00/40 Emask 0x401 (device error) <F>
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 04:55:12 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
-- Reboot --
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: READ FPDMA QUEUED
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 60/08:90:40:60:45/00:00:00:00:00/40 tag 18 ncq dma 4096 in
                                                      res 51/04:08:40:60:45/00:00:00:00:00/40 Emask 0x401 (device error) <F>
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: READ FPDMA QUEUED
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 60/08:f0:b8:52:48/00:00:00:00:00/40 tag 30 ncq dma 4096 in
                                                      res 51/04:08:b8:52:48/00:00:00:00:00/40 Emask 0x401 (device error) <F>
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 05:04:05 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/30:f8:c0:1e:92/00:00:01:00:00/40 tag 31 ncq dma 24576 out
                                                      res 51/04:30:c0:1e:92/00:01:00:00:00/40 Emask 0x401 (device error) <F>
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x0
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: irq_stat 0x40000008
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: cmd 61/30:f8:c0:1e:92/00:00:01:00:00/40 tag 31 ncq dma 24576 out
                                                      res 51/04:30:c0:1e:92/00:01:00:00:00/40 Emask 0x401 (device error) <F>
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: status: { DRDY ERR }
May 14 05:04:17 sh-byt-mb-nocodec-05 kernel: ata1.00: error: { ABRT }
lines 4882-4974/4974 (END)
XiaoyunWu6666 commented 2 years ago

hi @marc-hb , is there any updates for this ata error on KBL NUC ? Could you share if any updates? Thanks.

marc-hb commented 2 years ago

Still failing

-- Boot 8ceb97503fc64b2997f44c931d61dc90 --
-- Boot d9ac25a2a72145b29472e0243d432623 --
May 12 23:46:09 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Boot a4b4f0283b8d4653b3d7e28860e5cebc --
-- Boot cd5a6f3d37774aec9a3a384ee1f24b2d --
-- Boot c487c997f9aa45c1b658a0c558bc5439 --
May 13 23:44:17 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Boot 38303f9efc62479ebb766fe3df4d4d86 --
May 13 23:46:14 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Boot 9d517f17d9be4281bfe9b44397b9d0ff --
-- Boot b95fb25a07ef42128e2428858ab149ae --
May 13 23:48:38 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x00 page 0x00 failed, Emask 0x40
-- Boot 4843a27fa958475c87453861512b63d9 --
May 15 23:42:45 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Boot 1ebbcc576c3f47b9851406b088fc60aa --
-- Boot 109b015f7d0b4deaa634604ccb9593fa --
-- Boot 879a7c9acbc44846ba6055d0589cf541 --
-- Boot 85c54ee188ff47f1b71f85534419ac92 --
May 16 02:00:16 sh-kbl-nuc-hda-01 kernel: ata3.00: exception Emask 0x0 SAct 0xffc8 SErr 0x0 action 0x0
May 16 02:00:16 sh-kbl-nuc-hda-01 kernel: ata3.00: irq_stat 0x40000008
May 16 02:00:16 sh-kbl-nuc-hda-01 kernel: ata3.00: failed command: READ FPDMA QUEUED
May 16 02:00:16 sh-kbl-nuc-hda-01 kernel: ata3.00: cmd 60/d8:30:28:08:00/01:00:00:00:00/40 tag 6 ncq dma 241664 in
                                                   res 41/40:d8:28:08:00/00:01:00:00:00/40 Emask 0x409 (media error) <F>
May 16 02:00:16 sh-kbl-nuc-hda-01 kernel: ata3.00: status: { DRDY ERR }
May 16 02:00:16 sh-kbl-nuc-hda-01 kernel: ata3.00: error: { UNC }
-- Boot 05aa558c447d490d87585905f806a355 --
May 16 05:59:25 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x08 failed, Emask 0x40
-- Boot 7d8a2bb94f4e45e0b45ffe67f25134ea --
-- Boot de20750576594430b44a33bdbc90b8ee --
May 16 06:39:18 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x08 failed, Emask 0x40
-- Boot cb56ba97baed45a5b39e165c94a7831e --
-- Boot 23ac572b17854967939557f46b590676 --
May 16 23:49:04 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Boot 6bd7119223354f73a4e4bd4a4d80de4b --
May 16 23:51:18 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Boot 8cd78342655841a0b7c0de841a3259cf --
-- Boot bb6457fea8bf45f0990b0e02bf8989dc --
-- Boot dbaae1d5501f4abdb76b58543573e447 --
May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40
-- Boot 07eb5e3800d34df296ad1a36863cdfa3 --
May 17 23:45:31 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x08 failed, Emask 0x40
-- Boot b7aa84d5a752426c805e9b43e17f20c4 --
-- Boot 989a1da13a0942ddaa6ff6d343f72fbf --

Context of the last failure, it was on resume:

``` May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: ipc tx done : 0x47000000|0x0: MOD_SET_DX [data size: 8] May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x1010303 successful May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: FW Poll Status: reg[0x4]=0x303 successful May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: DSP core(s) enabled? 0 : core_mask 3 May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000000 at 00000044 May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIW: 00000010 at 00000044 May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Turning i915 HDAC power 0 May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Current DSP power state: D3 May 17 23:41:52 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: fw_state change: 6 -> 0 May 17 23:41:55 sh-kbl-nuc-hda-01 sudo[72487]: ubuntu : PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/env PATH=/home/ubuntu/sof-test/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin rtcwake -m mem -s 5 May 17 23:41:55 sh-kbl-nuc-hda-01 sudo[72487]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000) May 17 23:41:55 sh-kbl-nuc-hda-01 kernel: PM: suspend entry (deep) May 17 23:41:55 sh-kbl-nuc-hda-01 kernel: Filesystems sync: 0.001 seconds May 17 23:41:55 sh-kbl-nuc-hda-01 /usr/libexec/gdm-x-session[677]: (**) Option "fd" "46" May 17 23:41:55 sh-kbl-nuc-hda-01 /usr/libexec/gdm-x-session[677]: (II) event0 - Sleep Button: device removed May 17 23:41:55 sh-kbl-nuc-hda-01 /usr/libexec/gdm-x-session[677]: (II) AIGLX: Suspending AIGLX clients for VT switch May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: Freezing user space processes ... (elapsed 0.005 seconds) done. May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: OOM killer disabled. May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: Freezing remaining freezable tasks ... (elapsed 0.002 seconds) done. May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: printk: Suspending console(s) (use no_console_suspend to debug) May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: e1000e: EEE TX LPI TIMER: 00000011 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sd 2:0:0:0: [sda] Stopping disk May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Turning i915 HDAC power 1 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000000 at 00000044 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000040 at 00000048 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIW: 00000000 at 00000048 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000010 at 00000044 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Turning i915 HDAC power 0 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Current DSP power state: D3 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: fw_state change: 6 -> 0 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: EC: interrupt blocked May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: PM: Preparing to enter system sleep state S3 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: EC: event blocked May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: EC: EC stopped May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: PM: Saving platform NVS memory May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: Disabling non-boot CPUs ... May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: smpboot: CPU 1 is now offline May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: smpboot: CPU 2 is now offline May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: smpboot: CPU 3 is now offline May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: PM: Low-level resume complete May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: EC: EC started May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: PM: Restoring platform NVS memory May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: Enabling non-boot CPUs ... May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: x86: Booting SMP configuration: May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: \_PR_.CPU1: Found 3 idle states May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: CPU1 is up May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: smpboot: Booting Node 0 Processor 2 APIC 0x1 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: \_PR_.CPU2: Found 3 idle states May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: CPU2 is up May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: smpboot: Booting Node 0 Processor 3 APIC 0x3 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: \_PR_.CPU3: Found 3 idle states May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: CPU3 is up May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: PM: Waking up from system sleep state S3 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: EC: interrupt unblocked May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ACPI: EC: event unblocked May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sd 2:0:0:0: [sda] Starting disk May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: i915 0000:00:02.0: [drm] [ENCODER:94:DDI B/PHY B] is disabled/in DSI mode with an ungated DDI clock, gate it May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: i915 0000:00:02.0: [drm] [ENCODER:106:DDI C/PHY C] is disabled/in DSI mode with an ungated DDI clock, gate it May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Turning i915 HDAC power 1 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000000 at 00000044 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000040 at 00000048 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIW: 00000000 at 00000048 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000000 at 00000048 ... May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000004 at 00000044 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIW: 00000000 at 00000044 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: firmware boot complete May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: fw_state change: 5 -> 6 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ata1: SATA link down (SStatus 4 SControl 300) May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ata3.00: Read log 0x30 page 0x00 failed, Emask 0x40 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ata3.00: configured for UDMA/133 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: ata3.00: configured for UDMA/133 May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: OOM killer enabled. May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: Restarting tasks ... done. May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: video LNXVIDEO:00: Restoring backlight state May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: PM: suspend exit May 17 23:42:02 sh-kbl-nuc-hda-01 kernel: rfkill: input handler enabled May 17 23:42:02 sh-kbl-nuc-hda-01 /usr/libexec/gdm-x-session[677]: (II) systemd-logind: got pause for 13:66 May 17 23:42:02 sh-kbl-nuc-hda-01 /usr/libexec/gdm-x-session[677]: (II) systemd-logind: got pause for 13:65 ... May 17 23:42:03 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIR: 00000000 at 00000044 May 17 23:42:03 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Debug PCIW: 00000010 at 00000044 May 17 23:42:03 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Turning i915 HDAC power 0 May 17 23:42:03 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: Current DSP power state: D3 May 17 23:42:03 sh-kbl-nuc-hda-01 kernel: sof-audio-pci-intel-skl 0000:00:1f.3: fw_state change: 6 -> 0 May 17 23:42:04 sh-kbl-nuc-hda-01 kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None May 17 23:42:04 sh-kbl-nuc-hda-01 kernel: e1000e 0000:00:1f.6 eno1: 10/100 speed: disabling TSO May 17 23:42:04 sh-kbl-nuc-hda-01 NetworkManager[432]: [1652830924.1466] device (eno1): carrier: link connected May 17 23:42:04 sh-kbl-nuc-hda-01 NetworkManager[432]: [1652830924.1471] device (eno1): ip:dhcp4: restarting ```
marc-hb commented 2 years ago

A look at timestamps in the output of journalctl --facility=kern -g '^ata.*failed|^ata.*link|^ata.*configured' | grep -C7 failed and of a couple, similar commands shows the SATA error on sh-kbl-nuc-hda-01 happens only at boot or resume time sometimes. So in theory it could be ignored... on this specific device. But then the filter would hide actual disk errors on other devices (and future, worse errors on this device too)

Could someone try to re-seat the SATA connector? Worth a try.

marc-hb commented 2 years ago

I think I know what could solve this relatively easily: the ability for verify-kernel-boot-log.sh to easily extend ignore_str without affecting other tests.

What would also be useful is some /etc/sof/ config file to exclude some errors on specific devices.

Finally, a combination of both would be ideal.

marc-hb commented 2 years ago

Speaking of ignoring hard drive issues...

MONTHS of wasted time.

marc-hb commented 2 months ago

Device not in use anymore