thesofproject / sof-test

DEBUG_LOCKS_WARN_ON not detected -> PASS status #478

Open plbossart opened 3 years ago

plbossart commented 3 years ago

In Intel Daily test 627, the verify-kernel-module-load-probe test case passes for TGL_UNB_SDW12:

Test Case URL: https://github.com/thesofproject/sof-test/tree/master/test-case/verify-kernel-module-load-probe.sh
cmd: TPLG=sof-tgl-rt711-rt1316-rt714.tplg ~/sof-test/test-case/verify-kernel-module-load-probe.sh
====================
framework load: ssh  ubuntu@sh-tgl-unb-sdw-01.sh.intel.com  'nohup bash -c "TPLG=sof-tgl-rt711-rt1316-rt714.tplg ~/sof-test/test-case/verify-kernel-module-load-probe.sh 2>&1 | nc -N 10.239.156.45 34961" >/dev/null &'
====================
framework run: 'netcat -l -k 34961'
====================
2020-10-29 17:46:40 UTC [REMOTE_INFO] Checking if sof relative modules loaded
2020-10-29 17:46:40 UTC [REMOTE_COMMAND] lsmod | grep "sof"
snd_soc_sof_sdw        57344  0
snd_soc_hdac_hdmi      40960  1 snd_soc_sof_sdw
snd_sof_pci            24576  0
snd_sof_intel_hda_common    98304  1 snd_sof_pci
snd_soc_hdac_hda       24576  1 snd_sof_intel_hda_common
snd_sof_intel_hda      20480  1 snd_sof_intel_hda_common
snd_sof_intel_byt      28672  1 snd_sof_pci
snd_soc_acpi_intel_match    45056  2 snd_sof_pci,snd_sof_intel_hda_common
snd_sof_intel_ipc      20480  1 snd_sof_intel_byt
snd_sof               147456  4 snd_sof_pci,snd_sof_intel_hda_common,snd_sof_intel_byt,snd_sof_intel_ipc
snd_sof_xtensa_dsp     16384  2 snd_sof_intel_hda_common,snd_sof_intel_byt
snd_soc_acpi           16384  3 snd_soc_acpi_intel_match,snd_sof_intel_hda_common,snd_sof_intel_byt
snd_hda_ext_core       32768  4 snd_sof_intel_hda_common,snd_soc_hdac_hdmi,snd_soc_hdac_hda,snd_sof_intel_hda
snd_intel_dspcfg       24576  2 snd_sof_pci,snd_sof_intel_hda_common
soundwire_intel        45056  6 snd_sof_intel_hda_common,snd_intel_dspcfg
snd_soc_core          299008  10 snd_soc_sof_sdw,snd_soc_rt715_sdca,snd_soc_rt1316_sdw,soundwire_intel,snd_sof,snd_sof_intel_hda_common,snd_soc_hdac_hdmi,snd_soc_hdac_hda,snd_soc_rt711_sdca,snd_soc_dmic
soundwire_bus          94208  9 snd_soc_sof_sdw,snd_soc_rt715_sdca,regmap_sdw,snd_soc_rt1316_sdw,regmap_sdw_mbq,soundwire_intel,snd_soc_rt711_sdca,soundwire_generic_allocation,soundwire_cadence
snd_hda_codec         167936  3 snd_soc_sof_sdw,snd_hda_codec_hdmi,snd_soc_hdac_hda
snd_hda_core          106496  8 snd_soc_sof_sdw,snd_hda_codec_hdmi,snd_hda_ext_core,snd_hda_codec,snd_sof_intel_hda_common,snd_soc_hdac_hdmi,snd_soc_hdac_hda,snd_sof_intel_hda
snd_pcm               139264  12 snd_soc_rt715_sdca,snd_hda_codec_hdmi,snd_soc_rt1316_sdw,snd_hda_codec,soundwire_intel,snd_sof,snd_sof_intel_hda_common,snd_soc_hdac_hdmi,snd_sof_intel_ipc,snd_soc_rt711_sdca,snd_soc_core,snd_hda_core
snd                    98304  11 snd_soc_sof_sdw,snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_codec,snd_timer,snd_soc_hdac_hdmi,snd_soc_core,snd_pcm,snd_rawmidi
ledtrig_audio          16384  2 snd_sof,dell_laptop
2020-10-29 17:46:40 UTC [REMOTE_INFO] Test Result: PASS!

However, the dmesg contains a big lockdep WARN, tracked here: https://github.com/thesofproject/linux/issues/2542

The question is: shouldn't sof-test fail this case?

marc-hb commented 3 years ago

In theory, a Call Trace should be caught; see the last lines of sof-test/tools/sof-kernel-log-check.sh
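For illustration only, a check of that kind could look like the sketch below. This is not the actual content of sof-kernel-log-check.sh; the function names `has_call_trace` and `check_kernel_log` are hypothetical, and the log is assumed to arrive on stdin.

```shell
# Hypothetical helpers, NOT copied from sof-kernel-log-check.sh.

# Reads a kernel log on stdin; succeeds if any line contains "Call Trace:".
has_call_trace() {
    grep -q 'Call Trace:'
}

# Reads a kernel log on stdin; returns 1 when a call trace is present.
check_kernel_log() {
    if has_call_trace; then
        echo 'kernel log check: FAIL (Call Trace found)'
        return 1
    fi
    echo 'kernel log check: PASS'
}
```

It could be fed from something like `journalctl -k | check_kernel_log`, assuming the journal still holds the messages of interest.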

marc-hb commented 3 years ago

Wait: this is a boot-time error, correct? We don't catch boot-time errors. This has been discussed in internal issue 466.

kernel: [    4.078863] BUG: key ffff8a9203c92dc0 has not been registered!
kernel: [    4.078865] ------------[ cut here ]------------
kernel: [    4.078866] DEBUG_LOCKS_WARN_ON(1)
...
kernel: [    4.078902] Call Trace:
kernel: [    4.078906]  __kernfs_create_file+0x76/0x100

plbossart commented 3 years ago

@marc-hb this happens when the snd-sof-pci driver is probed. This can happen at boot time if there's no blacklist.

If you don't catch such errors at boot time, then the sof-ci framework is broken somehow. If we do NOT detect errors on boot and do NOT prevent probe on boot, then we have a huge gap in our test coverage.

marc-hb commented 3 years ago

> We don't catch boot time errors.

@xiulipan says "it may not that simple, may depend on the test". @xiulipan to investigate.

IMHO it's OK if only some tests catch boot-time errors; for instance, it's OK if tests after the first one in a suite don't look at boot-time errors.

xiulipan commented 3 years ago

@plbossart @marc-hb After checking our old logs: we could get the boot-time call trace, but we did not enable the kernel log check in the verify test cases. I will send an RFC to enable the kernel log check in one of our verify test scripts.
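A rough sketch of what enabling a kernel log check around a verify test could mean, under assumptions: `run_with_log_check` is a hypothetical wrapper, not an existing sof-test function, and both the test and the log dump are passed in as command strings so the sketch runs anywhere.

```shell
# Hypothetical wrapper, not part of sof-test.
# $1: test command string
# $2: command string that prints the kernel log to stdout
# Fails if the test fails OR the dumped log contains a warning marker.
run_with_log_check() {
    rwlc_test_cmd=$1
    rwlc_log_cmd=$2
    if ! sh -c "$rwlc_test_cmd"; then
        echo 'Test Result: FAIL (test)'
        return 1
    fi
    if sh -c "$rwlc_log_cmd" | grep -q -e 'Call Trace:' -e 'DEBUG_LOCKS_WARN_ON'; then
        echo 'Test Result: FAIL (kernel log)'
        return 1
    fi
    echo 'Test Result: PASS'
}
```

On a device, the second argument might be `journalctl -k`, which prints the kernel messages of the current boot and so would include probe-time warnings like the one above. Whether that matches the CI setup is an assumption.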

marc-hb commented 3 years ago

There have been a lot of journalctl changes, so maybe this has been fixed.

Next step: inject a fake error and make sure this test (and others...) fail.

We have fake_kern_error() in sof-test, but I'm not sure it's similar enough to this DEBUG_LOCKS_WARN_ON.
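A minimal sketch of that "inject, then expect a failure" idea. Everything here is hypothetical: `inject_fake_lockdep_warn` is not fake_kern_error(), `log_is_clean` is only a stand-in for the real log check, and the fake lines are written to a plain file rather than the kernel log so that the sketch runs without root.

```shell
# Hypothetical injector; this is NOT sof-test's fake_kern_error().
# $1: file to append to. On a real device this could be /dev/kmsg
# (needs root); a plain file is assumed here.
inject_fake_lockdep_warn() {
    {
        echo '------------[ cut here ]------------'
        echo 'DEBUG_LOCKS_WARN_ON(1)'
        echo 'Call Trace:'
    } >> "$1"
}

# Stand-in for the real log check: succeeds only if the log file is clean.
log_is_clean() {
    ! grep -q -e 'DEBUG_LOCKS_WARN_ON' -e 'Call Trace:' "$1"
}

# Self-test: the check must pass on a clean log and fail after injection.
selftest_log_check() {
    st_log=$(mktemp) || return 2
    echo 'kernel: normal boot message' > "$st_log"
    if ! log_is_clean "$st_log"; then
        rm -f "$st_log"; echo 'false positive on clean log'; return 1
    fi
    inject_fake_lockdep_warn "$st_log"
    if log_is_clean "$st_log"; then
        rm -f "$st_log"; echo 'fake error NOT detected'; return 1
    fi
    rm -f "$st_log"
    echo 'fake error detected'
}
```

Running `selftest_log_check` only proves the stand-in check works; the real question in this issue is whether the actual test cases go through such a check for messages logged at boot.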