Closed by fredoh9 1 year ago
I really don't understand how this error appeared on June 15. We've been using this new device number allocation for a month or so now, and rt711 is used on pretty much all devices.
@fredoh9 is this a daily test or a new device?
@bardliao FYI, in case you have a better understanding of the issue than me...
Yes, this is a fairly new device. ADLP_SKU0B00_SDCA_IPC4ZPH was added to the daily test a month ago. Various issues made it difficult to review the test results at first, but the results have stabilized in the last week or two. This is not 100% reproducible; I'm checking the reproduction rate.
There is a different error today on the same device with check-kmod-load-unload-after-playback-5. Might it be similar or related?
[ 1320.355601] kernel: sof_sdw sof_sdw: hda_dsp_hdmi_build_controls: no PCM in topology for HDMI converter 3
[ 1320.373678] kernel: input: sof-soundwire Headset Jack as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input125
[ 1320.373859] kernel: input: sof-soundwire HDMI/DP,pcm=5 as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input126
[ 1320.374011] kernel: input: sof-soundwire HDMI/DP,pcm=6 as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input127
[ 1320.374157] kernel: input: sof-soundwire HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input128
[ 1320.406555] kernel: SDW: Invalid device for paging :0
[ 1320.406568] kernel: SDW: Invalid device for paging :0
[ 1320.406622] kernel: SDW: Invalid device for paging :0
[ 1320.408157] kernel: SDW: Invalid device for paging :0
[ 1320.408195] kernel: SDW: Invalid device for paging :0
Intel internal daily test: planresultdetail/27832?model=ADLP_SKU0B00_SDCA_IPC4ZPH&testcase=check-kmod-load-unload-after-playback-5
@plbossart Is it possible that rt711_sdca_read_prop is called AFTER rt711 is attached? In that case, dev_num will be assigned 1 because slave->prop.wake_capable is 0 at enumeration time. slave->prop.wake_capable is only set to 1 once rt711_sdca_read_prop is called. However, on the next attach we reuse the old dev_num, and generic_new_peripheral_assigned then reports the invalid dev_num error.
[ 638.846859] kernel: soundwire_bus:sdw_extract_slave_id: soundwire sdw-master-0: SDW Slave class_id 0x01, mfg_id 0x025d, part_id 0x0711, unique_id 0x0, version 0x3
[ 638.846862] kernel: soundwire_bus:sdw_assign_device_num: soundwire sdw-master-0: Slave already registered, reusing dev_num:1
[ 638.846949] kernel: soundwire sdw-master-0: generic_new_peripheral_assigned: invalid dev_num 1
Great comment @bardliao, yes indeed it's possible. If the driver probe is delayed for some reason (who knows what happens in userspace when udev tries to load the module), the device can be enumerated before the driver has probed. This would mean that the properties are not valid at that point.
It would be really easy to reproduce if we do a "blacklist snd-soc-rt711-sdca" and manually do the modprobe later.
oh man, yet another race condition.
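The ordering problem described above can be modeled in a few lines of user-space C. This is only a sketch of the reasoning, not the kernel code: every `model_*` name is hypothetical, the threshold value of 6 is an assumption chosen for illustration, and the consistency check is a guess at what generic_new_peripheral_assigned objects to, based on the log message in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed threshold: wake-capable devices are expected at or above this. */
#define MODEL_DEV_NUM_IDA_MIN 6

struct model_slave {
    bool wake_capable; /* only valid after the codec driver read its properties */
    int dev_num;       /* 0 means "not assigned yet" */
};

/* Assign a device number from what the bus knows at enumeration time. */
static int model_assign_dev_num(struct model_slave *s)
{
    if (s->dev_num != 0)
        return s->dev_num; /* "Slave already registered, reusing dev_num" */
    s->dev_num = s->wake_capable ? MODEL_DEV_NUM_IDA_MIN : 1;
    return s->dev_num;
}

/* Stand-in for rt711_sdca_read_prop(): the driver probe fills in properties. */
static void model_read_prop(struct model_slave *s)
{
    s->wake_capable = true;
}

/* Returns false when dev_num contradicts the (now known) properties. */
static bool model_new_peripheral_assigned(const struct model_slave *s)
{
    if (s->wake_capable && s->dev_num < MODEL_DEV_NUM_IDA_MIN) {
        printf("invalid dev_num %d\n", s->dev_num);
        return false;
    }
    return true;
}

/* Run one ordering: probe either before or after the first enumeration. */
static bool model_run(bool probe_before_enumeration)
{
    struct model_slave s = { .wake_capable = false, .dev_num = 0 };

    if (probe_before_enumeration)
        model_read_prop(&s);
    model_assign_dev_num(&s);   /* first attach */
    if (!probe_before_enumeration)
        model_read_prop(&s);    /* delayed driver probe */
    model_assign_dev_num(&s);   /* re-attach reuses the earlier dev_num */
    return model_new_peripheral_assigned(&s);
}
```

Calling `model_run(true)` models the usual ordering and passes the check, while `model_run(false)` models the delayed probe and fails it, because the device number was chosen before `wake_capable` was known and is then reused unchanged.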
@marc-hb for your records: another fantastic bug that CI identified.
@keqiaozhang @fredoh9 can you confirm we haven't seen this issue since June 20, and if yes please close?
Confirmed that this issue cannot be reproduced in CI since June 28.
The basic playback test started failing today on ADLP_SKU0B00_SDCA_IPC4ZPH.
TPLG=/lib/firmware/intel/avs-tplg/sof-adl-rt711-l0-rt1316-l12-rt714-l3.tplg MODEL=ADLP_SKU0B00_SDCA_IPC4ZPH ~/sof-test/test-case/check-playback.sh -d 100 -l 1 -r 1
Dmesg has this error:
Intel internal daily test link: planresultdetail/27731?model=ADLP_SKU0B00_SDCA_IPC4ZPH&testcase=check-playback-100sec
Environment
dmesg: ADLP_SKU0B00_SDCA-dmesg.txt