thesofproject / linux

Linux kernel source tree

[BUG] Soundwire error generic_new_peripheral_assigned: invalid dev_num 1 on ADLP SDCA #4426

Closed: fredoh9 closed this issue 1 year ago

fredoh9 commented 1 year ago

The basic playback test started failing today on ADLP_SKU0B00_SDCA_IPC4ZPH.

TPLG=/lib/firmware/intel/avs-tplg/sof-adl-rt711-l0-rt1316-l12-rt714-l3.tplg MODEL=ADLP_SKU0B00_SDCA_IPC4ZPH ~/sof-test/test-case/check-playback.sh -d 100 -l 1 -r 1

dmesg shows this error:

[  638.846949] kernel: soundwire sdw-master-0: generic_new_peripheral_assigned: invalid dev_num 1

Intel internal daily test link: planresultdetail/27731?model=ADLP_SKU0B00_SDCA_IPC4ZPH&testcase=check-playback-100sec

Environment

dmesg: ADLP_SKU0B00_SDCA-dmesg.txt

plbossart commented 1 year ago

I really don't understand how this error appeared on June 15. We've been using this new device-number allocation for a month or so, and rt711 is used on pretty much all devices.

@fredoh9, is this a daily test or a new device?

@bardliao FYI, in case you have a better understanding of the issue than me...

fredoh9 commented 1 year ago

Yes, this is a fairly new device. ADLP_SKU0B00_SDCA_IPC4ZPH was added to the daily test a month ago. Various issues made the test results difficult to review, but they have stabilized in the last week or two. This is not 100% reproducible; I'm checking the reproduction rate.

fredoh9 commented 1 year ago

There is a different error today on the same device with check-kmod-load-unload-after-playback-5. Could it be the same issue, or related?

[ 1320.355601] kernel: sof_sdw sof_sdw: hda_dsp_hdmi_build_controls: no PCM in topology for HDMI converter 3
[ 1320.373678] kernel: input: sof-soundwire Headset Jack as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input125
[ 1320.373859] kernel: input: sof-soundwire HDMI/DP,pcm=5 as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input126
[ 1320.374011] kernel: input: sof-soundwire HDMI/DP,pcm=6 as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input127
[ 1320.374157] kernel: input: sof-soundwire HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1f.3/sof_sdw/sound/card0/input128
[ 1320.406555] kernel: SDW: Invalid device for paging :0
[ 1320.406568] kernel: SDW: Invalid device for paging :0
[ 1320.406622] kernel: SDW: Invalid device for paging :0
[ 1320.408157] kernel: SDW: Invalid device for paging :0
[ 1320.408195] kernel: SDW: Invalid device for paging :0

Intel internal daily test: planresultdetail/27832?model=ADLP_SKU0B00_SDCA_IPC4ZPH&testcase=check-kmod-load-unload-after-playback-5

bardliao commented 1 year ago

@plbossart, is it possible that rt711_sdca_read_prop is called AFTER rt711 is attached? In that case, dev_num is assigned as 1 because slave->prop.wake_capable is still 0. slave->prop.wake_capable is only set to 1 when rt711_sdca_read_prop is called, but the already-assigned dev_num is then reused, so generic_new_peripheral_assigned reports the invalid dev_num error.

[  638.846859] kernel: soundwire_bus:sdw_extract_slave_id: soundwire sdw-master-0: SDW Slave class_id 0x01, mfg_id 0x025d, part_id 0x0711, unique_id 0x0, version 0x3
[  638.846862] kernel: soundwire_bus:sdw_assign_device_num: soundwire sdw-master-0: Slave already registered, reusing dev_num:1
[  638.846949] kernel: soundwire sdw-master-0: generic_new_peripheral_assigned: invalid dev_num 1
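
The sequence described above can be replayed with a minimal user-space model. This is a sketch under stated assumptions, not kernel code: the split of device numbers into 1..5 for regular peripherals and 6..11 for wake-capable ones is assumed for illustration, and the names `assign_dev_num`, `dev_num_valid`, `read_prop`, `race_reproduces` and `ordered_case_valid` are hypothetical stand-ins for the bus allocation path, the range check in `generic_new_peripheral_assigned`, and `rt711_sdca_read_prop`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical split of the device-number space (assumed values, not the
 * kernel's): regular peripherals use 1..5, wake-capable ones use 6..11.
 */
#define DEV_NUM_MIN      1
#define WAKE_DEV_NUM_MIN 6
#define DEV_NUM_MAX      11

struct peripheral {
	bool wake_capable; /* filled in by the codec driver's read_prop callback */
	int dev_num;       /* 0 means "not assigned yet" */
};

/* Next free number per range: index 0 = regular, index 1 = wake-capable. */
static int next_free[2];

static void reset_allocator(void)
{
	next_free[0] = DEV_NUM_MIN;
	next_free[1] = WAKE_DEV_NUM_MIN;
}

/* Assign a dev_num, or reuse the stored one if the peripheral is already known. */
static int assign_dev_num(struct peripheral *p)
{
	if (p->dev_num == 0) {
		int r = p->wake_capable ? 1 : 0;

		p->dev_num = next_free[r]++;
		/* this sketch does not handle pool exhaustion */
		assert(p->dev_num <= (r ? DEV_NUM_MAX : WAKE_DEV_NUM_MIN - 1));
	}
	return p->dev_num;
}

/* Range check in the spirit of generic_new_peripheral_assigned(). */
static bool dev_num_valid(const struct peripheral *p)
{
	int min = p->wake_capable ? WAKE_DEV_NUM_MIN : DEV_NUM_MIN;
	int max = p->wake_capable ? DEV_NUM_MAX : WAKE_DEV_NUM_MIN - 1;

	if (p->dev_num < min || p->dev_num > max) {
		printf("invalid dev_num %d\n", p->dev_num);
		return false;
	}
	return true;
}

/* Stand-in for rt711_sdca_read_prop(): only the wake_capable flag matters here. */
static void read_prop(struct peripheral *p)
{
	p->wake_capable = true;
}

/* Peripheral attaches before its properties are read; true if the check fails. */
static bool race_reproduces(void)
{
	struct peripheral codec = { .wake_capable = false, .dev_num = 0 };

	reset_allocator();
	assign_dev_num(&codec); /* wake_capable still false: gets dev_num 1 */
	read_prop(&codec);      /* delayed driver probe: wake_capable = true */
	assign_dev_num(&codec); /* re-attach: dev_num 1 is reused */
	return !dev_num_valid(&codec);
}

/* Properties are read before the first attach; true if the check passes. */
static bool ordered_case_valid(void)
{
	struct peripheral codec = { .wake_capable = false, .dev_num = 0 };

	reset_allocator();
	read_prop(&codec);
	assign_dev_num(&codec); /* gets a number from the wake-capable range */
	return dev_num_valid(&codec);
}
```

In `race_reproduces()` the peripheral receives dev_num 1 while `wake_capable` is still false; once `read_prop()` sets the flag, the reused number falls outside the assumed wake-capable range and the check prints `invalid dev_num 1`, mirroring the dmesg line. `ordered_case_valid()` reads the properties first, so the number comes from the wake-capable range and the check passes.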

plbossart commented 1 year ago

Great comment @bardliao, yes, it's indeed possible. If the driver probe is delayed for some reason (who knows what happens in userspace when udev tries to load the module), the device can be enumerated first, which means its properties are not yet valid at that point.

It would be really easy to reproduce by adding "blacklist snd-soc-rt711-sdca" and running the modprobe manually later.

Oh man, yet another race condition.

@marc-hb, for your records: another fantastic bug that CI identified.

plbossart commented 1 year ago

@keqiaozhang @fredoh9, can you confirm we haven't seen this issue since June 20, and if so, please close it?

keqiaozhang commented 1 year ago

Confirmed: this issue has not reproduced in CI since June 28.