thesofproject / linux

Linux kernel source tree
Other
91 stars 133 forks source link

SoundWire/SDCA: additional experiments with Functions #5010

Closed plbossart closed 4 months ago

plbossart commented 6 months ago

Tentative branch to show how simple it'd be to create subdevices for each SDCA function, using the auxiliary bus.

Tested on a MTL device with 2x SmartAmp, UAJ and HID with the following options added:

CONFIG_SND_SOC_SDCA=m
CONFIG_SND_SOC_SDCA_SMART_AMP=m
CONFIG_SND_SOC_SDCA_UAJ=m
CONFIG_SND_SOC_SDCA_SMART_MIC=m
CONFIG_SND_SOC_SDCA_HID=m

The code gets to a probe for all the drivers:

[   37.802023] acpi device:25: patch_sdca_function_type: found SDCA function UAJ (type 6)
[   37.802031] acpi device:26: patch_sdca_function_type: found SDCA function HID (type 10)
[   37.802347] acpi device:28: patch_sdca_function_type: found SDCA function SmartAmp (type 1)
[   37.802719] acpi device:2a: patch_sdca_function_type: found SDCA function SmartAmp (type 1)

[   37.812859] snd_soc_sdca_smart_amp snd_soc_sdca.SmartAmp.0: sdca_smart_amp_probe: done
[   37.812913] snd_soc_sdca_smart_amp snd_soc_sdca.SmartAmp.1: sdca_smart_amp_probe: done
[   37.819252] snd_soc_sdca_uaj snd_soc_sdca.UAJ.2: sdca_uaj_probe: done
[   37.819265] snd_soc_sdca_hid snd_soc_sdca.HID.3: sdca_hid_probe: done
root@fedora:~# ls -rt /sys/bus/auxiliary/devices/
snd_soc_sdca.SmartAmp.1  snd_soc_sdca.UAJ.2  soundwire_intel.link.2  soundwire_intel.link.0  snd_sof.hda-probes.0
snd_soc_sdca.SmartAmp.0  snd_soc_sdca.HID.3  soundwire_intel.link.1  snd_sof.msg_injector.0
root@fedora:~# ls -rt /sys/bus/soundwire/devices/sdw:*025d*/*sdca*
'/sys/bus/soundwire/devices/sdw:0:0:025d:0713:01/snd_soc_sdca.UAJ.2':
subsystem  driver  uevent  power

'/sys/bus/soundwire/devices/sdw:0:0:025d:0713:01/snd_soc_sdca.HID.3':
subsystem  driver  uevent  power

'/sys/bus/soundwire/devices/sdw:0:1:025d:1316:01/snd_soc_sdca.SmartAmp.0':
subsystem  driver  uevent  power

'/sys/bus/soundwire/devices/sdw:0:2:025d:1316:01/snd_soc_sdca.SmartAmp.1':
subsystem  driver  uevent  power

What we are missing is a callback to tell the functions that the init_io is complete on attachment. suspend-resume can be added as well.

@bardliao @ujfalusi @kv2019i let me know what you think.

marc-hb commented 6 months ago

Commit https://github.com/thesofproject/linux/pull/5010/commits/2772fe4c59b3ee25ceb4c5788b25a9664223b3d5 (merged into https://github.com/thesofproject/linux/commit/e7e81a5e790e) crashed ba-lnlm-rvp-sdw-03 really bad and made it unavailable for CI for at least 7 hours (from Wed 2024-05-22 21:15:35 UTC to Thu 2024-05-23 03:18:11 UTC) because that devices (and other in BA) don't have remote power control yet and can't be power-cycled by CI.

Can you please avoid too "risky" pull requests until all devices are fully equipped?

cc: @ssavati

May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: soundwire_bus:sdw_slave_set_frequency: rt711-sdca sdw:0:0:025d:0711:01: Configured bus base 1, scale 3, mclk 19200000, curr_freq 4800000
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: soundwire_bus:sdw_slave_set_frequency: rt715-sdca sdw:0:1:025d:0714:01: Configured bus base 1, scale 3, mclk 19200000, curr_freq 4800000
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: soundwire_bus:sdw_handle_slave_status: rt715-sdca sdw:0:1:025d:0714:01: signaling initialization completion for Slave 7
May 22 21:01:16 ba-lnlm-rvp-sdw-03 systemd[1]: Stopped target Sound Card.
May 22 21:01:16 ba-lnlm-rvp-sdw-03 acpid[382]: input device has been disconnected, fd 10
May 22 21:01:16 ba-lnlm-rvp-sdw-03 systemd[897]: Stopped target Sound Card.
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:3:025d:1316:01: rt1316_io_init hw_init complete
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:3:025d:1316:01: signaling initialization completion for Slave 1
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: snd_soc_rt1316_sdw:rt1316_io_init: rt1316-sdca sdw:0:2:025d:1316:01: rt1316_io_init hw_init complete
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: soundwire_bus:sdw_handle_slave_status: rt1316-sdca sdw:0:2:025d:1316:01: signaling initialization completion for Slave 1
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx      : 0x44000000|0x3060004c: MOD_LARGE_CONFIG_SET [data size: 76]
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x64000000|0x3060004c: MOD_LARGE_CONFIG_SET
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x44000000|0x3060004c: MOD_LARGE_CONFIG_SET [data size: 76]
May 22 21:01:16 ba-lnlm-rvp-sdw-03 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx      : 0x47000000|0x0: MOD_SET_DX [data size: 8]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx reply: 0x67000000|0x0: MOD_SET_DX
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx done : 0x47000000|0x0: MOD_SET_DX [data size: 8]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: snd_soc_rt711_sdca:rt711_sdca_calibration: rt711-sdca sdw:0:0:025d:0711:01: rt711_sdca_calibration calibration complete, ret=0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: snd_soc_rt711_sdca:rt711_sdca_jack_init: rt711-sdca sdw:0:0:025d:0711:01: in rt711_sdca_jack_init enable
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: snd_soc_rt711_sdca:rt711_sdca_io_init: rt711-sdca sdw:0:0:025d:0711:01: rt711_sdca_io_init hw_init complete
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: soundwire_bus:sdw_handle_slave_status: rt711-sdca sdw:0:0:025d:0711:01: signaling initialization completion for Slave 6
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: rt1316-sdca sdw:0:2:025d:1316:01: rt1316_sdw_remove: plb: before unregister_functions
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000040
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: #PF: supervisor read access in kernel mode
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: #PF: error_code(0x0000) - not-present page
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: PGD 0 P4D 0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: CPU: 2 PID: 20385 Comm: rmmod Not tainted 6.9.0-rc5-ge7e81a5e790e #pr5010
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: Hardware name: Intel Corporation Lunar Lake Client Platform/LNL-M LP5 RVP1, BIOS LNLMFWI1.R00.2470.D84.2312061937 12/06/2023
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RIP: 0010:device_del+0x23/0x3d0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 31 f6 41 56 41 55 4c 8d af 80 00 00 00 41 54 55 48 89 fd 53 48 83 ec 20 <4c> 8b 67 40 4c 89 ef 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RSP: 0018:ffff9729410b7ac8 EFLAGS: 00010282
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RBP: 0000000000000000 R08: ffffffffa835e638 R09: 0000000100035a29
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: R10: ffffffffa8287260 R11: ffffffffa835e638 R12: ffff91ae45837000
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: R13: 0000000000000080 R14: 0000000000000080 R15: 0000000000000000
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: FS:  00007f4979861c40(0000) GS:ffff91b5a0c00000(0000) knlGS:0000000000000000
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: CR2: 0000000000000040 CR3: 0000000116252006 CR4: 0000000000f70ef0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: PKRU: 55555554
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: Call Trace:
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  <TASK>
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? __die+0x24/0x70
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? page_fault_oops+0x15a/0x450
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? irq_work_queue+0x32/0x60
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? vprintk_emit+0x103/0x2f0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? local_clock+0x15/0x30
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? exc_page_fault+0x68/0x1e0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? asm_exc_page_fault+0x26/0x30
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? device_del+0x23/0x3d0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? _dev_info+0x70/0x90
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  sdca_dev_unregister_functions+0x36/0x60 [snd_soc_sdca]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  rt1316_sdw_remove+0x3e/0x60 [snd_soc_rt1316_sdw]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  sdw_drv_remove+0x54/0x80 [soundwire_bus]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  device_release_driver_internal+0x1a1/0x210
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  bus_remove_device+0xd4/0x140
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  device_del+0x151/0x3d0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? __mutex_unlock_slowpath+0x44/0x290
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  device_unregister+0x17/0x60
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  sdw_delete_slave+0xad/0xc0 [soundwire_bus]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? __pfx_sdw_delete_slave+0x10/0x10 [soundwire_bus]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  device_for_each_child+0x59/0xa0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  sdw_bus_master_delete+0x1e/0x60 [soundwire_bus]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  auxiliary_bus_remove+0x1c/0x30
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  device_release_driver_internal+0x1a1/0x210
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  bus_remove_device+0xd4/0x140
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  device_del+0x151/0x3d0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  sdw_intel_exit+0x89/0x110 [soundwire_intel]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  hda_dsp_remove+0x72/0x180 [snd_sof_intel_hda_generic]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  snd_sof_device_remove+0x119/0x180 [snd_sof]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  sof_pci_remove+0x1e/0x50 [snd_sof_pci]
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  pci_device_remove+0x3f/0xb0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  device_release_driver_internal+0x1a1/0x210
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  driver_detach+0x4b/0x90
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  bus_remove_driver+0x70/0xf0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  pci_unregister_driver+0x3f/0x90
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  __do_sys_delete_module+0x1a5/0x2a0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  ? __x64_sys_close+0x3c/0x80
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  do_syscall_64+0xa8/0x1b0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RIP: 0033:0x7f4979989aeb
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: Code: 73 01 c3 48 8b 0d 45 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 33 0f 00 f7 d8 64 89 01 48
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RSP: 002b:00007ffead476c78 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
May 22 21:01:17 ba-lnlm-rvp-sdw-03 kernel: RAX: ffffffffffffffda RBX: 00005565dbe3e780 RCX: 00007f4979989aeb
plbossart commented 6 months ago

Can you please avoid too "risky" pull requests until all devices are fully equipped?

You would one know if a PR is risky or not?

Don't put a device in CI if it's not fully supported by the CI infrastructure...

marc-hb commented 6 months ago

You would one know if a PR is risky or not?

... when it says "experiments" in the name and causes "kernel NULL pointer dereferences", which I believe are fairly rare.

Don't put a device in CI if it's not fully supported by the CI infrastructure...

It's better to have a device that must be manually rebooted from time to time compared to not having it at all. Manually rebooting it is a small (and temporary) price to pay for significantly more bandwidth. We're running really short in some areas.

plbossart commented 6 months ago

You would one know if a PR is risky or not?

... when it says "experiments" in the name and causes "kernel NULL pointer dereferences", which I believe are fairly rare.

Race conditions happen. I couldn't reproduce the problem on my own device with the scripts until I found a simpler recipe that shows it 100% of the time.

Don't put a device in CI if it's not fully supported by the CI infrastructure...

It's better to have a device that must be manually rebooted from time to time compared to not having it at all. Manually rebooting it is a small (and temporary) price to pay for significantly more bandwidth. We're running really short in some areas.

Those devices pay for themselves given the average engineer hourly rate.

plbossart commented 6 months ago

For the record, here's the race condition fix

diff --git a/drivers/soundwire/slave.c b/drivers/soundwire/slave.c
index 4e886a37910a..f5ecae748106 100644
--- a/drivers/soundwire/slave.c
+++ b/drivers/soundwire/slave.c
@@ -71,6 +71,14 @@ int sdw_slave_add(struct sdw_bus *bus,
        list_add_tail(&slave->node, &bus->slaves);
        mutex_unlock(&bus->bus_lock);

+       /*
+        * SDCA basic properties are used in the SoundWire driver
+        * probe. To avoid any race condition, they must be updated before
+        * device_register(), which will trigger the driver probe.
+        */
+       sdca_lookup_interface_revision(slave);
+       sdca_lookup_functions(slave);
+
        ret = device_register(&slave->dev);
        if (ret) {
                dev_err(bus->dev, "Failed to add slave: ret %d\n", ret);
@@ -88,9 +96,6 @@ int sdw_slave_add(struct sdw_bus *bus,
        }
        sdw_slave_debugfs_init(slave);

-       sdca_lookup_interface_revision(slave);
-       sdca_lookup_functions(slave);
-
        return ret;
 }
 EXPORT_SYMBOL(sdw_slave_add);

in some cases, sdca_lookup_functions() was called AFTER the codec probe, which caused a kernel oops on remove.

marc-hb commented 6 months ago

Those devices pay for themselves given the average engineer hourly rate.

Come on, you know it's never that simple.

Anyway I think we keeping adding more hardware right now.

plbossart commented 4 months ago

closing, will re-add a new branch