thesofproject / sof

Sound Open Firmware
Other
541 stars 309 forks source link

[BUG][MTL][ChromeOS][Rex] - alsa conformance test rate error exceeds the limit for DMIC with 4 channel #8458

Open udaymb opened 11 months ago

udaymb commented 11 months ago

Describe the bug alsa conformance test is executed with 4 channel DMIC. Issue is not observed with this test on I2S speaker or Headset.

More information on alsa conformance test command used for verification on Rex chrome system.

alsa_conformance_test -C 'hw:0,99' -c 4 -f S32_LE --debug output - act-dmic-rate-error-sku3-15668-23ww451.log

Result summary number of recorders: 1 number of points: 1277 step average: 37.467085 step min: 4 step max: 52 step median: 48 step standard deviation: 17.585385 rate: 47998.531900 rate error: 17.757167 zero channels: 0 0 0 0 number of underrun: 0 number of overrun: 0

mtrace showed this messages --- 12 messages dropped --- [ 1.112486] pipe: pipeline_new: pipeline new pipe_id 0 priority 0 [ 1.113965] ipc: ipc_cmd: rx : 0x40000004|0x306 [ 1.114300] dai_intel_dmic: dai_dmic_probe: dmic_probe() [ 1.114396] dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0 [ 1.114428] ipc: dai_config: comp:0 0x40000 dai_config() dai type = 2 index = 0 dd 0x40105e00 --- 12 messages dropped --- [ 1.114806] dai_intel_dmic_nhlt: dai_dmic_set_config_nhlt: dmic_set_config_nhlt(): illegal CIC_CONTROL = 0x0000c001 [ 1.114840] dai_intel_dmic_nhlt: dai_dmic_set_config_nhlt: dmic_set_config_nhlt(): clkdiv = 16, mcic = 25, mfir_a = 2, len = 101

Above result shows that rate error exceeded the limit of 10. Hence the test is considered failed.

To Reproduce Run below command on MTL Rex Chrome system that supports 4 channel dmic alsa_conformance_test -C 'hw:0,99' -c 4 -f S32_LE --debug

Reproduction Rate How often does the issue happen ? i.e. 1/10 (once in ten attempts), 1/1000 or all the time. Does the reproduction rate vary with any other configuration or user action, if so please describe and show the new reproduction rate. Always reproducible

Expected behavior A clear and concise description of what you expected to happen. rate error should not exceed 10

Impact What impact does this issue have on your progress (e.g., annoyance, showstopper) Performance test failure is blocking the

Environment 1) Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).

Screenshots or console output If applicable, add a screenshot (drag-and-drop an image), or console logs (cut-and-paste text and put a code fence (```) before and after, to help explain the issue.

Please also include the relevant sections from the firmware log and kernel log in the report (and attach the full logs for complete reference). Kernel log is taken from dmesg and firmware log from sof-logger. See https://thesofproject.github.io/latest/developer_guides/debugability/logger/index.html

mtrace log and with dynamic logs and ipc4 messaging enabled kernel logs mtrace-act-dmic-rate-error-sku1-23ww451.log dmesg-act-dmic-rate-error-sku1-23ww451.log

abonislawski commented 11 months ago

[ 1.114806] dai_intel_dmic_nhlt: dai_dmic_set_config_nhlt: dmic_set_config_nhlt(): illegal CIC_CONTROL = 0x0000c001

this means FW received CIC_CONTROL_STEREO_MODE bit in nhlt blob which is valid only for CAVS, so (at least partially) blob is invalid for ACE

@kv2019i @ranj063 @singalsu do you know why we have this in ace blob and if the rest of values are correct?

@singalsu I think we had similar issue in the past with alsa rate test, do you remember how it ended?

singalsu commented 11 months ago

I recall it was a limitation in alsa_conformance_test, the buffers sizes with that capture PCM caused too much time measurement noise. @kv2019i Do you remember? I recall you knew more about why it happens.

singalsu commented 11 months ago

For the CIC_CONTROL setting, The HW version for DMIC blob generate can be wrong. It should be for ACE. Need to check the topology.

lgirdwood commented 10 months ago

@udaymb IIUC, this test is failing becuase the DMA position is slightly out of sync wrt where the host thinks it should be ? i.e. the rate calculation is slightly lower than 48000 below.

---------PRINT PARAMS---------
PCM name: hw:0,99
card: sofrt5682 [sof-rt5682]
device: DMIC Raw (*) []
stream: CAPTURE
merge_threshold_t: 0.000100
merge_threshold_sz: 36
access type: MMAP_INTERLEAVED
format: S32_LE
channels: 4
rate: 48000 fps
period time: 5333 us
period size: 256 frames
buffer time: 85333 us
buffer size: 4096 frames
---------TIMER RESULT---------
                                 Total_time(s)               Counts          Averages(s)
snd_pcm_open                       0.080251119                    2             0.040126
snd_pcm_hw_params                  0.047723671                    2             0.023862
snd_pcm_hw_params_any              0.000027943                    2             0.000014
snd_pcm_sw_params                  0.000005414                    2             0.000003
snd_pcm_prepare                    0.000021444                    2             0.000011
snd_pcm_start                      0.034489308                    2             0.017245
snd_pcm_avail                      1.983008395               137887             0.000014
precision: 0.000000001
----------RUN RESULT----------
number of recorders: 1
number of points: 1277
step average: 37.467085
step min: 4
step max: 52
step median: 48
step standard deviation: 17.585385
rate: 47998.531900
rate error: 17.757167
zero channels: 0 0 0 0
number of underrun: 0
number of overrun: 0

I've just discussed with @kv2019i and the DMA copies will not always be synchronous wrt to system time i.e. DMA will try and save power by opportunistically copying when the bus is already active.

kv2019i commented 10 months ago

@lgirdwood @singalsu @udaymb This is potentially same issue as https://github.com/thesofproject/sof/issues/4617 The avail updates can be bursty as DMA operations are delayed to save power. The average rate should still be correct, but in the older bug, the ALSA conformance check considered the burstiness to be too much.

lgirdwood commented 10 months ago

Adding @cujomalainey and @johnylin76

cujomalainey commented 10 months ago

This test also seems to be very touchy if the DSP pipeline period is >1ms. Given we have no way to advertise this to the test to adjust the test accordingly it always fails at those higher periods.

johnylin76 commented 10 months ago

A known fact on our end is that the rate error should be alleviated by adjusting merge_threshold, e.g. --merge_threshold 0.001 --merge_threshold_sz 500. The major factor is the pipeline period. I think the same issue has been raised in #6032

kv2019i commented 10 months ago

@cujomalainey @johnylin76 Ack, we have timing jitter from multiple sources in DSP: 1) the exact moment FW pipeline provides data to DMA and marks it ready for sending (this can vary with larger period sizes in DSP, and/or with larger host-side buffer used with 1ms timer tick), 2) the HW has control over when a memory transaction actually happens and can delay actions provided it stays in the limits set by the DSP (this timing jitter not only depends on the audio DSP but the rest of the system)

Both of these are beneficial to save power and based on ALSA interface, are legal in terms of the PCM interface.

I agree it's a problem we can't report this in a standard way. Currently it seems the period-size is only matching existing property and this driver need to honor. E.g in this case, the hw_ptr updates must happen within period size of 256 frames, and that we do meet (but not very useful for conformance test).

FYI @udaymb