thesofproject / sof

Sound Open Firmware
Other
528 stars 307 forks source link

host_get_copy_bytes_normal: "no bytes to copy" log spam #8761

Closed marc-hb closed 1 month ago

marc-hb commented 7 months ago

1 line / ms

This mtrace spam still makes logs unusable.

Try opening https://sof-ci.01.org/sofpr/PR8754/build1979/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50 ,then click on the "mtrace" tab. The logs are so big that it takes at least 10-15 seconds to just open that tab. Probably longer if you don't have a high speed connection.

[ 2782.210465] <inf> pipe: pipeline_trigger: pipe:2 0x0 pipe trigger cmd 2
[ 2782.211160] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa0117cc0 0x20210U
[ 2782.211178] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 3
[ 2782.211186] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[ 2782.221346] <inf> ipc: ipc_cmd: rx   : 0x13000004|0x1
[ 2782.221371] <inf> pipe: pipeline_trigger: pipe:2 0x0 pipe trigger cmd 8
[ 2782.221380] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0117cc0 0x20210U priority 0 flags 0x0
[ 2782.221398] <inf> ll_schedule: zephyr_domain_register: zephyr_domain_register domain->type 1 domain->clk 3 domain->ticks_per_ms 38400 period 1000
[ 2782.222170] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.223161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.224160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.225161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.226160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.227160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.228160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.229160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.230160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.231158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.231463] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 8
[ 2782.231488] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0118dc0 0x20210U priority 0 flags 0x0
[ 2782.232158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.233161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.234161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.235161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.236160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.237160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.238160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.239161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.240160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.241158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.241546] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 8
[ 2782.241571] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0119240 0x20210U priority 0 flags 0x0
[ 2782.242231] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.242280] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_off: intel_adsp_gpdma_power_off: dma dma@7c000 power off
[ 2782.242320] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: intel_adsp_gpdma_power_on: dma dma@7c000 initialized
[ 2782.242363] <inf> dai_intel_dmic: dai_dmic_update_bits: dai_dmic_update_bits base 10000, reg 0, mask 6000000, value 4000000
[ 2782.242370] <inf> dai_intel_dmic: dai_dmic_update_bits: dai_dmic_update_bits base 10000, reg 1000, mask 10000, value 0

Originally posted by @marc-hb in https://github.com/thesofproject/sof/issues/4672#issuecomment-1897629245

cc:

kv2019i commented 7 months ago

This is potentially related to (and fixed by) https://github.com/thesofproject/sof/pull/8756

The "host_get_copy_bytes_norma" logs repeat exactly 10ms which is the delay we now have in set-pipeline-state processing (fixed by 8756).

marc-hb commented 7 months ago

Another large example: https://sof-ci.01.org/sofpr/PR8722/build2096/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pipeline-all

cc: @RanderWang

marc-hb commented 6 months ago

This massive spam really makes it impossible to do anthing. Try for instance looking at the mtrace logs in https://sof-ci.01.org/sofpr/PR8855/build2720/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50

Good luck.

The sheer workload is probably changing timings considerably too, which means turning debug on/off makes reproductions come and go.

lgirdwood commented 6 months ago

@marc-hb can you try the following patch in your failing test case. If it helps then we can upstream.

From aad766dc742525df67f579c0b5afadeb0b102244 Mon Sep 17 00:00:00 2001
From: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Date: Mon, 19 Feb 2024 14:11:51 +0000
Subject: [PATCH] host-zephyr: flow control "no data to copy" message.

Flow control the important "no data to copy" message which can be
indicative that an error state has been entered but can also flood the
logs worsening the log SNR for debug.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
---
 src/audio/copier/host_copier.h |  1 +
 src/audio/host-zephyr.c        | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/audio/copier/host_copier.h b/src/audio/copier/host_copier.h
index 08cbba43b..07346a868 100644
--- a/src/audio/copier/host_copier.h
+++ b/src/audio/copier/host_copier.h
@@ -68,6 +68,7 @@ struct host_data {
    uint16_t no_stream_position; /**< 1 means don't send stream position */
    uint64_t total_data_processed;
    uint8_t cont_update_posn; /**< 1 means continuous update stream position */
+   uint8_t dma_copy_info_count;    /**! flow control for DMA "no bytes" message */

    /* host component attributes */
    enum comp_copy_type copy_type;  /**< Current host copy type */
diff --git a/src/audio/host-zephyr.c b/src/audio/host-zephyr.c
index fd629838a..3abbba85e 100644
--- a/src/audio/host-zephyr.c
+++ b/src/audio/host-zephyr.c
@@ -397,9 +397,19 @@ static uint32_t host_get_copy_bytes_normal(struct host_data *hd, struct comp_dev
    if (!(hd->ipc_host.feature_mask & BIT(IPC4_COPIER_FAST_MODE)))
        dma_copy_bytes = MIN(hd->period_bytes, dma_copy_bytes);

-   if (!dma_copy_bytes)
-       comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d",
-             avail_samples, free_samples);
+   if (!dma_copy_bytes) {
+       /* flow control this message, its important but can flood the logs,
+        * so dial down to ~4 times per second when continual (based on
+        * 1 LL tick per ms and dma_copy_info_count being a uint8_t).
+        */
+       if (hd->dma_copy_info_count++ == 0) {
+           comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d",
+                 avail_samples, free_samples);
+       }
+   } else {
+       /* DMA back to normal now with data to copy so reset counter */
+       hd->dma_copy_info_count = 0;
+   }

    /* dma_copy_bytes should be aligned to minimum possible chunk of
     * data to be copied by dma.
-- 
2.40.1
kv2019i commented 6 months ago

This is potentially fixed by https://github.com/zephyrproject-rtos/zephyr/pull/69480 . Assigning to @serhiy-katsyuba-intel as owner of the Zephyr fix.

kv2019i commented 5 months ago

This should be fixed with #8954 , @marc-hb please close if you can confirm.

marc-hb commented 5 months ago

I will skim recent logs, give me some time until next week.

marc-hb commented 5 months ago

Not fixed. For instance the mtrace tab in https://sof-ci.01.org/sofpr/PR8995/build3690/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50 has still 30,000 lines with host_get_copy_bytes_normal ... and still crashes the browser.

That's for just 50 pause/resume iterations.

What's new: these lines are not consecutive anymore.

wszypelt commented 5 months ago

The original issue has been fixed, after talking to the developers, setting the issue to P2

marc-hb commented 4 months ago

I'm not sure why now, but https://sof-ci.01.org/sofpr/PR9086/build4356/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50 is still totally unusable (which is why I'm not sure why). Try it.

lrudyX commented 4 months ago

@marc-hb this won't be fixed in fw. Phrase "no bytes" was found about 33 000 times in 1000 000 log lines. Please close the ticket.

wszypelt commented 3 months ago

@marc-hb could you close the ticket? or maybe the issue is still reproducing?

marc-hb commented 3 months ago

Still 30,000 "no bytes to copy" spam in https://sof-ci.01.org/sofpr/PR9174/build4946/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=multiple-pause-resume-50

There are 1200 such lines in 1.2 seconds near the end, so the 1 line/ms is probably not a coincidence.

dbaluta commented 3 months ago

@marc-hb can we implemented log_ratelimit? So that if thare are more that we only alow for about 10 similar logs in about 1s?

lgirdwood commented 2 months ago

@dbaluta pls go ahead with ratelimit. Thanks !

marc-hb commented 1 month ago

I don't know what changed but the logs are usable now: https://sof-ci.01.org/sofpr/PR9304/build6537/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=check-pause-resume-capture-100

marc-hb commented 1 month ago

Actually, still a lot of repetition in for instance in:

But at least the logs are usable.

I don't know what changed

Maybe it was just me reducing the default count from 5 down to 1:

marc-hb commented 2 weeks ago

can we implemented log_ratelimit? So that if thare are more that we only alow for about 10 similar logs in about 1s?

Misclick sorry for the noise

marc-hb commented 2 weeks ago

can we implemented log_ratelimit? So that if thare are more that we only alow for about 10 similar logs in about 1s?

Someone did it??

https://sof-ci.01.org/sofpr/PR9319/build7075/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=check-pause-resume-capture-100

[ 1321.674795] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 2
[ 1321.675398] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa015a940 0xa00f905cU
[ 1321.675446] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 1
[ 1321.675541] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[ 1322.730845] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 27 messages dropped ---
[ 1322.735423] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 12 messages dropped ---
[ 1322.739406] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 18 messages dropped ---
[ 1322.747573] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 24 messages dropped ---
[ 1322.755735] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 21 messages dropped ---
[ 1322.766646] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 17 messages dropped ---
[ 1322.771691] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_common_copy(): Copy_bytes 0 + free bytes 352 < period bytes 768, possible glitch
--- 15 messages dropped ---
[ 1322.779690] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_common_copy(): Copy_bytes 0 + free bytes 352 < period bytes 768, possible glitch
--- 19 messages dropped ---
[ 1322.790786] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_zephyr_copy(): nothing to copy
--- 16 messages dropped ---