zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.86k stars 6.62k forks source link

deferred logging stops at some point for some time #78157

Open andreas-chmielewski-gcx opened 2 months ago

andreas-chmielewski-gcx commented 2 months ago

Describe the bug The application used generates a large amount of logs, especially at certain times (every few ms). At a certain time there is a behavior where the whole logging hangs for a certain time (sometimes 20 minutes, sometimes 2 hours) and nothing is output. It is very difficult to reproduce this behavior, but I have now managed to analyze it a bit more precisely with the debugger. The relevant file is "log_core.c". I was that the log process thread waits "FOREVER" for a semaphore.

line 913:
        if (log_process() == false) {
            if (processed_any) {
                processed_any = false;
                log_backend_notify_all(LOG_BACKEND_EVT_PROCESS_THREAD_DONE, NULL);
            }
            (void)k_sem_take(&log_process_thread_sem, timeout);

whereas cnt had a value of more than 1000.

line 171:
        if (CONFIG_LOG_PROCESS_TRIGGER_THRESHOLD == 1) {
            if (cnt == 0) {
                k_sem_give(&log_process_thread_sem);
            }
        } else {
            if (cnt == 0) {
                k_timer_start(&log_process_thread_timer,
                          K_MSEC(CONFIG_LOG_PROCESS_THREAD_SLEEP_MS),
                          K_NO_WAIT);
            } else if (CONFIG_LOG_PROCESS_TRIGGER_THRESHOLD &&
                   (cnt + 1) == CONFIG_LOG_PROCESS_TRIGGER_THRESHOLD) {
                k_timer_stop(&log_process_thread_timer);
                k_sem_give(&log_process_thread_sem);
            } else {
                /* No action needed. Message processing will be triggered by the
                 * timeout or when number of upcoming messages exceeds the
                 * threshold.
                 */
            }
        }

I manually modified cnt to the value of 10 with the debugger (the same as CONFIG_LOG_PROCESS_TRIGGER_THRESHOLD) and logging continued.

To Reproduce Not really easily possible but did not try to get the application small once it was clear that the logging system "stopped".

Expected behavior I at least would expect to see something like message dropped etc .... but not that the logging blocks totally for a long time.

Impact As the issue is not reproducible at all, it took me a long time to be actually on the pc while the bug occurred being able to attach the debugger to it. This issue is in particular evil as I had to think that the application freezes from time to time.

Logs and console output Not available

Environment:

zephyr-version: VERSION_MAJOR = 3 VERSION_MINOR = 5 PATCHLEVEL = 99 VERSION_TWEAK = 0 EXTRAVERSION =

prj.conf CONFIG_LOG=y CONFIG_LOG_MODE_DEFERRED=y CONFIG_LOG_SPEED=n CONFIG_LOG_BUFFER_SIZE=50000 CONFIG_LOG_PROCESS_THREAD_SLEEP_MS=10

github-actions[bot] commented 2 months ago

Hi @andreas-chmielewski-gcx! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

mmahadevan108 commented 2 months ago

@andreas-chmielewski-gcx is this issue reproducible with the latest Zephyr upstream main branch.

andreas-chmielewski-gcx commented 1 month ago

I can not test this easily as we are not using the latest zephyr version but v3.5.99-ncs1-1 (using nrf sdk). I hope that the output of my debug session helps you finding that root cause

jussivastimo commented 4 weeks ago

I have same issue with v3.7.0-3885-g7773fe8dd60

Is it nesessary to check cnt value if CONFIG_LOG_PROCESS_TRIGGER_THRESHOLD == 1?