raspberrypi / pico-sdk

BSD 3-Clause "New" or "Revised" License
3.55k stars 881 forks source link

printf deadlock usb using two tasks #1453

Closed adri-sass closed 1 week ago

adri-sass commented 1 year ago

Hello, I've a deadlock using the printf in two differents thread with the usb stdio configured. I'm using:

A simple main that causes the issue is the following:

#include <stdio.h>
#include "pico/stdlib.h"
#include "FreeRTOS-Kernel/include/FreeRTOS.h"
#include "FreeRTOS-Kernel/include/task.h"

void TaskA(void*)
{
    int VAL = 0;   
    while(true)
    {
        printf("TaskA VAL = %d\n", VAL++);
    }
}

void TaskB(void*)
{
    int VAL = 0;
    while(true)
    {
        printf("TaskB VAL = %d\n", VAL++);
    }
}

int main() {
    stdio_init_all();
    sleep_ms(1000);
    printf("STARTING\n");
    #ifdef LIB_PICO_MULTICORE
        printf("LIB_PICO_MULTICORE defined\n");
    #else
        printf("LIB_PICO_MULTICORE not defined\n");
    #endif
    xTaskCreate(TaskA, "Bouncing Task1", configMINIMAL_STACK_SIZE, nullptr, 1, nullptr);
    xTaskCreate(TaskB, "Bouncing Task2", configMINIMAL_STACK_SIZE, nullptr, 1, nullptr);

    vTaskStartScheduler();

    printf("Error\n");
    while(true)
    {
        ;;
        sleep_ms(1000);
    }
}

The CMake is the following:

cmake_minimum_required(VERSION 3.13)

set(PICO_BOARD pico_w)
set(FREERTOS_KERNEL_PATH ./FreeRTOS-Kernel)
# initialize pico-sdk from submodule
# note: this must happen before project()
include(pico-sdk/pico_sdk_init.cmake)
include(pico_FreeRTOS_Kernel_import.cmake)

project(PICO_OS_TEST)

# initialize the Raspberry Pi Pico SDK
pico_sdk_init()

add_executable(pico_usb_print_deadlock_test main_issue_poc.cpp)

target_include_directories(pico_usb_print_deadlock_test PRIVATE ${CMAKE_CURRENT_LIST_DIR} )
target_include_directories(pico_usb_print_deadlock_test PRIVATE ${FREERTOS_KERNEL_PATH}/${FREERTOS_KERNEL_RP2040_RELATIVE_PATH}/include )
target_link_libraries(pico_usb_print_deadlock_test pico_stdlib pico_multicore FreeRTOS-Kernel FreeRTOS-Kernel-Heap4)

pico_enable_stdio_usb(pico_usb_print_deadlock_test 1)
pico_enable_stdio_uart(pico_usb_print_deadlock_test 0)
pico_add_extra_outputs(pico_usb_print_deadlock_test)

Using OpenOCD and another pico, I've debugged the code, and it seems to be blocked on the same spinlock handle (see the attached image for the stack)

2023-07-07_20-21

Here I was able to check that deadlock was on the same handle: 2023-07-07_20-44 2023-07-07_20-45

Is there a workaround? Thanks for any help

peterharperuk commented 1 year ago

Does it happen with stdio uart?

adri-sass commented 1 year ago

@peterharperuk with the usb stdio the problem occurs within seconds, using the uart stdio after 10 minutes everything was working fine, so the answer to your question is "probably not"

kilograham commented 1 year ago

urgggh. this is nasty, and one root cause is a bug in our RP2040 SMP FreeRTOS port; which doesn't look at all trivial to fix in general.

note the manifestation you are probably hitting (which is also easier to fix), can actually be worked around by disabling RTOS asserts (i.e. set configASSERT(x) to not reference x), or if that is defined to be assert(x), then set NDEBUG=1 or do a Release build.

Really, the pico_stdio_usb code should now use pico_asyc_context and thus use a more sensible (than IRQ) mechanism for background work under FreeRTOS.

adri-sass commented 1 year ago

@kilograham thank you for your analisys, but the workaround you suggest, delays only the deadlock.

I've tried the following steps:

#include <assert.h>
/* Define to trap errors during development. */
//#define configASSERT(x)                         assert(x)
#define configASSERT(x)         

At this point the issue still shows up.

I hope this test helps, let me know if you need more info or if there is a more effective workaround/fix.

Thank you in advance

kilograham commented 1 year ago

yes, this fixed only one possible place the issue occurred, and i left it running and it seemed happier, but yeah mine hung too.

daveythacher commented 1 year ago

Increase stack size please.

daveythacher commented 1 year ago

urgggh. this is nasty, and one root cause is a bug in our RP2040 SMP FreeRTOS port

Can you explain?

adri-sass commented 11 months ago

Hi @daveythacher, sorry for the late reply. I've doubled the stack size, but the issue still shows up

daveythacher commented 9 months ago

@kilograham Is the entire mutex implementation bad?

Edit: What happens to owner?

kilograham commented 9 months ago

it is lower level than that

on the FreeRTOS port, either lock_internal_spin_unlock_with_wait or lock_internal_spin_unlock_with_notify(i forget which or both off the top of my head) are actually not guaranteed to be "atomic" as per their documentation (because IRQs are enabled on one FreeRTOS path, and a task switch can occur if an IRQ is taken)

daveythacher commented 9 months ago

That should not cause an issue if the mutex is atomic. It creates blocking logic but that itself is designed to be fine from my understanding of the stdio and usb implementation. (They are designed to back off (yield) and come back in a polling task.)

However my understanding of the mutex implementation is that they are not atomic. The spin lock is atomic however the owner flag can be duplicated in multiple places. (Declare as volatile?)

Another issue is the USB ISR (USB and timer) calls directly into tinyUSB. However this may be okay on ARM if it implements the reserved context for ISRs which is shared. This context would need to be large.

Why do you mutex guard tinyUSB?

Edit: In hindsight the tinyUSB guard is somewhat self evident.

daveythacher commented 9 months ago

Here is a hack which may work around the problem. This is not an ideal solution due to the overhead.

#include <stdio.h>
#include <stdint.h>
#include "pico/stdlib.h"
#include "FreeRTOS-Kernel/include/FreeRTOS.h"
#include "FreeRTOS-Kernel/include/task.h"
#include "FreeRTOS-Kernel/include/queue.h"

static xQueueHandle queue;
static TaskHandle_t task;

void TaskA(void*)
{
    uint16_t VAL = 0;   
    while(true)
    {
        uint32_t data = (1 << 16) | VAL;
        while(xQueueSend(queue, &data, portMAX_DELAY) != pdPASS);
        ++VAL;
    }
}

void TaskB(void*)
{
    uint16_t VAL = 0;
    while(true)
    {
        uint32_t data = (2 << 16) | VAL;
        while(xQueueSend(queue, &data, portMAX_DELAY) != pdPASS);
        ++VAL;
    }
}

void TaskC(void*)
{
    while(true)
    {
        uint32_t data;
        while(xQueueReceive(queue, &data, portMAX_DELAY) != pdPASS);

        if ((data >> 16) == 1)
            printf("TaskA NUM = %d\n", data & 0xFFFF);
        else
            printf("TaskB NUM = %d\n", data & 0xFFFF);
    }
}

int main() {
    stdio_init_all();
    sleep_ms(1000);
    printf("STARTING\n");
    #ifdef LIB_PICO_MULTICORE
        printf("LIB_PICO_MULTICORE defined\n");
    #else
        printf("LIB_PICO_MULTICORE not defined\n");
    #endif

    queue = xQueueCreate(5, sizeof(uint32_t));

    xTaskCreate(TaskA, "Bouncing Task1", configMINIMAL_STACK_SIZE, nullptr, 1, nullptr);
    xTaskCreate(TaskB, "Bouncing Task2", configMINIMAL_STACK_SIZE, nullptr, 1, nullptr);
    xTaskCreate(TaskC, "Bouncing Task3", configMINIMAL_STACK_SIZE, nullptr, 1, &task);

    vTaskCoreAffinitySet( task, 1 << 0 );

    vTaskStartScheduler();

    printf("Error\n");
    while(true)
    {
        ;;
        sleep_ms(1000);
    }
}
daveythacher commented 9 months ago

Note I was not able to pin TaskC to core 1. I can only pin it to core 0. It does not like core 1.

Again I do not know how owner is protected in multicore!

daveythacher commented 9 months ago

I looked into it some.

Core 1 goes for stdio_usb_mutex and gets it.
Core 0 goes for stdio_usb_mutex but only gets spin lock.
Core 0 goes for portGET_ISR_LOCK or portGET_TASK_LOCK in xEventGroupWaitBits
Core 0 releases spin lock in vPortEnableInterrupts
Core 1 releases stdio_usb_mutex
ISR is allowed to run and goes for stdio_usb_mutex (gets it it on core 0)
ISR releases spin lock
Core 1 goes for stdio_usb_mutex but only gets spin lock
Core 1 goes for portGET_ISR_LOCK or portGET_TASK_LOCK in xEventGroupWaitBits
ISR goes for stdio_usb_mutex but cannot get spin lock

Core 1 is dead locking the ISR on core 0
Core 0 is dead locking core 1

If true, core 1 must never go for stdio_usb_mutex or ditch immediately. Without FreeRTOS it normally ditches instantly which I proved does fix it. Looks like xEventGroupWaitBits allows a nested locked which can be split into a deadlock.

I am not sure I understand how stdio_usb_mutex needs to be submitted to xEventGroupWaitBits. The SDK promote stdio_usb_mutex to cover the mutex internal state and the event group bits. The notion of the two seem similar but different. I am not sure the SDK should manage FreeRTOS's internal state.

Edit: This looks to do the trick.

#define configSUPPORT_PICO_SYNC_INTEROP 0

This like lowers the performance, but it should be stable. So you have two workarounds currently, but they both come with performance trade offs.

I think you could make an argument for saying this fix belongs in the FreeRTOS side not the SDK. If this is the direction they wish to go this issue should be closed out.

daveythacher commented 9 months ago

@kilograham FreeRTOS did not define xPortLockInternalSpinUnlockWithBestEffortWaitOrTimeout. This was defined by the SDK? Which make it optional correct? I mean I already suspect this based on trying to compile with it commented out and disabling configSUPPORT_PICO_SYNC_INTEROP. I just want to make sure.

If this is the case the only point here is to use scheduler's event system. We can patch in just about anything we want here?

Edit: The way I see it there are two problems: 1) The mutex is held while the spin lock is released in the middle of the ISR, which allows this to occur. 2) The spin lock is held while attempting to register the thread state. (Logic does not hand off to different mutex. SDK is commit to try/release notion in my opinion, which makes registering thread state somewhat complex.)

Break either one of these and the dead lock may not happen.

A fundamental concern here is for the mutex design itself. Registering with FreeRTOS is dangerous. You would need to forbid accessing a mutex from ISR in general. You could fix the USB issue with async, however should the spin lock not persist the entire ISR? The ISR cannot release the mutex if it loses the spin lock, which means core 1 can continue to attack core 0. The real reason to move the USB to async would be to allow it to run on both cores. So I guess under this point of view the issue is within the SDK.

Create new mutex functions mutex_try_enter_isr and mutex_exit_isr? Note this comes with priority inversion, which is true regardless for printf. So to some extent printf should be considered logging.

kilograham commented 8 months ago

it is lower level than that

on the FreeRTOS port, either lock_internal_spin_unlock_with_wait or lock_internal_spin_unlock_with_notify(i forget which or both off the top of my head) are actually not guaranteed to be "atomic" as per their documentation (because IRQs are enabled on one FreeRTOS path, and a task switch can occur if an IRQ is taken)

Note this is the bug - they should be (and need to be) atomic.

daveythacher commented 7 months ago

I argue they are atomic. The problem is they deadlock using two blocking (atomic) operations when split. You need to force serialization which should create the desired atomic. This means you need to have an exclusive section which forces this. You are missing the blocking logic required to create the exclusion which serializes it into atomic.

Overall there is a potential fallacy, parallelizing printf. We could create exclusion from the scheduler or the spin lock within the mutex. The RTOS and NVIC schedulers are not aligned, which is the operation attempted. The ISR is impure in multiple manners. Another solution is to make the grab of any SDK mutex steal the RTOS mutex first.

My concern is for the SDK getting into tradeoffs, which makes things impure. FreeRTOS has no real way of fixing this issue but they can hack it at least two ways. There are at least three solutions on the SDK side.

Personally I would be tempted to make the following point. The APIs outlined by the SDK for the mutex's interaction with RTOS are deprecated and should be removed. This would force RTOS code to use their own version of mutex which are provided potentially in the port code. I am not sure the SDK factory should be used and by extension the SDK is not required for event notification rather the port code is. Again are you commited to these APIs? (configSUPPORT_PICO_SYNC_INTEROP)

We are really arguing over resource usage at this point.

kilograham commented 1 month ago

I have figured out the issue; I will submit a PR against FreeRTOS, and a patch for the SDK (there were two bugs)... the good news is that SMP and TinyUSB and configSUPPORT_PICO_SYNC_INTEROP=1 all work together now!

peterharperuk commented 1 week ago

Can you re-test this? More work has been done by @kilograham in this area

adri-sass commented 1 week ago

Thank you all. Now I'm not able to test it now, but I'll do it asap

daveythacher commented 1 week ago

Do you have commits or pull requests related to this?

kilograham commented 1 week ago

This is fixed with the latest SDK, and the latest FreeRTOS here https://github.com/raspberrypi/FreeRTOS-Kernel