zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.8k stars 6.59k forks source link

Unlocking nested k_sched_lock() cause forced preemption #17869

Closed yashi closed 5 years ago

yashi commented 5 years ago

Describe the bug k_sched_unlock() calls update_cache() with preempt_ok == 1. But when k_sched_unlock() is to unlock nested lock, it will preempt the current thread even though the caller still holds the scheduler lock.

k_sched_lock();
k_sched_lock();
k_sched_unlock();  /* <--- this shouldn't be a scheduling point */
k_sched_unlock();  /* <--- this is a scheduling point */

To Reproduce Steps to reproduce the behavior:

  1. Get main.c from https://gist.github.com/yashi/517aab09ce16cda537341f5c5d958fac
  2. mkdir build; cd build
  3. cmake -DBOARD=qemu_cortex_m3
  4. make
  5. See error

Expected behavior Nested lock should be taken care.

Impact A preemptive thread with the scheduler lock is unexpectedly preempted.

Screenshots or console output

$ ninja run
[3/8] Linking C executable zephyr/zephyr_prebuilt.elf
Memory region         Used Size  Region Size  %age Used
           FLASH:       11580 B       256 KB      4.42%
            SRAM:       14364 B        64 KB     21.92%
        IDT_LIST:         120 B         2 KB      5.86%
[7/8] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: cortex-m3
qemu-system-arm: warning: nic stellaris_enet.0 has no peer
***** Booting Zephyr OS build zephyr-v1.14.0-2821-g92dd53a55cee *****
*    main: 0x20000180 w/ -2
* preempt: 0x2000006c w/ 1
*    coop: 0x20000008 w/ -1
##### hello from main
##### hello from coop
##### hello from preempt
##### about to unlock
##### This shouldn't be printed
QEMU: Terminated

Environment (please complete the following information): Doesn't matter.

Source code:

#include <zephyr.h>

#define MY_STACK_SIZE 5000

K_THREAD_STACK_DEFINE(preempt_stack, MY_STACK_SIZE);
struct k_thread preempt_thread;

K_THREAD_STACK_DEFINE(coop_stack, MY_STACK_SIZE);
struct k_thread coop_thread;

static void preempt(void *arg1, void *arg2, void *arg3)
{
    ARG_UNUSED(arg1);
    ARG_UNUSED(arg2);
    ARG_UNUSED(arg3);

    k_sched_lock();
    k_sched_lock();

    printk("##### hello from preempt\n");

    k_busy_wait(20000);

    printk("##### about to unlock\n");
    k_sched_unlock();

    while (true) {
    }

    k_sched_unlock();
}

static void coop(void *arg1, void *arg2, void *arg3)
{
    ARG_UNUSED(arg1);
    ARG_UNUSED(arg2);
    ARG_UNUSED(arg3);

    printk("##### hello from coop\n");
    k_usleep(1);
    printk("##### This shouldn't be printed\n");

    while (true) {
    }
}

void main(void)
{
    k_thread_priority_set(k_current_get(), -2);

    k_thread_create(&preempt_thread, preempt_stack,
            K_THREAD_STACK_SIZEOF(preempt_stack),
            preempt,
            NULL, NULL, NULL,
            1, 0, K_NO_WAIT);

    k_thread_create(&coop_thread, coop_stack,
            K_THREAD_STACK_SIZEOF(coop_stack),
            coop,
            NULL, NULL, NULL,
            -1, 0, K_NO_WAIT);

    printk("*    main: %p w/ %d\n", k_current_get(), k_thread_priority_get(k_current_get()));
    printk("* preempt: %p w/ %d\n", &preempt_thread, k_thread_priority_get(&preempt_thread));
    printk("*    coop: %p w/ %d\n", &coop_thread, k_thread_priority_get(&coop_thread));

    while (true) {
        printk("##### hello from main\n");
        k_sleep(100);
    }

    return;
}
andrewboie commented 5 years ago

Good catch, once addressed we should transplant your example code to a test case to ensure this stays fixed.

aescolar commented 5 years ago

CC @andyross

andyross commented 5 years ago

Definitely a bug. I don't think it was ever expected that a sched lock was a preemption point at all, that argument should be false. Alternatively, we should deprecate the API entirely -- it's not SMP-safe (that is, it prevents the current CPU from preempting your thread and can maybe be productively used for some purposes, but it's not a "lock" as it says nothing about what the other CPU is going to run).

andrewboie commented 5 years ago

Alternatively, we should deprecate the API entirely

Yeah I think people were using this mostly as an alternative to irq locking, I'd be OK with throwing a __deprecated tag on these, they should really use a semaphore or something in most cases