zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
9.74k stars 6k forks source link

irq_offload on x86_64 isn't SMP-safe #72172

Open andyross opened 2 weeks ago

andyross commented 2 weeks ago

The implementation of irq_offload() on x86_64 looks like this (see https://github.com/zephyrproject-rtos/zephyr/blob/main/arch/x86/core/intel64/irq_offload.c#L21)

void arch_irq_offload(irq_offload_routine_t routine, const void *parameter)
{
    x86_irq_funcs[CONFIG_IRQ_OFFLOAD_VECTOR - IV_IRQS] = routine;
    x86_irq_args[CONFIG_IRQ_OFFLOAD_VECTOR - IV_IRQS] = parameter;
    __asm__ volatile("int %0" : : "i" (CONFIG_IRQ_OFFLOAD_VECTOR)
              : "memory");
    x86_irq_funcs[CONFIG_IRQ_OFFLOAD_VECTOR - IV_IRQS] = NULL;
}

Note that storage for routine and parameter is static, not per-CPU. That will race if you have two CPUs doing simultaneous irq_offload() calls.

This is exactly the architecture of the smp_abort test added in #72119 , and indeed it fails spuriously for me at about 3-5% frequency. Adding logging, you can see that what happens is that the two ISRs get the same argument struct.

(A nice extra is that Peter's test turns out to be a good regression test for this sort of thing in the future)

andyross commented 2 weeks ago

Oh, and remember to un-exclude the tests/kernel/smp_abort test on the arch once the bug is fixed.