zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

ESP-32 preemption regressions with asm2 #6346

Closed by andyross 6 years ago

andyross commented 6 years ago

In some cases (though apparently not all) the new asm2 code on esp32 is failing in situations where it is asked to switch back to a previously-interrupted thread from an otherwise working one. See https://github.com/zephyrproject-rtos/zephyr/issues/6339 for one report.

The most common case happens with the idle thread, which will be correctly interrupted and switched away from (e.g. a timer interrupt will wake up a blocked thread), but then the system will hang (or sometimes report a spurious exception) when switching back into idle.

There's a heisenbug quality to this too, where calling printk() inside the interrupt handler will magically "fix" the problem. My current theory is that the window spill code is broken somehow: calling into a complicated handler like printk() will flush the register windows for us and make the spill code a no-op. The spill code is different between ESP-32 (which has 64 windowed registers) and qemu (32), which might explain the difference.

This is also consistent with the "failure on switch back to idle" behavior -- a bad spill would only affect the old thread and not the new one.

locomuco commented 6 years ago

@andyross What ESP32 toolchain version is currently supported by Zephyr? The documentation always points to the latest ESP32 toolchain.

andyross commented 6 years ago

Got it. It wasn't the spill code, which had a decent unit test and was pretty simple. It wasn't the cross-stack call code, which was likewise testable in isolation but a little more subtle. It was the really obvious-looking glue between the two. Cross-stack calls return with the stack pointer still pointing to the restore area, which lives below the interrupted stack. But the register spills need the stack pointer to point to the frame of the function that actually got interrupted; otherwise its CALLER (xtensa register windows are kinda crazy) will spill to the wrong spot.
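To illustrate why the stack pointer matters here, a toy model in C (not the actual Zephyr asm2 code). It assumes the standard Xtensa windowed ABI layout, where a frame's a0-a3 are spilled into the 16 bytes just below the stack pointer of the function it called. The names `caller_spill_addr` and `spill_lands_correctly`, and the addresses in the usage note, are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not Zephyr code. Assumption: under the Xtensa windowed ABI,
 * a frame's a0-a3 are saved in the 16-byte area immediately below the
 * stack pointer of the function that frame called. */
#define BASE_SAVE_BYTES 16u

/* Address where the CALLER's a0-a3 land, given the stack pointer that
 * the spill sees for the callee (here: the interrupted function). */
static uint32_t caller_spill_addr(uint32_t callee_sp)
{
    return callee_sp - BASE_SAVE_BYTES;
}

/* Nonzero if spilling while the stack pointer holds `sp_at_spill` puts
 * the caller's registers where a later restore (which uses the real
 * `interrupted_sp`) will look for them. */
static int spill_lands_correctly(uint32_t interrupted_sp, uint32_t sp_at_spill)
{
    return caller_spill_addr(sp_at_spill) == caller_spill_addr(interrupted_sp);
}
```

With a hypothetical interrupted stack pointer of 0x3ffb1000, `spill_lands_correctly(0x3ffb1000, 0x3ffb1000)` is true, while passing a stack pointer that still points at a restore area 64 bytes lower makes it false: the caller's registers end up 64 bytes away from where they will be reloaded.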

Turns out qemu never managed to hit this case because it has half as many windowed registers as ESP-32.

andyross commented 6 years ago

@locomuco Sorry, missed your question. We support only the official SDK toolchain from Espressif on ESP-32. @lpereira can give more context, but broadly Xtensa toolchains are not portable between targets; they are hard-configured to assume specific hardware features. Part of the output of their CPU generator is an "overlay" source package you have to apply to a specific version of binutils/gcc, making a generic toolchain really hard to manage.