kernel: Reorganize halt_thread()

zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.

https://docs.zephyrproject.org

Apache License 2.0

10.92k stars 6.65k forks source link

kernel: Reorganize halt_thread() #81677

Open peter-mitsis opened 2 days ago

peter-mitsis commented 2 days ago

Minor reorganization to halt_thread() to streamline the branching and comparisons.

Originally part of #81311, this commit has been split out so it could receive a ton of testing in isolation.

peter-mitsis commented 1 day ago

For what it is worth, below are a few numbers from running the preemptive thread_metric benchmark on the disco_l475_iot1 board using CONFIG_SCHED_MULTIQ=y (higher numbers is better)

main branch: 5731301 main branch + this commit: 5553629 main branch + An empty clear_halting(): 5339260 (this patch can be found in #81311) main branch + this commit + an empty clear_halting(): 5877246

It definitely seems weird that the performance for the individual patches drops, but it turns out that together, they work very well.

andyross commented 1 day ago

Hold up: so performance drops by 3% with this patch (weird), drops even more if you then make clear_halting() optimize away (even weirder), but magically is 2.5% faster when you combine both tricks?

So, first, are you sure the numbers are reliable and not just noisy? Maybe try a few times or whatever to check a variance?

And if they are good measurements, we're probably looking at a compiler artifact instead, like the smaller code pushes us above/below the threshold for an inlining operation. And we should check the generated code to figure out what the deltas are. If it's just about "this function is fast when inlined" we can treat that more directly.

peter-mitsis commented 26 minutes ago

I did some analysis on the generated assembly code. Emptying either clear_halting() or reorganizing the suspend/dead checks in halt_thread() is enough to trigger the compiler to hit some kind of threshold as to how it organizes the branches. One obvious difference between the assembly outputs is that in the main branch, halt_thread() is getting inlined while z_thread_halt() is not. However, in the modified versions, it is z_thread_halt() that is getting inlined and halt_thread() is not. However, when both are modifications are present it is enough to overcome whatever its limitations are and result in better organization and performance.