Open peter-mitsis opened 2 days ago
For what it is worth, below are a few numbers from running the preemptive thread_metric benchmark on the disco_l475_iot1 board with CONFIG_SCHED_MULTIQ=y (higher numbers are better):
- main branch: 5731301
- main branch + this commit: 5553629
- main branch + an empty clear_halting(): 5339260 (this patch can be found in #81311)
- main branch + this commit + an empty clear_halting(): 5877246
It definitely seems weird that the performance for the individual patches drops, but it turns out that together, they work very well.
Hold up: so performance drops by 3% with this patch (weird), drops even more (about 7%) if you instead make clear_halting() optimize away (even weirder), but magically is 2.5% faster when you combine both tricks?
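For reference, the percentages can be reproduced from the posted scores with a few lines of C. This is only a sketch: `percent_change()` and `report()` are hypothetical helper names, and the only inputs are the four numbers quoted in this thread.

```c
#include <stdio.h>

/* Relative change of `value` versus `baseline`, in percent.
 * Positive means the score went up (better for this benchmark). */
double percent_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}

/* Print one score next to its delta from the baseline. */
void report(const char *label, double baseline, double value)
{
	printf("%-50s %10.0f  %+6.2f%%\n", label, value,
	       percent_change(baseline, value));
}
```

Calling `report()` with 5731301 as the baseline gives roughly -3.10% for this commit alone, -6.84% for the empty clear_halting() alone, and +2.55% for both together.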
So, first, are you sure the numbers are reliable and not just noisy? Maybe run the benchmark a few times to check the variance?
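A crude way to do that check, sketched in C: collect several scores for the baseline, compute their mean and sample variance, and see whether a candidate score sits more than `k` standard deviations away. All three function names are hypothetical, and no repeated-run data has been posted here, so the caller has to supply its own measurements. The comparison is done on squared values to avoid needing `sqrt()`.

```c
#include <stddef.h>

/* Arithmetic mean of n benchmark scores (n must be >= 1). */
double runs_mean(const double *x, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++) {
		sum += x[i];
	}
	return sum / (double)n;
}

/* Unbiased sample variance of n scores (n must be >= 2). */
double runs_variance(const double *x, size_t n)
{
	double m = runs_mean(x, n);
	double acc = 0.0;

	for (size_t i = 0; i < n; i++) {
		acc += (x[i] - m) * (x[i] - m);
	}
	return acc / (double)(n - 1);
}

/* Returns 1 if `candidate` is more than k standard deviations away
 * from the mean of the baseline runs, 0 otherwise.
 * Compares squares: (candidate - mean)^2 > k^2 * variance. */
int delta_exceeds_noise(const double *baseline, size_t n,
			double candidate, double k)
{
	double d = candidate - runs_mean(baseline, n);

	return d * d > k * k * runs_variance(baseline, n);
}
```

This is not a proper significance test, only a quick filter: if a 3% delta does not clear it with `k` around 2 or 3, the numbers are probably too noisy to reason about.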
And if they are good measurements, we're probably looking at a compiler artifact instead, like the smaller code pushes us above/below the threshold for an inlining operation. And we should check the generated code to figure out what the deltas are. If it's just about "this function is fast when inlined" we can treat that more directly.
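To illustrate what treating it "more directly" could look like: GCC and Clang both accept function attributes that take the inlining decision away from the size heuristics. The sketch below uses made-up functions (`step_forced_inline()`, `step_never_inline()` and the two `run_*()` callers are all hypothetical and are not the scheduler code); it only shows the mechanism, assuming a GCC-compatible compiler.

```c
/* Hypothetical helper that the compiler must inline into every caller,
 * regardless of its size heuristics (GCC/Clang attribute). */
static inline __attribute__((always_inline)) int step_forced_inline(int state)
{
	return state * 2 + 1;
}

/* Same body, but the compiler is told never to inline it, so every
 * call stays a real function call in the generated code. */
__attribute__((noinline)) int step_never_inline(int state)
{
	return state * 2 + 1;
}

/* Caller that always gets the helper body inlined into its loop. */
int run_forced(int n)
{
	int s = 0;

	for (int i = 0; i < n; i++) {
		s = step_forced_inline(s);
	}
	return s;
}

/* Caller that always pays for a call per iteration. */
int run_never(int n)
{
	int s = 0;

	for (int i = 0; i < n; i++) {
		s = step_never_inline(s);
	}
	return s;
}
```

Both callers compute the same result; only the generated code differs. Pinning the decision like this on the real functions and re-running the benchmark would show whether inlining alone explains the deltas.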
I did some analysis on the generated assembly code. Either emptying clear_halting() or reorganizing the suspend/dead checks in halt_thread() is enough to push the compiler across some kind of threshold in how it organizes the branches. One obvious difference between the assembly outputs is that in the main branch, halt_thread() is inlined while z_thread_halt() is not. In each of the individually modified versions, it is the other way around: z_thread_halt() is inlined and halt_thread() is not. When both modifications are present, however, it is enough to overcome whatever the compiler's limitations are, and the result is better organization and performance.
Minor reorganization to halt_thread() to streamline the branching and comparisons.
Originally part of #81311, this commit has been split out so it could receive a ton of testing in isolation.