The work-stealing approach in NonBlockingWorkQueue is not a good way to automatically reduce last-level cache (LLC) misses. Depth-first execution of a computation graph is generally more cache-friendly.
Suppose I want to apply a smarter algorithm to schedule concurrent tasks for better performance, for example one driven by the output of llvm-mca.
What is the best way to implement this while conforming to the project's software design philosophy? Should I write a new WorkQueue that is able to reorder tasks?
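
To make the question concrete, here is a minimal sketch of the kind of reordering queue I have in mind. Everything in it is hypothetical: `ReorderingWorkQueue`, `ScheduledTask`, `AddTask`, and `RunOne` are names I made up for illustration and are not part of the existing WorkQueue interface, and the `priority` value is assumed to be computed elsewhere (for example from an llvm-mca cost estimate). Ties are broken newest-first, which approximates depth-first execution.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical task record; none of these names come from the runtime.
struct ScheduledTask {
  std::int64_t priority;   // assumed to come from an external cost model
  std::uint64_t sequence;  // insertion order, used only to break ties
  std::function<void()> fn;
};

// Hypothetical queue that hands out the highest-priority task first.
// Among equal priorities the most recently added task runs first (LIFO),
// which approximates depth-first traversal of the task graph.
class ReorderingWorkQueue {
 public:
  void AddTask(std::int64_t priority, std::function<void()> fn) {
    assert(fn && "task must be callable");
    std::lock_guard<std::mutex> lock(mu_);
    tasks_.push(ScheduledTask{priority, next_sequence_++, std::move(fn)});
  }

  // Runs one pending task on the calling thread.
  // Returns false if the queue was empty.
  bool RunOne() {
    std::function<void()> fn;
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (tasks_.empty()) return false;
      fn = tasks_.top().fn;  // top() is const, so copy before popping
      tasks_.pop();
    }
    fn();  // run outside the lock so the task may enqueue more work
    return true;
  }

 private:
  // std::priority_queue keeps the "largest" element on top, so this
  // ordering puts higher priority, then higher sequence, first.
  struct Order {
    bool operator()(const ScheduledTask& a, const ScheduledTask& b) const {
      if (a.priority != b.priority) return a.priority < b.priority;
      return a.sequence < b.sequence;
    }
  };

  std::mutex mu_;
  std::uint64_t next_sequence_ = 0;
  std::priority_queue<ScheduledTask, std::vector<ScheduledTask>, Order> tasks_;
};
```

A single mutex-protected heap like this gives up the per-thread queues that make work stealing scale, so it only illustrates the reordering idea; it is not a drop-in replacement.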