tensorflow / runtime

A performant and modular runtime for TensorFlow
Apache License 2.0

How to control concurrent thread scheduling with a cost function in the BEF executor? #119

Open MoFHeka opened 8 months ago

MoFHeka commented 8 months ago

Work stealing in NonBlockingWorkQueue is not a good way to reduce last-level cache (LLC) misses automatically. Depth-first execution of a computation graph is generally more cache-friendly. Suppose I want to apply a smarter algorithm to schedule concurrent tasks for better performance, for example one driven by a cost function derived from llvm-mca results. What is the best way to implement this that conforms to the runtime's design philosophy? Should I write a new WorkQueue that is able to reorder tasks?
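To make the question concrete, here is a minimal sketch of the kind of reordering I have in mind. It is standalone C++ and does not implement any TFRT interface: `CostedTask`, `LowerPriority`, and `CostOrderedQueue` are hypothetical names, the cost values are assumed to come from some offline estimate, and the policy (highest cost first, newest first on ties to approximate depth-first order) is only one arbitrary choice. It is single-threaded; a real work queue would need synchronization.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical task record (not a TFRT type).
struct CostedTask {
  std::int64_t cost;          // estimated cost, e.g. from an offline analysis
  std::uint64_t seq;          // insertion order, used to break ties
  std::function<void()> run;  // the work itself
};

// Comparator for std::priority_queue: returns true when `a` should run
// AFTER `b`. Higher cost runs first; among equal costs, the most recently
// added task runs first (a rough depth-first bias).
struct LowerPriority {
  bool operator()(const CostedTask& a, const CostedTask& b) const {
    if (a.cost != b.cost) return a.cost < b.cost;
    return a.seq < b.seq;
  }
};

// Hypothetical single-threaded queue that reorders tasks by cost.
class CostOrderedQueue {
 public:
  void Add(std::int64_t cost, std::function<void()> fn) {
    tasks_.push(CostedTask{cost, next_seq_++, std::move(fn)});
  }

  // Runs the highest-priority task, if any. Returns false when empty.
  bool RunOne() {
    if (tasks_.empty()) return false;
    // top() is const, so copy the callable out before popping. The task is
    // removed before it runs, so it may safely Add() follow-up tasks.
    std::function<void()> fn = tasks_.top().run;
    tasks_.pop();
    fn();
    return true;
  }

  // Runs tasks until the queue is empty, including tasks added on the way.
  void Drain() {
    while (RunOne()) {
    }
  }

 private:
  std::priority_queue<CostedTask, std::vector<CostedTask>, LowerPriority>
      tasks_;
  std::uint64_t next_seq_ = 0;
};
```

With tasks of cost 1, 5, and 3 added in that order, where the cost-5 task enqueues a cost-9 follow-up, `Drain()` runs them in the order 5, 9, 3, 1.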