[Idea] Implement busy-loop threads for osx

Civil commented 1 year ago

OSX is known to completely ignore all affinity hints (on normal devices) and then place tasks on random cores, thus making the results utterly useless. What about implementing a mode that starts a specified number of threads, where all but 2 run a busy loop of useless calculations (ideally floating point or mixed) and the remaining 2 perform the actual test? There is a chance that this will force the OSX scheduler to keep threads on their cores automatically. It might not work for the efficiency cluster, though, especially with simple busy loops: I've seen cases where starting a task on an efficiency core made the scheduler go wild and migrate all threads across all cores (however, that was an integer calculation workload). If that goes well, it might be possible to get real cluster-to-cluster latency and core-to-core latency within a cluster under OSX, without needing to run Linux for that.

If you think this might work and is worth implementing, I can try to do a PR (however, my Rust skills are not great, so no promises on a timeline, and the code quality of the initial PR might not be up to the level of this tool).
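A rough sketch of the idea, using only the Rust standard library. This is not code from core-to-core-latency: the names `spawn_busy_threads`, `ping_pong_ns`, and `run_with_background_load` are hypothetical, and the ping-pong loop is a simplified stand-in for the tool's real benchmark. Whether the background load actually changes thread placement on OSX is exactly the open question here.

```rust
use std::hint::{black_box, spin_loop};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Instant;

/// Start `count` threads that burn CPU on floating-point work until `stop` is set.
fn spawn_busy_threads(count: usize, stop: &Arc<AtomicBool>) -> Vec<JoinHandle<()>> {
    (0..count)
        .map(|_| {
            let stop = Arc::clone(stop);
            thread::spawn(move || {
                let mut x = 1.5_f64;
                while !stop.load(Ordering::Relaxed) {
                    // black_box keeps the optimizer from removing the useless math.
                    x = black_box(x * 1.000_001 + 0.5).sqrt();
                }
                black_box(x);
            })
        })
        .collect()
}

/// Bounce a counter between two threads; return the mean round trip in nanoseconds.
/// `round_trips` must be greater than zero.
fn ping_pong_ns(round_trips: u64) -> f64 {
    let flag = Arc::new(AtomicU64::new(0));
    let responder_flag = Arc::clone(&flag);
    let responder = thread::spawn(move || {
        for i in 0..round_trips {
            // Wait for ping number 2i+1, then answer with 2i+2.
            while responder_flag.load(Ordering::Acquire) != 2 * i + 1 {
                spin_loop();
            }
            responder_flag.store(2 * i + 2, Ordering::Release);
        }
    });

    let start = Instant::now();
    for i in 0..round_trips {
        flag.store(2 * i + 1, Ordering::Release);
        while flag.load(Ordering::Acquire) != 2 * i + 2 {
            spin_loop();
        }
    }
    let elapsed = start.elapsed();
    responder.join().expect("responder thread panicked");
    elapsed.as_nanos() as f64 / round_trips as f64
}

/// Run the two measuring threads while `busy_threads` background threads keep
/// the other cores occupied, then stop the background load.
fn run_with_background_load(busy_threads: usize, round_trips: u64) -> f64 {
    let stop = Arc::new(AtomicBool::new(false));
    let workers = spawn_busy_threads(busy_threads, &stop);
    let mean_ns = ping_pong_ns(round_trips);
    stop.store(true, Ordering::Relaxed);
    for worker in workers {
        worker.join().expect("busy thread panicked");
    }
    mean_ns
}
```

One way to pick the thread count would be `std::thread::available_parallelism()` minus 2, so that every core has exactly one runnable thread, and then call something like `run_with_background_load(n, 100_000)`. The sketch still gives no way to tell which two cores the measuring threads landed on.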

nviennot commented 1 year ago

Even with this strategy, we have no control over which cores the two threads are running on, right?

Civil commented 1 year ago

As far as I understand, no, we still don't. It should just increase the probability of persistently hitting different cores (in random order). Currently it's extremely likely to hit the same 2 cores all the time.

Civil commented 1 year ago

Actually, according to https://eclecticlight.co/2022/01/25/scheduling-of-threads-on-m1-series-chips-second-draft/, it is somewhat known how current versions of OSX do the scheduling. Again, there are no guarantees that it won't change in the future.

nviennot / core-to-core-latency

[Idea] Implement busy-loop threads for osx #78