The collector is significantly slower than BDWGC. There must be room for enhancement, for example:
unmark small and large objects in parallel;
scan/mark objects from multiple threads;
sweep small and large HEAPs in parallel.
For parallel marking, a naive attempt would be to make Stack thread-safe and call GC_Collector_mark() from multiple threads, but Stack would become a contention point (too many concurrent pushes and pops). It would be better to have a global stack, onto which we push the fiber stack roots, plus thread-local stacks, where each thread pushes and pops the objects it scans, and eventually steals objects from other threads when there is nothing left to do (e.g. steal half the objects from another thread). If that's reminiscent of how a work-stealing scheduler works... this is on purpose!
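To make the stealing idea concrete, here is a minimal sketch of a thread-local mark stack with a "steal half" operation. Everything in it is hypothetical: the LocalStack type, its functions, and the fixed capacity are illustrative names and choices, not the collector's actual API. It also assumes a plain mutex per stack for simplicity; since each lock is mostly taken by its owning thread it should rarely be contended, but a real implementation might prefer a lock-free deque.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_STACK_CAPACITY 1024 /* arbitrary size for the sketch */

/* Hypothetical per-thread mark stack (not the collector's real Stack). */
typedef struct {
    pthread_mutex_t lock;
    void *items[LOCAL_STACK_CAPACITY];
    size_t size;
} LocalStack;

void LocalStack_init(LocalStack *s) {
    pthread_mutex_init(&s->lock, NULL);
    s->size = 0;
}

/* Pushes an object to scan; returns 1 on success, 0 when the stack is full. */
int LocalStack_push(LocalStack *s, void *object) {
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    if (s->size < LOCAL_STACK_CAPACITY) {
        s->items[s->size++] = object;
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Pops the most recently pushed object, or returns NULL when empty. */
void *LocalStack_pop(LocalStack *s) {
    void *object = NULL;
    pthread_mutex_lock(&s->lock);
    if (s->size > 0) {
        object = s->items[--s->size];
    }
    pthread_mutex_unlock(&s->lock);
    return object;
}

/* Moves half of the victim's objects (rounded up) onto the thief's stack,
 * limited by the room left in the thief. Returns the number moved. */
size_t LocalStack_steal_half(LocalStack *thief, LocalStack *victim) {
    assert(thief != NULL && victim != NULL);
    if (thief == victim) {
        return 0;
    }

    /* Always lock in address order, so two threads stealing from each
     * other at the same time cannot deadlock. */
    LocalStack *first = (uintptr_t)thief < (uintptr_t)victim ? thief : victim;
    LocalStack *second = first == thief ? victim : thief;
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    size_t count = (victim->size + 1) / 2;
    size_t room = LOCAL_STACK_CAPACITY - thief->size;
    if (count > room) {
        count = room;
    }
    /* Take entries from the top of the victim's stack, one by one. */
    for (size_t i = 0; i < count; i++) {
        thief->items[thief->size++] = victim->items[--victim->size];
    }

    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
    return count;
}
```

In this sketch a marker thread would loop on LocalStack_pop() of its own stack and, once that returns NULL, call LocalStack_steal_half() against another thread's stack before deciding that marking is finished. Detecting global termination (all stacks empty and no thread still scanning) is left out.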