mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Native MarkSweep: bad load balancing with single-threaded workloads #1146

Closed: wks closed this issue 4 months ago

wks commented 5 months ago

Currently, we parallelize the sweeping work by making one work packet for the global pool and one packet for each mutator. That is fine for multi-threaded workloads, but when there is only one mutator it hits a pathological case: the Release stage is dominated by a single long-running ReleaseMutator work packet. Here is a timeline captured using eBPF while executing the Liquid benchmark with the Ruby binding (a single mutator, but multiple GC workers):

[Timeline image: Liquid benchmark (Ruby binding), single mutator]

In comparison, here is the timeline for the lusearch benchmark from the DaCapo Chopin benchmark suite (with eager sweeping force-enabled). The sweeping of mutators is better parallelized, but the Release work packet itself does not run in parallel with ReleaseMutator:

[Timeline image: lusearch benchmark (DaCapo Chopin), eager sweeping forced]

We should parallelize the sweeping by splitting it into multiple work packets, each releasing a reasonable number of blocks, as sketched below.
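A minimal sketch of that idea. The names here (`Block`, `SweepChunk`, `SWEEP_CHUNK_SIZE`, `make_sweep_packets`) are illustrative, not the mmtk-core API: blocks from the global pool and from the mutators are repartitioned into fixed-size chunks, and each chunk becomes an independent packet that any GC worker can pick up.

```rust
/// Illustrative stand-in for a native Mark-Sweep block.
struct Block {
    // cells, mark bits, free list, ...
}

impl Block {
    fn sweep(&mut self) {
        // Free the unmarked cells in this block.
    }
}

/// Hypothetical chunk size; the real value would need tuning.
const SWEEP_CHUNK_SIZE: usize = 512;

/// A work packet that sweeps a bounded number of blocks, so no single
/// packet can dominate the Release stage.
struct SweepChunk {
    blocks: Vec<Block>,
}

impl SweepChunk {
    fn do_work(&mut self) {
        for block in &mut self.blocks {
            block.sweep();
        }
    }
}

/// Split all blocks (global pool plus per-mutator lists) into chunk-sized
/// packets instead of one packet per mutator.
fn make_sweep_packets(all_blocks: Vec<Block>) -> Vec<SweepChunk> {
    let mut packets = Vec::new();
    let mut iter = all_blocks.into_iter().peekable();
    while iter.peek().is_some() {
        let blocks: Vec<Block> = iter.by_ref().take(SWEEP_CHUNK_SIZE).collect();
        packets.push(SweepChunk { blocks });
    }
    packets
}
```

With chunk-sized packets, a single mutator that owns most of the blocks no longer produces a single long-running packet; its blocks are spread over as many packets as there are chunks.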

qinsoon commented 5 months ago

We may want a way to split the BlockLists, or each individual BlockList, into several work packets. That way we can parallelize sweeping of both the global block lists and the thread-local block lists of each mutator, which would mitigate the issues shown in the two timelines above. A rough sketch is below.
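A hedged sketch of the splitting itself, using a simplified `BlockList` defined here rather than the real native Mark-Sweep `BlockList` (which is an intrusive linked list of blocks, not a `Vec`):

```rust
/// Simplified stand-in for the native Mark-Sweep block list.
struct BlockList {
    blocks: Vec<usize>, // block addresses, for illustration only
}

impl BlockList {
    /// Split this list into at most `n` sub-lists of roughly equal size.
    /// The global pool and each mutator's lists could each be split this
    /// way, so every list contributes several independent work packets.
    fn split(self, n: usize) -> Vec<BlockList> {
        assert!(n > 0);
        // Ceiling division, so all blocks are covered.
        let per_list = ((self.blocks.len() + n - 1) / n).max(1);
        self.blocks
            .chunks(per_list)
            .map(|chunk| BlockList { blocks: chunk.to_vec() })
            .collect()
    }
}
```

Splitting by a target packet count (rather than a fixed chunk size) keeps the number of packets bounded even for very large heaps, at the cost of packets whose size scales with the heap.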

wks commented 5 months ago

ReleaseMutator is not executed until Release finishes, because the Release work packet only spawns the ReleaseMutator work packets after Plan::release returns. This is by design, but it means we should not sweep the global pool inside Release. The ordering is sketched below.
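A simplified, hypothetical rendering of that ordering (the real packets live in mmtk-core's scheduler; `plan_release` and `num_mutators` are placeholders): the plan-level release runs to completion before any per-mutator packet even exists, so anything heavy done there, such as sweeping the global pool, cannot overlap with the per-mutator sweeping.

```rust
/// Illustrative work-packet trait; packets may enqueue further packets.
trait WorkPacket {
    fn do_work(&mut self, queue: &mut Vec<Box<dyn WorkPacket>>);
}

struct Release;
struct ReleaseMutator {
    mutator_id: usize,
}

impl WorkPacket for Release {
    fn do_work(&mut self, queue: &mut Vec<Box<dyn WorkPacket>>) {
        // 1. The plan-level release runs first. If the global pool is swept
        //    here, that happens before any ReleaseMutator packet exists, so
        //    it cannot run in parallel with per-mutator sweeping.
        plan_release();

        // 2. Only after Plan::release returns are the per-mutator packets
        //    created and enqueued.
        for mutator_id in 0..num_mutators() {
            queue.push(Box::new(ReleaseMutator { mutator_id }));
        }
    }
}

impl WorkPacket for ReleaseMutator {
    fn do_work(&mut self, _queue: &mut Vec<Box<dyn WorkPacket>>) {
        // Sweep this mutator's thread-local block lists.
        let _ = self.mutator_id;
    }
}

fn plan_release() { /* plan-level release, e.g. global pool maintenance */ }
fn num_mutators() -> usize { 1 }
```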

Since this problem affects all plans, I created a dedicated issue: https://github.com/mmtk/mmtk-core/issues/1147