Closed: wks closed this issue 4 months ago.
We may want a way to split the `BlockList`s (or each individual `BlockList`) into several work packets. That way, we can parallelize sweeping both the global block lists and the thread-local block lists of each mutator, mitigating the issues shown in the two graphs.
`ReleaseMutator` is not executed until `Release` finishes. That's because the `Release` work packet spawns the `ReleaseMutator` work packets only after `Plan::release` returns. This is a designed feature, but it means we shouldn't sweep the global pool in `Release`.
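The ordering above can be illustrated with a minimal sketch of a work-packet queue. The types here (`Packet`, the `VecDeque`-based scheduler) are simplified stand-ins, not mmtk-core's actual scheduler API; the point is only that `ReleaseMutator` packets enter the queue after `Plan::release` has already run, so any sweeping done inside `Release` serializes with them:

```rust
use std::collections::VecDeque;

// Simplified stand-in for a GC work packet (not mmtk-core's trait).
trait Packet {
    fn execute(&self, queue: &mut VecDeque<Box<dyn Packet>>, log: &mut Vec<String>);
}

struct Release;
struct ReleaseMutator(usize);

impl Packet for Release {
    fn execute(&self, queue: &mut VecDeque<Box<dyn Packet>>, log: &mut Vec<String>) {
        // Stand-in for Plan::release. Sweeping the global pool here
        // would run before any ReleaseMutator packet can start.
        log.push("Plan::release".into());
        // ReleaseMutator packets are spawned only after Plan::release returns.
        for id in 0..2 {
            queue.push_back(Box::new(ReleaseMutator(id)));
        }
    }
}

impl Packet for ReleaseMutator {
    fn execute(&self, _q: &mut VecDeque<Box<dyn Packet>>, log: &mut Vec<String>) {
        log.push(format!("ReleaseMutator {}", self.0));
    }
}

fn main() {
    let mut queue: VecDeque<Box<dyn Packet>> = VecDeque::new();
    queue.push_back(Box::new(Release));
    let mut log = Vec::new();
    while let Some(p) = queue.pop_front() {
        p.execute(&mut queue, &mut log);
    }
    // Plan::release always precedes every ReleaseMutator.
    assert_eq!(log[0], "Plan::release");
    println!("{:?}", log);
}
```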
Since this problem affects all plans, I created a dedicated issue: https://github.com/mmtk/mmtk-core/issues/1147
Currently, we parallelize the sweeping work by making one work packet for the global pool and one packet for each mutator. This is fine for multi-threaded workloads, but when there is only one mutator, it hits a pathological case where the Release stage is dominated by a single long-running `ReleaseMutator` work packet. Here is a timeline captured using eBPF while executing the Liquid benchmark with the Ruby binding (a single mutator, but multiple GC workers).

In comparison, here is the timeline for the lusearch benchmark from the DaCapo Chopin benchmark suite (with eager sweeping force-enabled). The parallel sweeping of mutators is better, but the `Release` work packet is not parallelized with `ReleaseMutator`.

We should parallelize it by making multiple work packets, each releasing a reasonable number of blocks.
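A minimal sketch of that chunking idea, under assumed, simplified types (`Block`, `SweepChunk`, and the `BLOCKS_PER_PACKET` granularity are illustrative, not mmtk-core's actual API): split one long block list into fixed-size chunks and make one work packet per chunk, so multiple GC workers can sweep concurrently even with a single mutator:

```rust
// Tunable granularity per packet (an assumed value, not a measured one).
const BLOCKS_PER_PACKET: usize = 512;

// Stand-in for a real block handle.
#[derive(Clone, Copy, Debug)]
struct Block(usize);

/// A work packet that sweeps one chunk of blocks.
struct SweepChunk {
    blocks: Vec<Block>,
}

impl SweepChunk {
    fn execute(&self) {
        for _b in &self.blocks {
            // Sweep one block here.
        }
    }
}

/// Split one long block list into many small packets so GC workers
/// can pick them up in parallel instead of one worker sweeping all.
fn make_sweep_packets(blocks: Vec<Block>) -> Vec<SweepChunk> {
    blocks
        .chunks(BLOCKS_PER_PACKET)
        .map(|c| SweepChunk { blocks: c.to_vec() })
        .collect()
}

fn main() {
    let blocks: Vec<Block> = (0..2000).map(Block).collect();
    let packets = make_sweep_packets(blocks);
    // 2000 blocks at 512 per packet -> 4 packets (last one partial).
    assert_eq!(packets.len(), 4);
    for p in &packets {
        p.execute();
    }
    println!("{} packets", packets.len());
}
```

With this shape, the packet size trades off scheduling overhead against load balance; the same splitting would apply to the global pool as well as each mutator's local block lists.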