mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Native MarkSweep: bad load balancing with single-threaded workloads #1146

Closed: wks closed this issue 4 months ago

wks commented 5 months ago

Currently, we parallelize the sweeping work by making one work packet for the global pool and one packet for each mutator. That is fine for multi-threaded workloads, but when there is only one mutator it hits a pathological case: the Release stage is dominated by a single long-running ReleaseMutator work packet. Here is a timeline captured using eBPF while executing the Liquid benchmark with the Ruby binding (a single mutator, but multiple GC workers):

[Timeline image: Liquid benchmark (Ruby binding), single mutator]

In comparison, here is the timeline for the lusearch benchmark from the DaCapo Chopin benchmark suite (with eager sweeping force-enabled). The sweeping of mutators is better parallelized, but the Release work packet itself does not run in parallel with ReleaseMutator:

[Timeline image: lusearch benchmark (DaCapo Chopin), eager sweeping forced]

We should parallelize the sweeping by splitting it into multiple work packets, each releasing a reasonable number of blocks, as sketched below.
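A minimal sketch of that idea. The names here (`Block`, `SweepChunk`, `SWEEP_CHUNK_SIZE`, `make_sweep_packets`) are illustrative, not the mmtk-core API: blocks from the global pool and from the mutators are repartitioned into fixed-size chunks, and each chunk becomes an independent packet that any GC worker can pick up.

```rust
/// Illustrative stand-in for a native Mark-Sweep block.
struct Block {
    // cells, mark bits, free list, ...
}

impl Block {
    fn sweep(&mut self) {
        // Free the unmarked cells in this block.
    }
}

/// Hypothetical chunk size; the real value would need tuning.
const SWEEP_CHUNK_SIZE: usize = 512;

/// A work packet that sweeps a bounded number of blocks, so no single
/// packet can dominate the Release stage.
struct SweepChunk {
    blocks: Vec<Block>,
}

impl SweepChunk {
    fn do_work(&mut self) {
        for block in &mut self.blocks {
            block.sweep();
        }
    }
}

/// Split all blocks (global pool plus per-mutator lists) into chunk-sized
/// packets instead of one packet per mutator.
fn make_sweep_packets(all_blocks: Vec<Block>) -> Vec<SweepChunk> {
    let mut packets = Vec::new();
    let mut iter = all_blocks.into_iter().peekable();
    while iter.peek().is_some() {
        let blocks: Vec<Block> = iter.by_ref().take(SWEEP_CHUNK_SIZE).collect();
        packets.push(SweepChunk { blocks });
    }
    packets
}
```

With chunk-sized packets, a single mutator that owns most of the blocks no longer produces a single long-running packet; its blocks are spread over as many packets as there are chunks.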

qinsoon commented 5 months ago

We may want a way to split the BlockLists, or each individual BlockList, into several work packets. That way we can parallelize sweeping of both the global block lists and the thread-local block lists of each mutator, which would mitigate the issues shown in the two timelines above. A rough sketch is below.
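A hedged sketch of the splitting itself, using a simplified `BlockList` defined here rather than the real native Mark-Sweep `BlockList` (which is an intrusive linked list of blocks, not a `Vec`):

```rust
/// Simplified stand-in for the native Mark-Sweep block list.
struct BlockList {
    blocks: Vec<usize>, // block addresses, for illustration only
}

impl BlockList {
    /// Split this list into at most `n` sub-lists of roughly equal size.
    /// The global pool and each mutator's lists could each be split this
    /// way, so every list contributes several independent work packets.
    fn split(self, n: usize) -> Vec<BlockList> {
        assert!(n > 0);
        // Ceiling division, so all blocks are covered.
        let per_list = ((self.blocks.len() + n - 1) / n).max(1);
        self.blocks
            .chunks(per_list)
            .map(|chunk| BlockList { blocks: chunk.to_vec() })
            .collect()
    }
}
```

Splitting by a target packet count (rather than a fixed chunk size) keeps the number of packets bounded even for very large heaps, at the cost of packets whose size scales with the heap.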

wks commented 5 months ago

ReleaseMutator is not executed until Release finishes, because the Release work packet only spawns the ReleaseMutator work packets after Plan::release returns. This is by design, but it means we should not sweep the global pool inside Release. The ordering is sketched below.
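A simplified, hypothetical rendering of that ordering (the real packets live in mmtk-core's scheduler; `plan_release` and `num_mutators` are placeholders): the plan-level release runs to completion before any per-mutator packet even exists, so anything heavy done there, such as sweeping the global pool, cannot overlap with the per-mutator sweeping.

```rust
/// Illustrative work-packet trait; packets may enqueue further packets.
trait WorkPacket {
    fn do_work(&mut self, queue: &mut Vec<Box<dyn WorkPacket>>);
}

struct Release;
struct ReleaseMutator {
    mutator_id: usize,
}

impl WorkPacket for Release {
    fn do_work(&mut self, queue: &mut Vec<Box<dyn WorkPacket>>) {
        // 1. The plan-level release runs first. If the global pool is swept
        //    here, that happens before any ReleaseMutator packet exists, so
        //    it cannot run in parallel with per-mutator sweeping.
        plan_release();

        // 2. Only after Plan::release returns are the per-mutator packets
        //    created and enqueued.
        for mutator_id in 0..num_mutators() {
            queue.push(Box::new(ReleaseMutator { mutator_id }));
        }
    }
}

impl WorkPacket for ReleaseMutator {
    fn do_work(&mut self, _queue: &mut Vec<Box<dyn WorkPacket>>) {
        // Sweep this mutator's thread-local block lists.
        let _ = self.mutator_id;
    }
}

fn plan_release() { /* plan-level release, e.g. global pool maintenance */ }
fn num_mutators() -> usize { 1 }
```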

Since this problem affects all plans, I created a dedicated issue: https://github.com/mmtk/mmtk-core/issues/1147