ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org
Other
5.19k stars 1.06k forks source link

Fix spurious major GC slices. #13086

Closed damiendoligez closed 4 weeks ago

damiendoligez commented 1 month ago

Reported to me by @stedolan.

Because of commit e6370d56ebe (memory.c) and PR #11750 we have some spurious major GC slices when a minor GC promotes more than 20% of the minor heap and a large allocation comes along before the next scheduled major slice (the one that happens when the minor heap is half full).

This is because the allocated_words counter will keep the amount of memory promoted by the minor GC until the next major slice takes it into account.

The consequence is that the large allocation will trigger a major slice on the spot, then the next scheduled major slice has very little work to do.

The solution is have a separate count for direct major allocations, and use it (instead of all major allocations) to trigger unscheduled major slices.

The problem is illustrated by the program found here: https://gist.github.com/damiendoligez/4d65d0ade50e6d0b2726e812a0eb7a14

Number of major slices (displayed by OCAMLRUNPARAM=v=0x40 ./a.out 2>&1 | grep '^allocated_words =' | wc):

version large_allocs=true large_allocs=false
trunk 3638 1879
this PR 1869 1879

Note that the amount of major GC work is not affected by this problem, but we still incur some overhead for starting the extra slices, and the latency profile is changed (the major slice pauses are closer to the minor GC pauses) so it's still worth fixing.

gasche commented 1 month ago

This could/should probably be a new flag for caml_shared_try_alloc, "do I come from the minor GC?", which would reduce the risk of getting this accounting wrong in the future if we call that function from more places

I convinced myself that the code says what it does, but how do we know that what it does is reasonable? (In particular, are there regressions on other workloads?)

stedolan commented 4 weeks ago

This could/should probably be a new flag for caml_shared_try_alloc, "do I come from the minor GC?", which would reduce the risk of getting this accounting wrong in the future if we call that function from more places

I disagree. This is a change to GC policy: caml_alloc_shr should trigger extra major GC eventually while allocations from promotions should not. It belongs where the GC policy is defined (major_gc.c and memory.c), not in the allocator (shared_heap.c).

I convinced myself that the code says what it does, but how do we know that what it does is reasonable? (In particular, are there regressions on other workloads?)

IIRC, this change narrows an earlier fix that made too wide a change: the / 5 condition resolved an issue with a program that was doing a lot of direct major allocation, but the change affected all programs, even ones that did very little direct major allocation. This change means that the fix for such programs is more precisely targeted.