ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org

Regression with default GC settings between `4.14.2` and `5.1.1` #13123

Open toots opened 1 week ago

toots commented 1 week ago

Hi!

We're in the process of switching liquidsoap to OCaml 5.1.1 and we've noticed some pretty severe regressions with the garbage collector.

I'm still testing and need to confirm whether we can get memory/CPU usage comparable to 4.14, but I'd like to report a first, very simple case.

This liquidsoap script:

output.dummy(blank())

is basically a runtime loop that creates a float array array (native OCaml) holding 0.04s of blank PCM audio every 0.04s and then discards it. No C code, only OCaml.

With 4.14, the memory footprint looks like this:

Screenshot 2024-04-25 at 8 16 31 AM

(A big chunk of the memory usage here is due to the language's library)

With OCaml 5.1.1 default GC params, however:

Screenshot 2024-04-25 at 8 37 08 AM

Memory peaks at a fairly high value, then keeps oscillating around a level roughly 4x higher than with 4.14.

Setting space_overhead=40 actually achieves a better memory footprint than with 4.14.2:

Screenshot 2024-04-25 at 8 52 47 AM

CPU usage is very low on my machine for each case so it's hard to see a pattern there.

I am still testing with more sophisticated scripts. It's not clear yet whether setting space_overhead=40 gives comparable runtime CPU/memory performance, but it looks possible.
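For reference, a minimal sketch of the two usual ways to apply that setting: the `o` field of `OCAMLRUNPARAM` at launch, or `Gc.set` from inside the program. The value 40 comes from the tests above; the helper name `set_space_overhead` and the program name `my_program` are only illustrative.

```ocaml
(* Equivalent launch-time setting, no code change needed:
     OCAMLRUNPARAM='o=40' ./my_program
   (my_program is a placeholder name). *)

(* Illustrative helper: change space_overhead and return the previous value. *)
let set_space_overhead (n : int) : int =
  let params = Gc.get () in
  Gc.set { params with Gc.space_overhead = n };
  params.Gc.space_overhead

let () =
  let previous = set_space_overhead 40 in
  Printf.printf "space_overhead: %d -> %d\n"
    previous (Gc.get ()).Gc.space_overhead
```

Setting it from code makes the choice part of the program, while the environment variable is handier for quick comparisons between runs.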

tmcgilchrist commented 1 week ago

Perhaps these changes are related https://github.com/ocaml/ocaml/pull/12754, https://github.com/ocaml/ocaml/pull/13086 and https://github.com/ocaml/ocaml/pull/12493.

Olly can give you GC statistics on macOS (which you seem to be using based on those Instruments screenshots) https://github.com/tarides/runtime_events_tools.

If you can reduce this to a small reproduction case, I can add it to sandmark.
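As a lightweight in-process complement to Olly, the standard `Gc` module can also report heap figures directly. The sketch below is only an assumption about how one might instrument a reproduction case, not something Olly or sandmark require; `heap_mib` and `report` are made-up names, and the numbers printed are the GC's own view of the heap, which need not match what Instruments shows.

```ocaml
(* Convert a size in words to MiB. *)
let heap_mib (words : int) : float =
  float_of_int (words * (Sys.word_size / 8)) /. (1024. *. 1024.)

(* Print a one-line summary of the GC's view of the major heap. *)
let report (label : string) : unit =
  let s = Gc.quick_stat () in
  Printf.printf "%s: heap=%.1f MiB top=%.1f MiB minor=%d major=%d\n%!"
    label
    (heap_mib s.Gc.heap_words)
    (heap_mib s.Gc.top_heap_words)
    s.Gc.minor_collections
    s.Gc.major_collections

let () =
  report "start";
  (* 1024 * 1024 floats, i.e. 8 MiB of payload. *)
  let data = Array.make (1024 * 1024) 1. in
  report "after allocating 8 MiB";
  ignore (Sys.opaque_identity data)
```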

toots commented 1 week ago

The problem is not linked to bigarray: the code only uses OCaml values.

I was able to write a reproduction test! It looks like the issue happens when the program keeps some memory permanently allocated, which is the standard library in the case of liquidsoap.

There also seems to be a threshold effect: a small amount of allocated memory is okay, and a lot seems okay too. However, somewhere in the middle, around 40 MB, is where the issue seems to be triggered.

Reproduction code:

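(* Build with the unix and threads libraries. *)
let frame_size = 0.04
let pcm_len = int_of_float (44100. *. frame_size)
let channels = 2

(* Long-lived data: a toplevel value stays reachable for the whole run. *)
let deadweigth = Array.make (4000 * 1024) 1.

(* One frame of blank PCM: one float array per channel. *)
let mk_pcm () = Array.init channels (fun _ -> Array.make pcm_len 0.)

(* Allocate a frame, drop it, sleep for one frame duration, repeat forever. *)
let rec fn () =
  let pcm = mk_pcm () in
  ignore(pcm);
  Unix.sleepf 0.04;
  fn ()

let () =
  let th = Thread.create fn () in
  Thread.join th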

Memory:

Screenshot 2024-04-25 at 9 09 46 PM

BTW, I'm using the macOS memory profiler because it's really good at showing only the program's private allocations. I'm pretty sure the problem happens on other OSes/platforms too.
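To put numbers on this example, here is a quick back-of-the-envelope calculation of the sizes involved. It counts only array payloads (no block headers) and assumes 8-byte floats; the names `frame_bytes`, `bytes_per_second` and `deadweight_bytes` are introduced here for illustration.

```ocaml
let frame_size = 0.04
let pcm_len = int_of_float (44100. *. frame_size)   (* samples per channel *)
let channels = 2
let float_bytes = 8

(* Payload of one PCM frame: channels * pcm_len floats. *)
let frame_bytes = channels * pcm_len * float_bytes

(* One frame is allocated every 0.04s, i.e. 25 frames per second. *)
let bytes_per_second = frame_bytes * 25

(* Payload of the long-lived float array from the example. *)
let deadweight_bytes = 4000 * 1024 * float_bytes

let () =
  Printf.printf "frame: %d bytes, rate: %d bytes/s, deadweight: %d bytes\n"
    frame_bytes bytes_per_second deadweight_bytes
```

That is about 28 KB per frame and roughly 0.7 MB/s of short-lived data, sitting next to about 33 MB of permanently allocated data.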

toots commented 1 week ago

Setting space_overhead to 40 also seems to help with the example:

Screenshot 2024-04-25 at 9 17 56 PM

toots commented 1 week ago

Ok, I think I've refined the example to be even closer to our use case:

Code:

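(* Build with the unix and threads libraries. *)
let frame_size = 0.04
let pcm_len = int_of_float (44100. *. frame_size)
let channels = 2

(* One frame of blank PCM: one float array per channel. *)
let mk_pcm () = Array.init channels (fun _ -> Array.make pcm_len 0.)

let rec fn a =
  (* Only true on the first call, when the deadweight array is passed in. *)
  if Array.length a <> 0 then
    Gc.full_major ();
  let pcm = mk_pcm () in
  ignore(pcm);
  Unix.sleepf 0.04;
  (* Later iterations receive an empty array and do not use the deadweight. *)
  fn [||]

let () =
  (* 40 * 1024 * 1024 ints, handed to the thread function. *)
  let deadweigth = Array.make (40 * 1024 * 1024) 1 in
  Unix.sleepf 0.04;
  let th = Thread.create fn deadweigth in
  Thread.join th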

Memory consumption:

Screenshot 2024-04-25 at 9 41 52 PM

Woof!

Looks like I can see the following:

Last, a side question: would it be possible to be more directive with the GC? In my application, I know exactly when I should ask the GC to check for memory to clean up, namely after each media loop. Would it be possible to set the GC params to be very lazy and trigger a check every time a loop terminates?
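One way this could look with the existing `Gc` API, as a sketch only: raise `space_overhead` so the major GC works less eagerly on its own, then request GC work explicitly at the end of each loop iteration. Whether this actually yields a better footprint on OCaml 5 is not established in this thread; `run_loop`, the value 400, and the choice of `Gc.major_slice 0` (which lets the GC pick the slice size) rather than `Gc.full_major` are all assumptions for illustration.

```ocaml
(* Make the major GC less eager on its own; 400 is an arbitrary example value. *)
let () = Gc.set { (Gc.get ()) with Gc.space_overhead = 400 }

(* Run [body] [n] times, asking for GC work after every iteration. *)
let run_loop ~(n : int) (body : unit -> unit) : unit =
  for _ = 1 to n do
    body ();
    (* A minor collection plus one slice of major GC work. *)
    ignore (Gc.major_slice 0)
  done

(* Counter used only to show that the loop body ran. *)
let iterations = ref 0

let () =
  run_loop ~n:5 (fun () ->
      let pcm = Array.init 2 (fun _ -> Array.make 1764 0.) in
      ignore (Sys.opaque_identity pcm);
      incr iterations);
  Printf.printf "ran %d iterations\n" !iterations
```

Swapping `Gc.major_slice 0` for `Gc.full_major ()` would trade more CPU per iteration for a tighter bound on garbage left between loops.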

gasche commented 1 week ago

Note: if your application starts by allocating a lot and throwing most of it away, and you know in the code where that initialization phase ends, you can call an explicit compaction to ensure that that memory is given back to the OS, and that the rest of the program starts from a smaller memory footprint. Compaction was re-enabled in 5.x only recently by @sadiqj (it is in the release branch for 5.2), and it may benefit your workload.
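A minimal sketch of that pattern, assuming the initialization phase has a clear end point in the code. `init` and its sizes are placeholders; on releases where compaction has not been re-enabled, `Gc.compact` can still be called but may not return memory to the OS.

```ocaml
(* Placeholder initialization: builds a large temporary table and keeps
   only a small summary of it (here, its length). *)
let init () : int =
  let tmp = Array.init 1_000_000 (fun i -> float_of_int i) in
  Array.length tmp

let () =
  let kept = init () in
  (* Initialization is over and the temporary data is unreachable:
     request a full collection followed by compaction. *)
  Gc.compact ();
  Printf.printf "kept summary: %d\n" kept
```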

toots commented 1 week ago

Thanks! Gc.compact does not seem to help with the last example.

I should have mentioned that all of these were also confirmed with the latest OCaml git code.

toots commented 1 week ago

For reference, this is the memory profile with 4.14.2 on the last example:

Screenshot 2024-04-26 at 8 05 04 AM