mirage / ocaml-solo5

Freestanding OCaml runtime

On hvt, I sometimes get a negative reading from malloc_memory_usage #126

Closed: hannesm closed this issue 11 months ago

hannesm commented 12 months ago

I use ocaml-solo5 at its latest release (0.4.1), and one of my unikernels reports a negative malloc_memory_usage (called from mirage-solo5):

CAMLprim value
mirage_memory_get_fast_live_words(value v_unit)
{
    /* report dlmalloc's live-byte counter, converted to OCaml words */
    return Val_long(malloc_memory_usage() / sizeof(value));
}

Please see the evolution of "memory live bytes" in the diagram below. I'm not sure what the peak on Oct 16th is about. Any chance there could be an overflow somewhere? In dlmalloc.i, I noticed that dispose_chunk unconditionally subtracts psize...

static void dispose_chunk(mstate m, mchunkptr p, size_t psize) {
  gm->internal_memory_usage -= psize;

But I'm not too familiar with the dlmalloc code.
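
To make the suspicion concrete, here is a minimal standalone sketch (not the actual dlmalloc code): unbalanced accounting on an unsigned counter wraps around rather than going negative, and the huge wrapped value then shows up as a negative number once it is reinterpreted as signed somewhere on the way to the metric.

/* Minimal sketch, not dlmalloc: unbalanced subtraction on a size_t counter
 * wraps around; the wrapped value reads as negative once treated as signed. */
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    size_t usage = 100;            /* pretend 100 bytes are recorded as live */
    usage -= 1000;                 /* unbalanced accounting: wraps around    */

    printf("as size_t: %zu\n", usage);        /* enormous positive value     */
    printf("as signed: %ld\n", (long)usage);  /* -900                        */
    return 0;
}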

[Screenshot: Grafana, unikernel stats, 2023-11-13 10:26]

palainp commented 12 months ago

Thanks. As internal_memory_usage is defined as size_t, it should not be an overflow. I'll check on a small unikernel and compare against the GC values. To be sure I understand the figure: "memory live bytes" should be really close to "gc cur live", right?

hannesm commented 12 months ago

To understand the picture in more detail:

palainp commented 12 months ago

Ok, it should be close to "gc heap" (as some bytes might not have been released by the GC but are not alive). I'll try to write a unikernel doing a lot of allocations/releases to track the issue down.

hannesm commented 12 months ago

What I expect is that:

From experience (due to Cstruct.t/Bigarray), the memory live bytes can easily be twice the gc heap (since Bigarrays are not accounted for in the gc heap).
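
As a hedged illustration of why (the stub name below is made up, not actual MirageOS code): a Bigarray's payload is malloc'd outside the OCaml heap, so dlmalloc accounts for it while the GC heap counters do not.

/* Sketch only: a hypothetical stub allocating a Bigarray whose payload is
 * malloc'd by the runtime, so it is visible to malloc accounting but not to
 * the GC heap words. */
#include <caml/mlvalues.h>
#include <caml/bigarray.h>

CAMLprim value alloc_big_buffer_example(value v_len)    /* hypothetical name */
{
    /* Passing NULL as the data pointer asks the runtime to malloc the
     * payload itself; only a small proxy block lives on the OCaml heap. */
    return caml_ba_alloc_dims(CAML_BA_UINT8 | CAML_BA_C_LAYOUT, 1, NULL,
                              Long_val(v_len));
}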

hannesm commented 12 months ago

The kinfo_mem_resident is the same as what you see in top as RES (and just now, the malloc live value grew above the kinfo_mem_resident value until a GC compaction was done -- that puzzles me again).

[Screenshot: Grafana, unikernel stats, 2023-11-13 13:03]

hannesm commented 12 months ago

Should I also graph the live_words of Memory.stat () (which uses mallinfo's uordblks)? There are several other unikernels where the resident number is below the malloc live one (and I re-checked that the resident value is in line with what the OS reports in top).

I'm using OCaml's best-fit GC policy, and I also find it strange that compactions are happening...

palainp commented 12 months ago

Yes, Memory.stat is the most precise value we can get from dlmalloc and should be the same as stat_fast (it is slower, as dlmalloc has to walk the entire heap, but I think it can be helpful here), unless I missed something.
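
For reference, a rough sketch of what such a mallinfo-based stub could look like (the stub name and the exact header are assumptions, not the actual mirage-solo5 code):

/* Sketch, not the actual mirage-solo5 stub: report live words via mallinfo,
 * which walks the heap instead of relying on the incremental counter. */
#include <caml/mlvalues.h>
#include <malloc.h>   /* struct mallinfo; the header may differ under nolibc */

CAMLprim value
mirage_memory_get_live_words_example(value v_unit)      /* hypothetical name */
{
    struct mallinfo mi = mallinfo();
    /* uordblks is the total allocated space in bytes; convert to OCaml words */
    return Val_long((size_t)mi.uordblks / sizeof(value));
}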

hannesm commented 12 months ago

Here's another diagram (of a secondary nameserver) that I do not quite understand... The breaks at 9:05 and 11:05 are restarts. What is strange is (a) the negative values for "memory live bytes" and (b) "gc heap" being above kinfo_mem_resident.

The unikernel in question is a secondary NS, configured to act as a secondary for an IP address that returns connection refused.

[Screenshot: Grafana, unikernel stats, 2023-11-14 12:37]

hannesm commented 12 months ago

And here's a graph with mallinfo (uordblks) as well. The gc heap and mallinfo live lines track each other pretty closely...

Is there anything that would help with debugging malloc_memory_usage?

[Screenshot: Grafana, unikernel stats, 2023-11-14 21:00]

palainp commented 12 months ago

Hmm, that's odd: you see a divergence between mallinfo and memory_usage pretty quickly, whereas I don't see it even after a two-hour run. I surely missed something when trying to reproduce your setup (the memory usage just rises until OOM :( ).

In the meantime I wrote another, much simpler allocator in the buddy-malloc branches of my ocaml-solo5 and mirage-solo5 forks. I have only tested it on the hello skeleton unikernel. It has a 32 kB overhead and, for the time being, is limited to 16 MB of available memory.

palainp commented 11 months ago

Update: I have to admit that my ratio (bad code lines)/(good code lines) is finite but greater than 1 with the dlmalloc patch :(

I found that when dlmalloc is asked to realloc memory to a greater size, it sometimes finds that the memory immediately following the chunk is unused and large enough for the request, so it reallocates in place (which avoids a costly memcpy). In that precise case neither malloc nor free is called, so the memory_usage counter isn't updated. This causes some of the divergence you can observe and eventually makes memory_usage decrease too much (EDIT: because when this memory is later freed, it has the new, larger size).
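
A stripped-down sketch of that failure mode with plain malloc/realloc wrappers (not the dlmalloc code itself):

/* Sketch only: a usage counter updated in the malloc/free paths but not in
 * realloc drifts once realloc grows an allocation in place. */
#include <stdio.h>
#include <stdlib.h>

static long usage;                 /* bytes we believe are live              */

struct hdr { size_t size; };       /* requested size stored in front of data */

static void *track_malloc(size_t n)
{
    struct hdr *h = malloc(sizeof *h + n);
    h->size = n;
    usage += n;                    /* counted here ...                       */
    return h + 1;
}

static void *track_realloc(void *p, size_t n)
{
    struct hdr *h = (struct hdr *)p - 1;
    h = realloc(h, sizeof *h + n);
    h->size = n;                   /* ... but the counter is NOT adjusted    */
    return h + 1;
}

static void track_free(void *p)
{
    struct hdr *h = (struct hdr *)p - 1;
    usage -= h->size;              /* subtracts the *new* size               */
    free(h);
}

int main(void)
{
    void *p = track_malloc(100);   /* usage = 100                            */
    p = track_realloc(p, 1000);    /* usage still 100                        */
    track_free(p);                 /* usage = 100 - 1000 = -900              */
    printf("usage = %ld\n", usage);
    return 0;
}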

Another mistake was putting the memory_usage update in dispose_chunk: when dlmalloc needs to split an adjacent chunk to grow the current one, it calls dispose_chunk on the second half of the next chunk (e.g. https://github.com/mirage/ocaml-solo5/blob/714de76319d1c3137302a2fd4373a5a1284a4a91/nolibc/dlmalloc.i#L4860), but the memory for that chunk was never counted in memory_usage...

That's the story so far; it would be great if you could try out https://github.com/palainp/ocaml-solo5/tree/dlmalloc . I currently don't see any divergence (but I have a different setup than yours: I use https://github.com/palainp/solo5memory , a unikernel that accepts requests, with netcat to create/close flows).

EDIT: I'll need to remove the DEBUG_PRINTF lines before opening a PR, if this patch does fix the issue.

hannesm commented 11 months ago

Dear @palainp, thanks a lot. Your patch works great for me. Please see below the new graph of the same unikernel (for the 3 hours of its lifetime). The mallinfo uordblks and malloc_memory_usage values agree all the time :) (hard to see, since the two lines lie on top of each other).

I'd appreciate your PR to ocaml-solo5 and would merge and cut a release :)

[Screenshot: Grafana, unikernel stats, 2023-11-15 20:22]

palainp commented 11 months ago

That's great! Thank you @hannesm!