mirage / qubes-mirage-firewall

A Mirage firewall VM for QubesOS
BSD 2-Clause "Simplified" License

With solo5, firewall runs out of memory after a while and can't recover #120

Closed talex5 closed 2 years ago

talex5 commented 4 years ago

First reported by @palainp in https://github.com/mirage/qubes-mirage-firewall/pull/116#issuecomment-708936007. I see this too now, after a few days.

2020-10-26 15:26:48 -00:00: INF [memory_pressure] Writing meminfo: free 115MiB / 124MiB (93.24 %)
[...]
2020-10-27 09:00:57 -00:00: INF [client_eth:dom33:10.137.0.15] who-has 10.137.0.16? responding with fe:ff:ff:ff:ff:ff
2020-10-27 09:01:04 -00:00: INF [memory_pressure] Writing meminfo: free 13MiB / 124MiB (9.99 %)
2020-10-27 09:01:04 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
2020-10-27 09:01:04 -00:00: INF [memory_pressure] Writing meminfo: free 13MiB / 124MiB (9.99 %)
2020-10-27 09:01:04 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
2020-10-27 09:01:04 -00:00: INF [memory_pressure] Writing meminfo: free 13MiB / 124MiB (9.99 %)
2020-10-27 09:01:04 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
hannesm commented 4 years ago

I managed to get memtrace (and memtrace_viewer) to work with MirageOS unikernels -- take a look at https://github.com/hannesm/memtrace-mirage. This may be useful for debugging it further, since it gives the OCaml view of which call stacks hold how much memory -- see https://blog.janestreet.com/finding-memory-leaks-with-memtrace/ for a great description of memtrace_viewer.
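
For reference, a minimal sketch of how memtrace tracing is usually switched on; this uses the standard entry point from the memtrace opam package (the Unix flavour, driven by the MEMTRACE environment variable), whereas memtrace-mirage wires the same tracer up for unikernels without a local filesystem:

(* Illustration only: standard memtrace entry point (Unix flavour). With
   memtrace-mirage the trace is shipped out of the unikernel instead of being
   written to a local file named by the MEMTRACE environment variable. *)
let () =
  Memtrace.trace_if_requested ~context:"qubes-mirage-firewall" ()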

palainp commented 4 years ago

My current thinking is that malloc_footprint() is too pessimistic and needs some time to be updated when we run out of free memory.

The second solution offered by @mato in https://github.com/mirage/qubes-mirage-firewall/pull/116#issuecomment-704827905 was to use sbrk(0) instead of malloc_footprint(), but that function is currently not available in nolibc (https://github.com/mirage/ocaml-freestanding/blob/master/nolibc/include/unistd.h).

The easiest workaround may be to add a stat function (using the slower but more accurate mallinfo) alongside the existing quick_stat: if quick_stat reports that free memory is low, we can assess the free memory more accurately before calling the GC. We will see a drop in performance (or an increase in CPU usage) for a while, but once quick_stat is updated the unikernel will work as before.

talex5 commented 4 years ago

It seems to fail very suddenly. Here's an example (I got it to print GC stats and heap stats once a minute; the unikernel had been running fine for several days before this):

GC stats
minor_collections: 7775
major_collections: 149
compactions:       0

minor_words:    2018361516
promoted_words:   32060857
major_words:      32078117

top_heap_words: 1331712
heap_words:     1331712
live_words:     1008819
free_words:      316710
largest_free:       384
fragments:         6183

live_blocks: 271475
free_blocks: 49212
heap_chunks: 14
2020-11-01 03:31:48 -00:00: INF [memory_pressure] :Writing meminfo: free 115MiB / 124MiB (93.24 %)
GC stats
minor_collections: 7822
major_collections: 150
compactions:       0

minor_words:    2030324081
promoted_words:   32236779
major_words:      32254039

top_heap_words: 1331712
heap_words:     1331712
live_words:     1184741
free_words:      140033
largest_free:       800
fragments:         6938

live_blocks: 309418
free_blocks: 15954
heap_chunks: 14
2020-11-01 03:32:48 -00:00: INF [memory_pressure] :Writing meminfo: free 115MiB / 124MiB (93.24 %)
GC stats
minor_collections: 7942
major_collections: 153
compactions:       0

minor_words:    2061383180
promoted_words:   32681555
major_words:      32698815

top_heap_words: 1531904
heap_words:     1531904
live_words:      693951
free_words:      835167
largest_free:    194872
fragments:         2786

live_blocks: 204263
free_blocks: 23925
heap_chunks: 15
2020-11-01 03:32:57 -00:00: INF [memory_pressure] :Writing meminfo: free 12MiB / 124MiB (9.10 %)
2020-11-01 03:32:57 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
GC stats
minor_collections: 7943
major_collections: 155
compactions:       0

minor_words:    2061387164
promoted_words:   32681970
major_words:      32699230

top_heap_words: 1531904
heap_words:     1531904
live_words:      468643
free_words:     1062198
largest_free:    194812
fragments:         1063

live_blocks: 144429
free_blocks: 14564
heap_chunks: 15
2020-11-01 03:32:57 -00:00: INF [memory_pressure] :Writing meminfo: free 12MiB / 124MiB (9.10 %)

So, it had 115 MB free. Then it tried to increase the OCaml heap by 1.5 MB. Then there was only 12 MB free??

When this happened, I was uploading a 30K file to gmail.

When it happens, it always seems to have just under 10% free.
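
A minimal sketch (not the exact debugging code used here) of how such a once-a-minute GC report can be produced alongside the meminfo lines, using the same OS.Time/Duration/Logs primitives that appear elsewhere in this thread:

open Lwt.Infix

(* Sketch: log a few Gc.stat fields once a minute next to the meminfo reports. *)
let rec gc_report_loop () =
  let s = Gc.stat () in
  Logs.info (fun f ->
      f "GC: minor=%d major=%d heap_words=%d live_words=%d free_words=%d heap_chunks=%d"
        s.Gc.minor_collections s.Gc.major_collections
        s.Gc.heap_words s.Gc.live_words s.Gc.free_words s.Gc.heap_chunks);
  OS.Time.sleep_ns (Duration.of_min 1) >>= gc_report_loop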

mato commented 4 years ago

If you give it, say, 512MB of memory, and sample meminfo every 30s or so (both the raw numbers from GC and OS.MM), what does that look like over time? And, is there an amount of memory with which you reach a plateau and don't run out?

talex5 commented 4 years ago

@mato: that's roughly what I'm doing already (except sampling every 60s rather than every 30s). I'd normally run the firewall with 32MB, but here I'm testing with 128MB, which should be way more than it needs.

Memory use looks completely flat until it fails. I only showed the last two good reports before failure, but it continued like that for days.

Note that the reports above where it went from 115 MB free to 12 MB free (and heap_chunks went up by 1) are only 9 seconds apart!

mato commented 4 years ago

Memory use looks completely flat until it fails. I only showed the last two good reports before failure, but it continued like that for days.

Ah, so, it suffers from "sudden death" after working fine for days. That's very strange. No ideas off the top of my head, I'll think about it.

palainp commented 4 years ago

Between 03:32:48 and 03:32:57 the number of fragments is halved and the size of the largest free block increases a lot; this may reveal memory consolidation by the allocator.

I'm not sure I understand correctly how and when the footprint value is updated, but it looks like we have to hope this test succeeds: https://github.com/mirage/ocaml-freestanding/blob/master/nolibc/dlmalloc.i#L4775

This bug is difficult to trigger because we may have to wait a long time for it to appear. I'm currently trying to evaluate the available free memory with mallinfo() when the value reported by malloc_footprint() is low.

palainp commented 4 years ago

Update: I use a mirage-xen pinned at https://github.com/palainp/mirage-xen.git that adds stat() using mallinfo(), together with this modified status function in memory_pressure.ml:

let status () =
  let stats = OS.Memory.quick_stat () in
  if fraction_free stats > 0.1 then `Ok
  else (
    let stats = OS.Memory.stat () in
    if fraction_free stats > 0.1 then (
      Log.info (fun f -> f "quick_stat reported low memory but stat doesn't");
      `Ok
    ) else (
      Gc.full_major ();
      let stats = OS.Memory.stat () in
      report_mem_usage stats;
      if fraction_free stats < 0.1 then `Memory_critical
      else `Ok
    )
  )

This morning I got the following in the unikernel logs:

2020-11-06 09:02:16 -00:00: INF [client_eth:dom15:10.137.0.17] who-has 10.137.0.21? responding with fe:ff:ff:ff:ff:ff
2020-11-06 09:02:24 -00:00: INF [memory_pressure] quick_stat reported low memory but stat doesn't
...
2020-11-06 09:02:57 -00:00: INF [memory_pressure] quick_stat reported low memory but stat doesn't
2020-11-06 09:03:22 -00:00: INF [client_eth:dom10:10.137.0.5] who-has 10.137.0.21? responding with fe:ff:ff:ff:ff:ff

So I still don't know what causes the bug, but when it appears the unikernel runs at 100% CPU for about 30 seconds and then goes back to normal.

palainp commented 4 years ago

After the previous quick_stat reported low memory but stat doesn't message, the unikernel worked fine for 2h, but then the message came back and stayed for another 2h...

Now I'm trying to call malloc_trim() when the free memory reported by footprint() and by mallinfo() differ, to try to update the value returned by footprint() sooner: https://github.com/palainp/ocaml-freestanding/commit/d50da12bca520e8b4f8b89f8d5bbb4508654632d https://github.com/palainp/mirage-xen/commit/2ec471009f246bc9ad37715d1b30c72f353f181c https://github.com/palainp/qubes-mirage-firewall/commit/1d459cde1b03564221cbddb262df555775a322f9
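
A purely hypothetical sketch of what exposing such a call to OCaml can look like; the stub name below is invented for illustration, the real bindings are in the commits linked above:

(* Hypothetical sketch only: the stub name "mirage_malloc_trim" is made up here;
   see the linked commits for the real bindings. malloc_trim(0) asks dlmalloc to
   give unused top-of-heap memory back, which should let malloc_footprint() (and
   thus quick_stat) drop again after a burst of allocations. *)
external malloc_trim : int -> int = "mirage_malloc_trim"

let trim () = ignore (malloc_trim 0)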

palainp commented 3 years ago

Update: Unfortunately, using malloc_trim() didn't change anything (quick_stat still reports low memory when stat doesn't). I'm now trying to call sbrk() instead of malloc_footprint(): https://github.com/palainp/mirage-xen/commit/fddeef92a01ade0817f481bbf20a1970549214a0 (and ocaml-freestanding no longer needs to be pinned).

I actually have the same memory reporting with malloc_footprint() when the unikernel starts:

2020-11-12 06:58:15 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB / 60MiB (85.90 %)

and with sbrk():

2020-11-12 14:14:19 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB / 60MiB (85.89 %)
xaki23 commented 3 years ago

bit of a strange observation here. my "mirage fw test instance" sits between a "vpn VM" and a "sync backup offsite" one that gets restarted hourly. this tends to be a not-so-bad test because it sees a lot of vm stop/starts and plenty of traffic, and doesn't make me too unhappy when it breaks (the backup sync will just restart it on the next cycle).

for the last 5 days, i have been running a mirage-fw with the following particulars there:

this fw has been stable for 5d now, and grepping the log there has been zero movement in the mem stats; they have all been "INF [memory_pressure] Writing meminfo: free 987MiB / 996MiB (99.16 %)". it did not even blink with different numbers of downstream VMs, and it has been pushing a lot of GBs in those days, using about 25k sec of CPU time.

for comparison, another instance (no mem-reporting, started as dom0-supplied kernel, 64MB total) would reliably hit the OOM wall within very few hours.

not fully sure what that means yet, some options include "it didn't see any peaks at all", "the peaks are too short to be seen with 10min reporting, and stay well below 1GB", or "the OOM doesn't happen when started through multiboot".

going to cross-test with a different load profile / use case and a different mem setting and/or boot path in the backup role next.

xaki23 commented 3 years ago

i can rule out "something different about the used package versions" (caused by pinning solo5 to dev for multiboot, or building outside docker in general): the exact same binary that is stable when run templated/grub/multiboot hits the OOM wall when run via the "plain xen" path.

still no OOM when multibooted, and using it in "all roles" now.

one unrelated-looking crash:

Fatal error: exception (Failure "Netif: ERROR response")
Raised at file "string.ml", line 115, characters 19-34
Called from file "src/sexp.ml", line 113, characters 13-47
Solo5: solo5_exit(2) called

no idea what that was about or what triggered it.

xbc5 commented 3 years ago

Any developments on this?

hannesm commented 3 years ago

From https://github.com/mirage/qubes-mirage-firewall/issues/120#issuecomment-728114134 I hear:

still no OOM when multibooted

Any updates on this @xaki23? The Netif error "Failure 'Netif: ERROR response'" is rather poor error handling; maybe @talex5, @djs55, or @avsm have an idea of what good error handling should look like here? Maybe the netif should be removed? (Failing/exiting looks like the wrong strategy.)

talex5 commented 3 years ago

"Netif: ERROR response" only appears in frontend.ml, so this means the failed connection is the one to sys-net, so recovery probably isn't possible. Though perhaps it could try reinitialising the interface.

winux138 commented 3 years ago

Here are the results of my investigation.

Under the following conditions:

I managed to get the following results:

Here are the last logs of the firewall:

2021-10-15 13:14:48 -00:00: WRN [frameQ] Maximum queue length exceeded for 10.137.0.24: dropping frame
2021-10-15 13:14:48 -00:00: WRN [frameQ] Maximum queue length exceeded for 10.137.0.24: dropping frame
2021-10-15 13:14:48 -00:00: WRN [frameQ] Maximum queue length exceeded for 10.137.0.24: dropping frame
2021-10-15 13:14:48 -00:00: WRN [frameQ] Maximum queue length exceeded for 10.137.0.24: dropping frame
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 9
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 8
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 7
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 6
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 5
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 4
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 3
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 2
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 1
2021-10-15 13:14:48 -00:00: INF [frameQ] Queue length for 10.137.0.24: decr to 0
2021-10-15 13:14:49 -00:00: INF [memory_pressure] Writing meminfo: free 5.9MiB / 60MiB (9.91 %)
2021-10-15 13:14:49 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
2021-10-15 13:14:49 -00:00: INF [memory_pressure] Writing meminfo: free 5.9MiB / 60MiB (9.91 %)
2021-10-15 13:14:49 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
2021-10-15 13:14:49 -00:00: INF [memory_pressure] Writing meminfo: free 5.9MiB / 60MiB (9.91 %)
2021-10-15 13:14:49 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
2021-10-15 13:14:49 -00:00: INF [memory_pressure] Writing meminfo: free 5.9MiB / 60MiB (9.91 %)
2021-10-15 13:14:49 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table
2021-10-15 13:14:49 -00:00: INF [memory_pressure] Writing meminfo: free 5.9MiB / 60MiB (9.91 %)
2021-10-15 13:14:49 -00:00: WRN [firewall] Memory low - dropping packet and resetting NAT table

I managed to get these results both with the unikernel used as a VM kernel and with the grub multiboot version (https://github.com/QubesOS/qubes-issues/issues/6162).

It occurs most of the time within 10 minutes when used as a VM kernel; it takes a bit longer with the grub multiboot version.

Could it be that CPU overload triggers some memory-balancing work that disturbs the firewall? Does anyone know how Xen handles packets from sys-net/AppVMs to the firewall when the CPU is overloaded, and how I could investigate this more deeply?

winux138 commented 2 years ago

Update: here is some news regarding this issue.

#121 misled us: it looked as though the memory suddenly drops below 10%, suggesting the unikernel suffers a sudden-death failure. In fact, the stats parameter was simply not refreshed before each meminfo print. If we change the code to:

let report_mem_usage stats =
  Lwt.async (fun () ->
      let rec aux () =
        OS.Xs.make () >>= fun xs ->
        OS.Xs.immediate xs (fun h ->
+            let stats = OS.Memory.quick_stat () in
            OS.Xs.write h "memory/meminfo" (meminfo stats)
          ) >>= fun () ->
        OS.Time.sleep_ns (Duration.of_f 600.0) >>= fun () ->
        aux ()
      in
      aux ()
  )

We can see that the free memory slowly decreases until it reaches the 10% threshold (with no sudden-death behaviour).

In the following log, the free memory decreases (a print is made every 2 seconds) when network traffic occurs:

2022-01-06 10:12:19 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB / 59MiB (84.99 %)
2022-01-06 10:12:21 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB / 59MiB (84.99 %)
2022-01-06 10:12:23 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB / 59MiB (84.99 %)
2022-01-06 10:12:25 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB / 59MiB (84.99 %)
2022-01-06 10:12:27 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB / 59MiB (84.99 %)
2022-01-06 10:12:29 -00:00: INF [memory_pressure] Writing meminfo: free 44MiB / 59MiB (73.03 %)
2022-01-06 10:12:31 -00:00: INF [memory_pressure] Writing meminfo: free 40MiB / 59MiB (66.19 %)
2022-01-06 10:12:33 -00:00: INF [memory_pressure] Writing meminfo: free 38MiB / 59MiB (64.05 %)
2022-01-06 10:12:35 -00:00: INF [memory_pressure] Writing meminfo: free 36MiB / 59MiB (60.88 %)
2022-01-06 10:12:37 -00:00: INF [memory_pressure] Writing meminfo: free 25MiB / 59MiB (41.18 %)

The current footprint function used to estimate the free memory comes from https://github.com/mirage/qubes-mirage-firewall/pull/116#issuecomment-704827905, because mallinfo() is too expensive to call on every packet that reaches the firewall. However, footprint() is based on the heap size acquired from the system and is currently never lowered (we could call trim(), but that is not exposed by mirage-xen).
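
For context, a rough sketch (possibly differing in detail from the firewall's real memory_pressure.ml) of what the fraction_free check boils down to; because quick_stat derives free_words from the allocator footprint, this fraction never recovers once the footprint has grown, even after the OCaml GC has released most of the heap:

(* Rough sketch of the footprint-based check, not necessarily the exact code. *)
let fraction_free (stats : OS.Memory.stat) =
  float_of_int stats.OS.Memory.free_words /. float_of_int stats.OS.Memory.heap_words

let memory_is_low () =
  fraction_free (OS.Memory.quick_stat ()) < 0.1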

We have therefore tried to implement a replacement for this footprint() that follows the malloc/free calls to give an approximation of the currently used memory. The current changes can be seen in https://github.com/winux138/mirage-xen/commit/aa7bde4a83cbe52ff58f1c356cd766b98fb3012a and https://github.com/winux138/ocaml-freestanding/commit/16f7a96071219fabe1aecb59bc86929e6a1d9993
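
The bookkeeping idea itself is simple; here is a conceptual OCaml sketch (the real counter is maintained inside the C allocator, see the commits above):

(* Conceptual sketch only: track live allocations instead of the high-water
   footprint, and derive free memory from the running total. *)
let used_bytes = ref 0
let note_malloc size = used_bytes := !used_bytes + size   (* on each malloc *)
let note_free size = used_bytes := !used_bytes - size     (* on each free *)
let free_estimate ~total_bytes = total_bytes - !used_bytes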

The following log shows three different measurements: Stat_slow (mallinfo), Stat_alt (the alternate used-memory counter), and the existing footprint-based value (the Writing meminfo lines):

2022-01-06 10:11:59 -00:00: INF [memory_pressure] Stat_slow meminfo: free 52MiB (54175264) / 59MiB (87.59 %)
2022-01-06 10:11:59 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54174160) / 59MiB (87.58 %)
2022-01-06 10:11:59 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:01 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54171824) / 59MiB (87.58 %)
2022-01-06 10:12:01 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:03 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54169440) / 59MiB (87.58 %)
2022-01-06 10:12:03 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:05 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54167040) / 59MiB (87.57 %)
2022-01-06 10:12:05 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:07 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54154784) / 59MiB (87.55 %)
2022-01-06 10:12:07 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:09 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54152304) / 59MiB (87.55 %)
2022-01-06 10:12:09 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:11 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54149168) / 59MiB (87.54 %)
2022-01-06 10:12:11 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:13 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54141952) / 59MiB (87.53 %)
2022-01-06 10:12:13 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:15 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54139584) / 59MiB (87.53 %)
2022-01-06 10:12:15 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:17 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54127312) / 59MiB (87.51 %)
2022-01-06 10:12:17 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:19 -00:00: INF [memory_pressure] Collect
2022-01-06 10:12:19 -00:00: INF [memory_pressure] Stat_slow meminfo: free 52MiB (54175264) / 59MiB (87.59 %)
2022-01-06 10:12:19 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54174160) / 59MiB (87.58 %)
2022-01-06 10:12:19 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:21 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54171728) / 59MiB (87.58 %)
2022-01-06 10:12:21 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:23 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54165040) / 59MiB (87.57 %)
2022-01-06 10:12:23 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:25 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54162688) / 59MiB (87.57 %)
2022-01-06 10:12:25 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:27 -00:00: INF [memory_pressure] Stat_alt meminfo: free 52MiB (54065536) / 59MiB (87.41 %)
2022-01-06 10:12:27 -00:00: INF [memory_pressure] Writing meminfo: free 51MiB (52571600) / 59MiB (84.99 %)
2022-01-06 10:12:29 -00:00: INF [memory_pressure] Stat_alt meminfo: free 48MiB (49556816) / 59MiB (80.12 %)
2022-01-06 10:12:29 -00:00: INF [memory_pressure] Writing meminfo: free 44MiB (45170128) / 59MiB (73.03 %)
2022-01-06 10:12:31 -00:00: INF [memory_pressure] Stat_alt meminfo: free 46MiB (47596640) / 59MiB (76.95 %)
2022-01-06 10:12:31 -00:00: INF [memory_pressure] Writing meminfo: free 40MiB (40943056) / 59MiB (66.19 %)
2022-01-06 10:12:33 -00:00: INF [memory_pressure] Stat_alt meminfo: free 45MiB (46788704) / 59MiB (75.64 %)
2022-01-06 10:12:33 -00:00: INF [memory_pressure] Writing meminfo: free 38MiB (39615952) / 59MiB (64.05 %)
2022-01-06 10:12:35 -00:00: INF [memory_pressure] Stat_alt meminfo: free 44MiB (45881520) / 59MiB (74.18 %)
2022-01-06 10:12:35 -00:00: INF [memory_pressure] Writing meminfo: free 36MiB (37658064) / 59MiB (60.88 %)
2022-01-06 10:12:37 -00:00: INF [memory_pressure] Stat_alt meminfo: free 41MiB (42171712) / 59MiB (68.18 %)
2022-01-06 10:12:37 -00:00: INF [memory_pressure] Writing meminfo: free 25MiB (25472464) / 59MiB (41.18 %)
2022-01-06 10:12:39 -00:00: INF [memory_pressure] Collect
2022-01-06 10:12:39 -00:00: INF [memory_pressure] Stat_slow meminfo: free 51MiB (52978304) / 59MiB (85.65 %)
2022-01-06 10:12:39 -00:00: INF [memory_pressure] Stat_alt meminfo: free 51MiB (52977200) / 59MiB (85.65 %)
2022-01-06 10:12:39 -00:00: INF [memory_pressure] Writing meminfo: free 31MiB (31821264) / 59MiB (51.45 %)
2022-01-06 10:12:41 -00:00: INF [memory_pressure] Stat_alt meminfo: free 49MiB (51112656) / 59MiB (82.63 %)
2022-01-06 10:12:41 -00:00: INF [memory_pressure] Writing meminfo: free 31MiB (31821264) / 59MiB (51.45 %)
2022-01-06 10:12:43 -00:00: INF [memory_pressure] Stat_alt meminfo: free 48MiB (50218128) / 59MiB (81.19 %)
2022-01-06 10:12:43 -00:00: INF [memory_pressure] Writing meminfo: free 31MiB (31821264) / 59MiB (51.45 %)
2022-01-06 10:12:45 -00:00: INF [memory_pressure] Stat_alt meminfo: free 48MiB (49546560) / 59MiB (80.10 %)
2022-01-06 10:12:45 -00:00: INF [memory_pressure] Writing meminfo: free 31MiB (31821264) / 59MiB (51.45 %)
2022-01-06 10:12:47 -00:00: INF [memory_pressure] Stat_alt meminfo: free 47MiB (48764480) / 59MiB (78.84 %)
2022-01-06 10:12:47 -00:00: INF [memory_pressure] Writing meminfo: free 31MiB (31821264) / 59MiB (51.45 %)
2022-01-06 10:12:49 -00:00: INF [memory_pressure] Stat_alt meminfo: free 46MiB (48124704) / 59MiB (77.80 %)
2022-01-06 10:12:49 -00:00: INF [memory_pressure] Writing meminfo: free 31MiB (31821264) / 59MiB (51.45 %)
2022-01-06 10:12:51 -00:00: INF [memory_pressure] Stat_alt meminfo: free 46MiB (47388880) / 59MiB (76.61 %)
2022-01-06 10:12:51 -00:00: INF [memory_pressure] Writing meminfo: free 31MiB (31821264) / 59MiB (51.45 %)

In the log above we can see that Stat_slow (mallinfo) and Stat_alt (alternate used memory) are almost the same, while the footprint-based value drops quickly.

So far I'm still facing an issue: at some point we overestimate the free memory (in other words, we underestimate the used memory). I must be missing an allocation in dlmalloc or counting some frees twice:

2022-01-06 10:23:19 -00:00: INF [memory_pressure] Collect
2022-01-06 10:23:19 -00:00: INF [memory_pressure] Stat_slow meminfo: free 51MiB (52978304) / 59MiB (85.65 %)
2022-01-06 10:23:19 -00:00: INF [memory_pressure] Stat_alt meminfo: free 53MiB (55400928) / 59MiB (89.57 %)
2022-01-06 10:23:19 -00:00: INF [memory_pressure] Writing meminfo: free 40MiB (40930768) / 59MiB (66.17 %)

Despite this last issue I think this track is promising; any thoughts on it?

talex5 commented 2 years ago

Nice work - and sorry about the misleading PR!

I don't have a QubesOS machine any longer, but has anyone tried @hannesm's memory profiling mentioned in https://github.com/mirage/qubes-mirage-firewall/issues/120#issuecomment-718663332? That will probably show what's taking up the space.

palainp commented 2 years ago

I can also confirm that this estimate is pretty good: I get the same free-memory percentage as mallinfo to two decimal places (an error of at most 6kB).

I could also observe a moment when the fast estimate is too optimistic compared to the real free-memory value, and then it comes back to a good estimate within 20 seconds:

2022-01-17 09:10:14 -00:00: INF [memory_pressure] Mallinfo meminfo: free 42MiB / 59MiB (71.17 %)
2022-01-17 09:10:14 -00:00: INF [memory_pressure] Quick_stat meminfo: free 42MiB / 59MiB (71.17 %)
2022-01-17 09:10:34 -00:00: INF [memory_pressure] Mallinfo meminfo: free 43MiB / 59MiB (72.57 %)
2022-01-17 09:10:34 -00:00: INF [memory_pressure] Quick_stat meminfo: free 43MiB / 59MiB (72.57 %)
2022-01-17 09:10:51 -00:00: INF [client_eth:dom9:10.137.0.5] who-has 10.137.0.20? responding with fe:ff:ff:ff:ff:ff
2022-01-17 09:10:54 -00:00: INF [memory_pressure] Mallinfo meminfo: free 43MiB / 59MiB (72.54 %)
2022-01-17 09:10:54 -00:00: INF [memory_pressure] Quick_stat meminfo: free 43MiB / 59MiB (72.54 %)
2022-01-17 09:11:14 -00:00: INF [memory_pressure] Mallinfo meminfo: free 43MiB / 59MiB (72.33 %)
2022-01-17 09:11:14 -00:00: INF [memory_pressure] Quick_stat meminfo: free 43MiB / 59MiB (72.64 %)
2022-01-17 09:11:19 -00:00: INF [client_eth:dom9:10.137.0.5] who-has 10.137.0.20? responding with fe:ff:ff:ff:ff:ff
2022-01-17 09:11:34 -00:00: INF [memory_pressure] Mallinfo meminfo: free 43MiB / 59MiB (72.41 %)
2022-01-17 09:11:34 -00:00: INF [memory_pressure] Quick_stat meminfo: free 43MiB / 59MiB (72.41 %)
2022-01-17 09:11:55 -00:00: INF [memory_pressure] Mallinfo meminfo: free 43MiB / 59MiB (72.41 %)
2022-01-17 09:11:55 -00:00: INF [memory_pressure] Quick_stat meminfo: free 43MiB / 59MiB (72.41 %)

I will try to monitor this alternative quick_stat function on a simpler unikernel. With that I expect to be able to use the memtrace profiler.

palainp commented 2 years ago

Update (with good news): I modified the patch proposed by @winux138 a bit and I now have no delta between what mallinfo gives and what we have as a "quick estimate" of free memory. If anyone wants to test, they can use the following pins:

$ opam pin
mirage-xen.7.2.0     git  git+https://github.com/palainp/mirage-xen.git#malloc-usage
ocaml-solo5.0.8.0    git  git+https://github.com/palainp/ocaml-solo5.git#malloc-usage

And add the following modification into memory_pressure.ml:

 let status () =
-  let stats = Xen_os.Memory.quick_stat () in
-  if fraction_free stats > 0.1 then `Ok
+  let stats = Xen_os.Memory.memory_usage_stat () in
+  if fraction_free stats > 0.6 then `Ok
   else (
+    Log.info (fun f -> f "We need a collection");
     Gc.full_major ();
+    Xen_os.Memory.trim();
     let stats = Xen_os.Memory.quick_stat () in
-    report_mem_usage stats;
-    if fraction_free stats < 0.1 then `Memory_critical
+    if fraction_free stats < 0.1 then (
+      report_mem_usage stats; `Memory_critical
+    )
     else `Ok
   )

It will probably need some tuning of the 60% and 10% values, but the idea is that we now track the memory really used, and even when we're only a little below 60% free the heap may already be full of holes (like gruyère) and OCaml may have trouble getting blocks large enough for some data allocations.

When I tried to be too eager with the first fraction I ran into an issue where the heap grows really high (to 90+% of the heap+stack area) and triggers a page fault (#PF) trap. So I prefer to call the GC earlier and use trim to try to lower the heap boundary. After that we can look at where the heap is, and if it's still too high, we can inform the hypervisor that we are running out of memory.

Using the same protocol as @winux138 and with this memory-pressure policy change, I never go below 25% free memory as reported by quick_stat, nor below 50% of really free memory. I can see some "We need a collection" messages, but they don't flood the logfile as before, they stop when the network traffic drops, and more importantly I haven't had any "memory critical" messages.

hannesm commented 2 years ago

This is excellent news, thanks for your hard work on it @palainp. Could you open PRs for the projects involved? Then we can review, merge, and release :)

hannesm commented 2 years ago

With mirage 4.2.0 in the pipeline, which includes @palainp's and @winux138's patches for ocaml-solo5 & mirage-xen, is this issue solved @palainp? It now allows us to include the trim call in the memory-pressure handling. Could you please open a PR (or add that to #140)?