mirage / qubes-mirage-firewall

A Mirage firewall VM for QubesOS
BSD 2-Clause "Simplified" License
211 stars 28 forks source link

Qubes 4.1rc1 error - "Fatal trap" #134

Closed ArtificialAmateur closed 2 years ago

ArtificialAmateur commented 3 years ago

Building from source from the current master HEAD gives the following error when deployed and ran:

[2021-10-22 14:14:41] Solo5: Xen console: port 0x2, ring @0x00000000FEFFF000
[2021-10-22 14:14:41]             |      ___|
[2021-10-22 14:14:41]   __|  _ \  |  _ \ __ \
[2021-10-22 14:14:41] \__ \ (   | | (   |  ) |
[2021-10-22 14:14:41] ____/\___/ _|\___/____/
[2021-10-22 14:14:41] Solo5: Bindings version v0.6.7
[2021-10-22 14:14:41] Solo5: Memory map: 64 MB addressable:
[2021-10-22 14:14:41] Solo5:   reserved @ (0x0 - 0xfffff)
[2021-10-22 14:14:41] Solo5:       text @ (0x100000 - 0x2effff)
[2021-10-22 14:14:41] Solo5:     rodata @ (0x2f0000 - 0x353fff)
[2021-10-22 14:14:41] Solo5:       data @ (0x354000 - 0x4e7fff)
[2021-10-22 14:14:41] Solo5:       heap >= 0x4e8000 < stack < 0x4000000
[2021-10-22 14:14:41] 2021-10-22 18:14:41 -00:00: INF [qubes.rexec] waiting for client...
[2021-10-22 14:14:41] 2021-10-22 18:14:41 -00:00: INF [qubes.gui] waiting for client...
[2021-10-22 14:14:41] 2021-10-22 18:14:41 -00:00: INF [qubes.db] connecting to server...
[2021-10-22 14:14:41] 2021-10-22 18:14:41 -00:00: INF [qubes.db] connected
[2021-10-22 14:14:41] 2021-10-22 18:14:41 -00:00: INF [qubes.rexec] client connected, other end wants to use protocol version 3, continuing with version 2
[2021-10-22 14:14:41] Solo5: trap: type=#MF ec=0x0 rip=0x28bded rsp=0x3ffd600 rflags=0x10297
[2021-10-22 14:14:41] Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

Reopening issue #128.

palainp commented 3 years ago

Can you try to run in your compilation VM objdump -D qubes_firewall.xen and then go down to28bded? My faulty address is in the fmt_fp function (you'll have to scroll up for a while). If it's the same for you, it could mean that floating printf is buggy (fmt_fp is terrific ! you can see it at https://github.com/mirage/ocaml-freestanding/blob/421bc48d1fd6f5ff04a578cfdd50b2bd1e132dc0/nolibc/vfprintf.c#L205). I have tried removing floating prints from the unikernel and it seems to work fine (it starts up fine), can you try the same?

$ git diff
diff --git a/memory_pressure.ml b/memory_pressure.ml
index cecf4a9..ea18674 100644
--- a/memory_pressure.ml
+++ b/memory_pressure.ml
@@ -16,10 +16,10 @@ let meminfo stats =
   let { OS.Memory.free_words; heap_words; _ } = stats in
   let mem_total = heap_words * wordsize_in_bytes in
   let mem_free = free_words * wordsize_in_bytes in
-  Log.info (fun f -> f "Writing meminfo: free %a / %a (%.2f %%)"
+  (*Log.info (fun f -> f "Writing meminfo: free %a / %a (%.2f %%)"
     Fmt.bi_byte_size mem_free
     Fmt.bi_byte_size mem_total
-    (fraction_free stats *. 100.0));
+    (fraction_free stats *. 100.0));*)
   Printf.sprintf "MemTotal: %d kB\n\
                   MemFree: %d kB\n\
                   Buffers: 0 kB\n\
diff --git a/unikernel.ml b/unikernel.ml
index 0621e42..2fae65e 100644
--- a/unikernel.ml
+++ b/unikernel.ml
@@ -56,7 +56,7 @@ module Main (R : Mirage_random.S)(Clock : Mirage_clock.MCLOCK) = struct
       let time_in_ns = Clock.elapsed_ns () - start_time in
       Int64.to_float time_in_ns /. 1e9
     in
-    Log.info (fun f -> f "QubesDB and qrexec agents connected in %.3f s" startup_time);
+    (*Log.info (fun f -> f "QubesDB and qrexec agents connected in %.3f s" startup_time);*)
     (* Watch for shutdown requests from Qubes *)
     let shutdown_rq =
       OS.Lifecycle.await_shutdown_request () >>= fun (`Poweroff | `Reboot) ->
ra-- commented 2 years ago

I can confirm both the error and the fix on Qubes 4.1 with qubes-mirage-firewall (current master@ef2419bf6f49c646b0a5eb3346fc7ec9555fa04f). Good catch @palainp !

palainp commented 2 years ago

@ra-- unfortunately the hardest part of the job will be figuring out why the floating point printf fails now and to fixing the ocaml-freestanding side. I'm currently still on 4.0.4 (EDIT: not 3.x) Qubes OS and can't test how to fix that now :(

palainp commented 2 years ago

Update: So I installed Qubes 4.1 and I now have a really strange observation here. If I try to produce a "minimal bug unikernel", starting with mirage-skeleton/hello_world I can have these results:

$ cat unikernel.ml
open Lwt.Infix

module Hello (Time : Mirage_time.S) = struct
  let start _time =

    let rec loop = function
      | 0 -> Lwt.return_unit
      | n ->
        let stats = OS.Memory.quick_stat () in
        let { OS.Memory.free_words; heap_words; _ } = stats in
        let value = free_words in
        Logs.info (fun f -> f "printing A %a" Fmt.byte_size value);
        Logs.info (fun f -> f "printing B %a" Fmt.bi_byte_size value);
        Time.sleep_ns (Duration.of_sec 1) >>= fun () ->
        loop (n-1)
    in
    loop 4

end

And in the dom0 console logs:

[2022-02-23 12:34:30] Logfile Opened
[2022-02-23 12:34:31] Solo5: Xen console: port 0x2, ring @0x00000000FEFFF000
[2022-02-23 12:34:31]             |      ___|
[2022-02-23 12:34:31]   __|  _ \  |  _ \ __ \
[2022-02-23 12:34:31] \__ \ (   | | (   |  ) |
[2022-02-23 12:34:31] ____/\___/ _|\___/____/
[2022-02-23 12:34:31] Solo5: Bindings version v0.6.9
[2022-02-23 12:34:31] Solo5: Memory map: 64 MB addressable:
[2022-02-23 12:34:31] Solo5:   reserved @ (0x0 - 0xfffff)
[2022-02-23 12:34:31] Solo5:       text @ (0x100000 - 0x1f6fff)
[2022-02-23 12:34:31] Solo5:     rodata @ (0x1f7000 - 0x231fff)
[2022-02-23 12:34:31] Solo5:       data @ (0x232000 - 0x2fbfff)
[2022-02-23 12:34:31] Solo5:       heap >= 0x2fc000 < stack < 0x4000000
[2022-02-23 12:34:31] 2022-02-23 17:34:31 -00:00: INF [application] printing A 7.28MB
[2022-02-23 12:34:31] Solo5: trap: type=#MF ec=0x0 rip=0x19b356 rsp=0x3ffdc00 rflags=0x10216
[2022-02-23 12:34:31] Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

So the fatal trap is also hidden into the binary pretty print formatter (which do manipulation with floats)... And what is also troubling me is that:

SurFlurer commented 2 years ago

I have to report that the last build on https://builds.robur.coop/job/qubes-firewall/ is still suffering from this issue. If I don't build it myself, should I just wait for a few days?

palainp commented 2 years ago

Thanks for your reporting !

The PR fixing this issue has been integrated in version 0.8.0 of ocaml-solo5 which is only usable with mirage 4. As mirage 4 has just been released a few days ago, I guess it will take a bit before it can be used in automatic build systems.

hannesm commented 2 years ago

Good news, since the most recent build today (https://builds.robur.coop/job/qubes-firewall/build/2b271b28-0cb5-4566-9350-18d6d7c29119/ and later) this issue should be fixed in the binary builds as well @njk2002

jorgejones7711 commented 2 years ago

Will there be a new release soon? Would love to use it on Qubes 4.1!

hannesm commented 2 years ago

Yes, we'll soon have releases, esp. now that lots of outstanding issues have been cleared (PRs and merges pending). Expect a new release within the next two weeks :)