burghardt closed this issue 1 year ago.
Thanks for your report. Would you mind clarifying which qubes-firewall image you used? Is it 0.8.2? The latest, 0.8.3 (sha256 f499b2379c62917ac32854be63f201e6b90466e645e54dea51e376baccdf26ab), contains some fixes in these areas.
Thanks a lot!
From reading some source code (sorry, I don't have my QubesOS laptop nearby): the destination address of the IPv4 packet mentioned above is in the multicast admin range (thus `Ipaddr.scope` returns `Admin`), and mirage-nat checks in `nat_rewrite.ml` (function `add`) whether both source and destination are in `Global` or `Organization` scope. If not, the `Cannot_NAT` error is returned.
Now, from the terminal output it looks like we're seeing output from `firewall.ml` in function `add_nat_and_forward_ipv4`, which before 0.7.1 used `pp_header` and now uses `Nat_packet.pp` (https://github.com/mirage/qubes-mirage-firewall/commit/87df5bdcc015b1a9f06aeeadcb8a283e3b1fe100#diff-ec9fe4e557558e9f9cb06c4011300f8bdf4fa73809d7202f11d2a0119b34dff9L118) -- which shouldn't make a difference. There is some broken log output: after the first dump a stray closing parenthesis is present, then some more hexdump follows, and the subsequent log messages are indented.
Would you mind testing with some smaller packets (at the moment the UDP payload is 607 bytes; does it also fail with fewer bytes?)? It may be related to fragmentation and reassembly (though the MTU should be ~1500).
Just a note: mDNS uses the link-local multicast range and should not be NATted or forwarded between two IP networks.
I can confirm that the Python code causes a DoS on 0.8.3 too.
I just tried filtering multicast destinations (the whole 224.0.0.0/4 range, on both the client and netvm sides) in https://github.com/palainp/qubes-mirage-firewall/tree/fix-dos. This patch mitigates the proposed DoS. I'll check for side effects before opening a PR.
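The idea behind that filter can be approximated in Python as follows. This is only a sketch of the membership test; the actual patch is OCaml in the linked branch:

```python
import ipaddress

# The whole IPv4 multicast range mentioned above.
MULTICAST = ipaddress.ip_network("224.0.0.0/4")

def drop_multicast(dst: str) -> bool:
    """Return True when a packet to `dst` should be dropped before
    it ever reaches the NAT table."""
    return ipaddress.ip_address(dst) in MULTICAST

print(drop_multicast("224.0.0.251"))  # mDNS -> True (dropped)
print(drop_multicast("1.1.1.1"))      # unicast -> False (forwarded)
```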
@palainp interesting, but mirage-nat in `nat_rewrite` already checks for multicast (the scope, as mentioned above, may need some adjustments). I think we need to figure out why the log message being printed is never-ending/repeating if `Cannot_NAT` is returned from `Nat.add` (in `my_nat.ml`).
Update: I had problems when I tried to pretty-print the guilty packet. I then followed @hannesm's suggestion (at the moment the UDP payload is 607 bytes; does it fail with fewer bytes?), and the patch at https://github.com/palainp/mirage-nat/tree/pp-limit-payload, which limits the pretty print to 10 bytes, does not suffer from the DoS behaviour.
I'm not sure how a very long `Cstruct.hexdump_pp` can produce this kind of infinite loop, and it might be useful to print more than a maximum of 10 bytes, so my patch may be too limiting here :(
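For illustration, truncating a hexdump to the first few bytes can look like this in Python. This is only a sketch of the idea behind the pp-limit-payload patch, which itself wraps OCaml's `Cstruct.hexdump_pp`:

```python
def hexdump_limited(payload: bytes, limit: int = 10) -> str:
    """Hex-print at most `limit` bytes of payload, noting how much was cut."""
    shown = " ".join(f"{b:02x}" for b in payload[:limit])
    if len(payload) > limit:
        shown += f" ... ({len(payload) - limit} more bytes omitted)"
    return shown

# A 256-byte payload is reduced to 10 dumped bytes plus a short note.
print(hexdump_limited(bytes(range(256))))
```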
Discussing with @palainp: what changed between 0.7 and 0.8 is mainly PV -> PVH (and mini-os -> solo5). Now, the code printing log messages is from solo5 (in `bindings/xen/console.c`), which, in contrast to mini-os, doesn't do any memory barrier (the mirage-console code does call memory barriers -- I'm not sure whether they're needed). The call chain: ocaml-solo5's nolibc defines the function `write` in `sysdeps_solo5`; the OCaml runtime uses `write` for `Printf.fprintf stderr` (which the mirage-logs reporter calls); `write` uses `solo5_console_write`, which is defined in solo5's `bindings/xen/platform.c` to call `platform_puts`, which calls `console_write`.
So maybe, to reduce the test case, we should investigate whether super-long log messages on Xen are an issue (100% CPU load), and figure out how to fix this (by looking deeply into solo5's `bindings/xen/console.c`).
@hannesm you're right: printing a log line longer than 440 bytes (EDIT: more than 2048 bytes, with every payload byte printed as 3 characters -- 2 nibbles + 1 space -- plus a small number of bytes for the timestamp) will loop (see https://github.com/Solo5/solo5/issues/537 for further investigation).
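As a back-of-the-envelope check, using the 3-characters-per-byte figure above and the 607-byte UDP payload from the PoC (the assumption that IP/UDP headers are also hexdumped is mine):

```python
PAYLOAD = 607       # UDP payload bytes in the PoC
CHARS_PER_BYTE = 3  # 2 hex nibbles + 1 space per dumped byte
HEADERS = 20 + 8    # IPv4 + UDP header bytes, assumed also hexdumped

payload_chars = PAYLOAD * CHARS_PER_BYTE
total_chars = (PAYLOAD + HEADERS) * CHARS_PER_BYTE
print(payload_chars)  # 1821
print(total_chars)    # 1905; timestamp and message text push this past 2048
```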
I used an mDNS fuzzer against the Mirage firewall and it ran into problems resulting in a DoS (99% CPU usage; it stopped forwarding packets for all Qubes attached to the firewall instance).
The Scapy output from the fuzzer is quite verbose, but the minimal PoC is very simple.
Here is the Scapy PoC (minimized by removing the setup of unrelated fields):
I translated this into the BSD socket API to avoid the need for the Scapy framework (and running the PoC as root). The test setup was: [Qube running PoC] -> [Mirage firewall] -> [Net Qube]
Tested Mirage firewall versions:
v0.7.1 - ok
v0.8.x - vulnerable
Version v0.7.1 prints this into the console while processing the packet:
And the output from v0.8.x loops, printing packet details forever:
This issue seems to be unrelated to #158, as this happens with the following ruleset:
PoC demo on YouTube: