This is quite interesting, thanks for reporting! Do you think it is possible to reproduce this while logging the packets before it happens? How long did it take until it happened?
@xaki23 thanks for your report. To investigate, packet captures would be of interest to us. Could you use the firewall branch git+https://github.com/roburio/qubes-mirage-firewall.git#more-debug-for-xaki, which outputs a hexdump if the decoding fails? Thanks.
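For illustration, a minimal sketch of that kind of decode-failure hexdump, assuming the cstruct and logs libraries; the function names are illustrative, not the actual code on that branch:

```ocaml
(* Illustrative sketch only: when a frame fails to decode, log the error
   together with a hexdump of the raw bytes so corrupted frames can be
   inspected afterwards. *)
let handle_frame decode frame =
  match decode frame with
  | Ok packet -> Some packet
  | Error msg ->
    Logs.warn (fun m ->
        m "failed to decode frame (%s):@.%a" msg Cstruct.hexdump_pp frame);
    None
```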
@linse It had been running for only about 6-7 hours when that happened; there had been about 15 downstream VM starts during that time. Unless I find a reliable way to repro this, the approach of having the FW hexdump things seems better, since I would otherwise have to log packets on "all" involved VMs, which ... isn't easy.
@hannes A build based on more-debug-for-xaki is running in the backup role now; I will deploy it to the main role tomorrow.
Thanks to both of you!
It just happened again, right after starting a downstream VM. The difference from the first observed case is that this time it didn't keep running for long.
The packets are mostly null bytes, but they seem to have reasonable sizes: guest-sys-mfwt.log
I grepped the "lines of just null bytes" out of the hexdump in this one, which makes it easier to see the few-and-far-between non-null bytes; there does not seem to be any useful information content in them: guest-sys-mfwt-nonullz.log
And I had another crash right after a VM start, but that one ... just stopped, with no useful information in the log at all.
Is this still an issue with more recent builds, @xaki23?
Not seen this recently, closing.
I switched my main mfwt to a 20200520 build today (it had been running "ok afaict" in the backup role since then), and at some point after starting a new VM it went ... bad.
There was no traffic going through the mfw anymore, only this ethertype message scrolling past for minutes (until I pulled the plug). This applied to all VMs, even ones that had already been running and working for the last few hours. The reported ethertype is mostly 0x0, but with lots of random-ish looking other values too.
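For context, a rough sketch of the kind of ethertype check that would keep producing such log lines when mostly-zero frames arrive; this assumes the cstruct and logs libraries and is illustrative only, not the firewall's actual code:

```ocaml
(* Illustrative only: read the 2-byte ethertype field from a raw Ethernet
   frame. An all-zero frame yields ethertype 0x0, which matches the values
   seen scrolling in the log. *)
let ethertype_of_frame frame =
  (* Ethernet header: 6 bytes dst MAC, 6 bytes src MAC, 2 bytes ethertype *)
  if Cstruct.length frame < 14 then None
  else Some (Cstruct.BE.get_uint16 frame 12)

let classify frame =
  match ethertype_of_frame frame with
  | Some 0x0800 | Some 0x0806 -> `Handle (* IPv4 / ARP, processed normally *)
  | Some ety ->
    Logs.warn (fun m -> m "unknown ethertype 0x%x, dropping frame" ety);
    `Drop
  | None ->
    Logs.warn (fun m -> m "frame too short for an Ethernet header");
    `Drop
```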
There is nothing else I perceive as unusual in that log; the setup/role has been working fine with a 20200509 build for days (up to the usual "63 downstream VMs seen" crash, aka #35).
Perhaps this is another variant of #105?!