This is quite interesting, thanks for reporting! Do you think it is possible to reproduce this while logging the packets before it happens? How long did it take until it happened?
@xaki23 thanks for your report. To investigate, packet captures would be of interest to us. Could you use the firewall branch git+https://github.com/roburio/qubes-mirage-firewall.git#more-debug-for-xaki, which outputs a hexdump if the decoding fails? Thanks.
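For illustration, a minimal sketch of that kind of decode-failure hexdump, assuming the cstruct and logs libraries; the function names are illustrative, not the actual code on that branch:

```ocaml
(* Illustrative sketch only: when a frame fails to decode, log the error
   together with a hexdump of the raw bytes so corrupted frames can be
   inspected afterwards. *)
let handle_frame decode frame =
  match decode frame with
  | Ok packet -> Some packet
  | Error msg ->
    Logs.warn (fun m ->
        m "failed to decode frame (%s):@.%a" msg Cstruct.hexdump_pp frame);
    None
```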
@linse It had been running for only about 6-7 hours when that happened; there had been about 15 downstream VM starts during that time. Unless I find a reliable way to repro this, the approach of having the FW hexdump things seems better, since I would otherwise have to log packets on "all" involved VMs, which ... isn't easy.
@hannes A build based on more-debug-for-xaki is running in the backup role now; I will deploy it to the main role tomorrow.
Thanks to both of you!
It just happened again, right after starting a downstream VM. The difference from the first observed case is that this time it didn't keep running for long.
The packets are mostly null bytes, but they seem to have reasonable sizes: guest-sys-mfwt.log
I grepped the "lines of just null bytes" out of the hexdump in this one, which makes it easier to see the few-and-far-between non-null bytes; there does not seem to be any useful information content in them: guest-sys-mfwt-nonullz.log
And I had another crash right after a VM start, but that one ... just stopped, with no useful information in the log at all.
Is this still an issue with more recent builds, @xaki23?
Not seen this recently, closing.
I switched my main mfwt to a 20200520 build today (it had been running "ok afaict" in the backup role since then), and at some point after starting a new VM it went ... bad.
There was no traffic going through the mfw anymore, only this ethertype message scrolling past for minutes (until I pulled the plug). This applied to all VMs, even ones that had already been running and working for the last few hours. The reported ethertype is mostly 0x0, but with lots of random-ish looking other values too.
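For context, a rough sketch of the kind of ethertype check that would keep producing such log lines when mostly-zero frames arrive; this assumes the cstruct and logs libraries and is illustrative only, not the firewall's actual code:

```ocaml
(* Illustrative only: read the 2-byte ethertype field from a raw Ethernet
   frame. An all-zero frame yields ethertype 0x0, which matches the values
   seen scrolling in the log. *)
let ethertype_of_frame frame =
  (* Ethernet header: 6 bytes dst MAC, 6 bytes src MAC, 2 bytes ethertype *)
  if Cstruct.length frame < 14 then None
  else Some (Cstruct.BE.get_uint16 frame 12)

let classify frame =
  match ethertype_of_frame frame with
  | Some 0x0800 | Some 0x0806 -> `Handle (* IPv4 / ARP, processed normally *)
  | Some ety ->
    Logs.warn (fun m -> m "unknown ethertype 0x%x, dropping frame" ety);
    `Drop
  | None ->
    Logs.warn (fun m -> m "frame too short for an Ethernet header");
    `Drop
```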
There is nothing else I perceive as unusual in that log; the setup/role has been working fine with a 20200509 build for days (up to the usual "63 downstream VMs seen" crash, aka #35).
Perhaps this is another variant of #105?!