opnsense / src

OPNsense operating system on top of FreeBSD
https://opnsense.org/

Qemu Adapters Not Processing Subnet-Local Traffic Correctly #135

Closed - sempervictus closed this issue 10 months ago

sempervictus commented 3 years ago

Describe the bug

Traffic routed over an OPNsense instance to a Windows VM from a remote subnet works correctly (and vice versa), but traffic between OPNsense and the Windows instance within its own subnet does not. Pinging from the Windows host shows the inbound ICMP packets in tcpdump inside the firewall - the traffic reaches the interface, but no response is ever sent. ARP tables are updated (obviously, or routing would fail); there is just no L3 response whatsoever. The NIC type on both VMs is virtio and the underlying host kernels are 5.10. We're seeing this inside OpenStack, even with all port-security functions disabled (if port security were the issue, we wouldn't be seeing those inbound packets). This has apparently been going on for some time, probably since around the 21.7 release. Changing hardware offload options makes no difference.
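
For reference, a rough sketch of the check described above - the interface name and peer address are placeholders, not values from this report:

# On the OPNsense console: inbound echo requests from the Windows VM
# are visible here, but no echo replies ever go back out.
tcpdump -ni vtnet1 icmp and host 192.0.2.10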

To Reproduce

Steps to reproduce the behavior:

  1. Create two libvirt/KVM VMs with virtio NICs (a virt-install sketch follows this list)
  2. Put the NICs in the same subnet (in our case with MTU 9000 assigned as well, though this happens at 1500 too)
  3. Permit IP traffic between the VMs (at every tier of control, including the underlying virtual network and the firewall itself)
  4. Try to ping between the VMs
  5. Observe inbound ICMP and ARP table updates, but no response sent.
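
A minimal sketch of steps 1-2 using virt-install rather than hand-written XML - names, paths, and the network are placeholders, and the MTU comes from the underlying virtual network rather than the command line:

# Hypothetical guest definition; repeat with a second name/disk for the peer VM.
virt-install --name opnsense-test --memory 4096 --vcpus 2 \
  --disk path=/var/lib/libvirt/images/opnsense.qcow2 --import \
  --network network=lan-test,model=virtio \
  --osinfo generic --graphics none --noautoconsole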

Expected behavior

L3 traffic works within the shared subnet.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 21.7.4-amd64
FreeBSD 12.1-RELEASE-p20-HBSD
OpenSSL 1.1.1l 24 Aug 2021
Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz (2 cores)
VirtIO NIC (host kernel 5.10.x)

sempervictus commented 3 years ago

@fichtner - thanks, and sorry for the incorrect placement. Spooky bug. My guess is that it's driver-level given the symptoms, but it might be something in the BSD networking tier itself (I am no BSD/pf guru) - some malformation introduced by the virtual adapter on its way up to the packet-processing level, or an interface problem of some sort. The fact that I can tcpdump the inbound packets just makes it more confusing - clearly they're well-formed enough to be parsed and written to STDOUT. However, I see no denies in the firewall log for these packets (and no allow actions when logging is forced on for all rules), despite the adapter seeing them as L2 frames which tcpdump reassembles into L3+ packets without throwing errors.
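
For completeness, the sort of checks behind that statement - assuming logging is enabled on the relevant rules, with a placeholder peer address:

# Watch pf's log interface for any pass/block decision on the peer's traffic;
# nothing shows up for the inbound ICMP described above.
tcpdump -ni pflog0 host 192.0.2.10
# Check whether pf ever created a state for the peer - it does not.
pfctl -ss | grep 192.0.2.10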

fichtner commented 3 years ago

Can you wait for the 13-based beta? It's only two more weeks, I think.

sempervictus commented 3 years ago

'Course - the only real breakage is that RADIUS can't work like this; most of the traffic these instances handle is routed.

sempervictus commented 3 years ago

To make this even more messed up: we're seeing it on one of the older OpenStack instances on only one of the interfaces facing internal networks - all interfaces show ARP entries, but only one can actually communicate with its neighbors.

sempervictus commented 3 years ago

Another fun bit: the GUI-based "HW offload" selectors have no effect here, because the hw.vtnet tunables are separate and need to be created manually in the tunables section:

# sysctl -a | grep hw.vtnet
hw.vtnet.rx_process_limit: 512
hw.vtnet.mq_max_pairs: 8
hw.vtnet.mq_disable: 1
hw.vtnet.lro_disable: 1
hw.vtnet.tso_disable: 1
hw.vtnet.csum_disable: 1
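
For reference, a sketch of persisting these outside the GUI - hw.vtnet.* are vtnet(4) loader tunables, so they only apply at boot, and I'm assuming /boot/loader.conf.local is honored as a local override since /boot/loader.conf itself gets regenerated:

# /boot/loader.conf.local
hw.vtnet.csum_disable="1"
hw.vtnet.tso_disable="1"
hw.vtnet.lro_disable="1"
hw.vtnet.mq_disable="1"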

Unfortunately, this too has no effect. One instance accepts local traffic but will not route/NAT it out, the other instance routes just fine but won't accept local traffic. There's no good way to change interface types on the fly in a cloud environment (at least in OpenStack). This might actually be a bigger problem than I thought... @fichtner - any thoughts on potential workarounds we could use while the FreeBSD 13 beta is cooked up?

sempervictus commented 3 years ago

Found some suggestions to boot the VM as a q35 machine, but that's not flying too well - bootloader failure, and multiboot can't find a usable volume (we use the nano image in the private cloud; I need to figure out how to build proper cloud images on ZFS instead), so I had to go back to the default pc type.
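
For anyone else trying the same thing, this is roughly how the machine type was switched - via the Glance image property read by the Nova libvirt driver; the image name is a placeholder and new instances have to be rebuilt from it:

# Ask Nova's libvirt driver to build new instances of this image as q35.
openstack image set --property hw_machine_type=q35 opnsense-nano
# Revert to the default 'pc' type.
openstack image unset --property hw_machine_type opnsense-nano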

sempervictus commented 3 years ago

@AdSchellevis, @fichtner - I think I've found the upstream bug in FreeBSD. They talk about disabling csum, which I've done inside the FWs themselves in various ways. However, the piece about doing it at the hypervisor level won't fly, even for us where we control the entire cloud: Nova regenerates the libvirt XML for a domain (instance/machine/whatever) on an ongoing basis. This is also why swapping the NICs out for e1000s won't fly - unless there's a Nova driver interface to define those parameters, they'll be discarded on XML regeneration. I can, for now, change the database-assigned NIC type, which will make Nova generate an e1000 the next time the instance boots, but these things sit atop an 80Gbit fabric with a raw 10Gbit WAN uplink, and the e1000 won't get anywhere near those speeds. I have not yet tracked down any patches for this, but I'm somewhat of an out-of-towner in BSD-world (pun intended), so it's a slow process. If I do find such patches, do they go here as PRs, or how is that handled?
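
For the record, a cleaner way to pin the NIC model than editing the Nova database directly would be the hw_vif_model image property, assuming the libvirt driver in this deployment honors it (image name is a placeholder):

# Instances built from this image get e1000 NICs instead of virtio;
# throughput suffers badly, so this is a stop-gap at best.
openstack image set --property hw_vif_model=e1000 opnsense-nano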

fichtner commented 3 years ago

@sempervictus if you find patches in any BSD that might be related, that would help. It's sort of tricky to "find" "the issue" in the code without any reference to the code itself. The workarounds are probably just that. I wouldn't be surprised if a larger blob of code is missing in FreeBSD to glue this in correctly.

sempervictus commented 2 years ago

@fichtner: looks like the betas are up now - what's the process for testing from the beta/development channel on a system deployed from the community release?

fichtner commented 2 years ago

Change the release type to development, then check for updates and install. Check and install again after the major upgrade. It will keep asking you to do a major upgrade, but you only need to do it once.

vherrlein commented 1 year ago

I can confirm a similar issue with OPNsense 22.7.9_3-amd64 (based on FreeBSD 13.1-RELEASE-p5) when using QEMU's q35 machine type with a virtio NIC. Everything works fine with the QEMU i440fx machine type.
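
For context, a generic sketch of the two configurations - a bare qemu invocation rather than my exact libvirt setup, with placeholder disk and tap names, and assuming the same virtio NIC model in both cases:

# Affected: q35 machine type with a virtio NIC.
qemu-system-x86_64 -machine q35 -m 2048 -hda opnsense.qcow2 \
  -netdev tap,id=lan0,ifname=tap0,script=no,downscript=no \
  -device virtio-net-pci,netdev=lan0
# Working: the i440fx ("pc") machine type, same NIC model.
qemu-system-x86_64 -machine pc -m 2048 -hda opnsense.qcow2 \
  -netdev tap,id=lan0,ifname=tap0,script=no,downscript=no \
  -device virtio-net-pci,netdev=lan0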

Steps to reproduce:

fichtner commented 10 months ago

Closing stale issue. Not much we can do.