opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

HTTP/TCP packets originating from Opnsense towards LAN interfaces seem to disappear #7601

Open morikplay opened 1 month ago

morikplay commented 1 month ago

Describe the bug

I posted the issue here but have yet to receive a response. Hence, this report. Please accept my apologies for potentially increasing the workload of opnsense's awesome team!

I upgraded from 23.x (the latest business release) to 24.4.1. The system still uses ISC DHCP with IPv4 only; KEA is not yet enabled. Intra- and inter-VLAN packet routing works like a charm, and no configuration or rule changes were made since the upgrade. However, outbound TCP (and sometimes UDP) packets in the FW-->(any)VLAN direction, on standard (e.g. 80, 443) or non-standard (e.g. 8080) ports, do not get processed. ICMP pings to the same LAN hosts are fine. All other intra-VLAN nodes, inter-VLAN nodes, and even external nodes (e.g. a web client on a cellular connection) can reach the address:port pairs in question; only packets originating from the FW fail. Examples of such packets are telegraf metrics, crowdsec-lapi requests, or even simple requests from ssh/console such as curl -vi --connect-timeout n <url>, which time out.

Reverting to the previous environment restores normalcy.

Steps to demonstrate issue

  1. 3 hosts in the mix here to illustrate the issue. a) Opnsense FW b) homeassistant (192.168.0.58) c) portainer (192.168.100.6)
  2. Check reachability of hosts on VLAN 3 (opt7, main, IP range 192.168.0.0/23) and then VLAN 100 (opt6, IP range 192.168.100.0/24) from Opnsense FW.
    
    ping 192.168.0.58
    PING 192.168.0.58 (192.168.0.58): 56 data bytes
    64 bytes from 192.168.0.58: icmp_seq=0 ttl=64 time=0.409 ms
    ...

    ping 192.168.100.6
    PING 192.168.100.6 (192.168.100.6): 56 data bytes
    64 bytes from 192.168.100.6: icmp_seq=0 ttl=64 time=0.369 ms
    ...


  3. Check routing towards the same hosts from Opnsense FW.

    traceroute 192.168.0.58
    traceroute to 192.168.0.58 (192.168.0.58), 64 hops max, 40 byte packets
     1  homeassistant (192.168.0.58)  0.464 ms  0.167 ms  0.175 ms
    ...
    traceroute 192.168.100.6
    traceroute to 192.168.100.6 (192.168.100.6), 64 hops max, 40 byte packets
     1  portainer (192.168.100.6)  0.359 ms  0.133 ms  0.241 ms


  4. Check for port `443` exposure and attempt a simple `curl` request from one host to another, to test inter-VLAN reachability at the application level. It works as intended.

    @portainer:~$ nc -4nzvw 5 192.168.0.58 443
    Connection to 192.168.0.58 443 port [tcp/*] succeeded!
    @portainer:~$ nc -4nuzvw 5 192.168.0.58 443
    (base) maumau@portainer:~$

    @portainer:~$ curl -ki --connect-timeout 5 https://homeassistant.esco.ghaar
    HTTP/2 200
    server: nginx
    date: Sat, 06 Jul 2024 18:48:24 GMT
    content-type: text/html; charset=utf-8
    content-length: 4148
    referrer-policy: no-referrer
    x-content-type-options: nosniff
    x-frame-options: SAMEORIGIN
    strict-transport-security: max-age=31536000; includeSubDomains
    ... (truncated the full HTTP response)

  5. Now repeat step 4, replacing one of the hosts with the Opnsense firewall.

    nc -4nzvw 5 192.168.0.58 443
    nc: connect to 192.168.0.58 port 443 (tcp) failed: Operation timed out
    nc -4nuzvw 5 192.168.0.58 443
    Connection to 192.168.0.58 443 port [udp/*] succeeded!
    ...

    ...

    curl -kvi --connect-timeout 5 https://homeassistant.esco.ghaar
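
For anyone wanting to reproduce the TCP check from the steps above without `nc`, here is a minimal Python sketch that attempts a full TCP connect and reports success or failure. The function names `tcp_probe` and `probe_many` are illustrative, not part of OPNsense, and the defaults (5 second timeout, 1 second pause) are assumptions chosen to mirror the `nc -w 5` calls in the report.

```python
import socket
import time


def tcp_probe(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection only returns once the three-way handshake is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def probe_many(host, port, attempts=10, timeout=5.0, pause=1.0):
    """Probe repeatedly and return one success flag per attempt."""
    results = []
    for attempt in range(attempts):
        results.append(tcp_probe(host, port, timeout))
        if attempt < attempts - 1:
            time.sleep(pause)  # space the attempts out a little
    return results
```

Calling `probe_many("192.168.0.58", 443)` from the firewall and again from another LAN host would give a like-for-like success count per vantage point, which may help quantify how often a connection from the firewall actually completes.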

Expected behavior

Opnsense should establish HTTPS/TCP connection, or even basic TCP connection to hosts on various VLANs.

Describe alternatives you considered

Please refer to above. Reverting to Opnsense 23.x (last known business version) restores normalcy.

Additional context

A few additional considerations that I grappled with:

  1. Does this issue always happen? Mostly yes, but every so often (once in a few hours or so) a TCP request in FW-->Host direction slips through. Evidence:

    #nc -4znvw 10 192.168.0.58 443
    Connection to 192.168.0.58 443 port [tcp/*] succeeded!
    <!-- immediately following which another series of requests fail -->
    ...
    #nc -4znvw 10 192.168.0.58 443
    nc: connect to 192.168.0.58 port 443 (tcp) failed: Operation timed out
    # nc -4znvw 10 192.168.0.58 443
    nc: connect to 192.168.0.58 port 443 (tcp) failed: Operation timed out
  2. What is going on at the TCP/IPv4 level?
     a. Are the FW-originated HTTP(S)-over-TCP packets sent over the wire? Yes.
     b. If so, is the switch network eating them up (due to, say, bad VLAN configuration)? No.
     c. Is the receiving host not responding at the TCP level? No. The receiving host does issue TCP SYN-ACKs.
     d. Are the receiving host's packets blocked by FW rules? No.
     e. Are the receiving host's packets received at the FW interface? Yes.

The packet capture below, taken on opt6 (VLAN 100), illustrates this behavior. The same behavior is observed on the main/opt7 interface:

Servers
vlan0.100   2024-06-28
07:37:50.442037 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x8070 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292126707 ecr 0], length 0
Servers
vlan0.100   2024-06-28
07:37:50.442400 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe967 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838080763 ecr 1292126707,nop,wscale 9], length 0
Servers
vlan0.100   2024-06-28
07:37:51.442697 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x7c87 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292127708 ecr 0], length 0
Servers
vlan0.100   2024-06-28
07:37:51.443231 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe57e (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838081764 ecr 1292126707,nop,wscale 9], length 0
Servers
vlan0.100   2024-06-28
07:37:52.462713 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe182 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838082784 ecr 1292126707,nop,wscale 9], length 0
Servers
vlan0.100   2024-06-28
07:37:53.642675 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x73ef (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292129908 ecr 0], length 0
Servers
vlan0.100   2024-06-28
07:37:53.643161 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xdce6 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838083964 ecr 1292126707,nop,wscale 9], length 0
Servers
vlan0.100   2024-06-28
07:37:55.662758 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xd502 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838085984 ecr 1292126707,nop,wscale 9], length 0
Servers
vlan0.100   2024-06-28
07:37:57.842474 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x6387 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292134108 ecr 0], length 0
Servers
vlan0.100   2024-06-28
07:37:57.842885 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xcc7e (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838088164 ecr 1292126707,nop,wscale 9], length 0
Servers
vlan0.100   2024-06-28
07:38:01.966765 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xbc62 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838092288 ecr 1292126707,nop,wscale 9], length 0
  3. Is there an MTU mismatch issue here? Step /2/ shows the FW with mtu:9000 (downsampled at ether layer) and the host with mtu:1500 (downsampled at ether layer). Changing the host MTU to match the FW results in the same behavior. Please see the attached tcpdump.zip, which includes pcap and json.

tcpdump.zip
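
The capture shown earlier contains four SYN segments from 192.168.100.1 and seven SYN-ACK replies from 192.168.100.21, with no ACK completing the handshake, and every frame is 74 bytes long. Frames that small fit within either a 1500 or a 9000 byte MTU, which suggests the stalled handshake is not an MTU problem. The following is a minimal sketch, assuming `tcpdump`-style text in the format above, that produces the same tally automatically; the helper names are hypothetical.

```python
import re
from collections import Counter

# Matches "src.port > dst.port: Flags [X]" as printed in the capture above.
FLOW_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): Flags \[([^\]]+)\]"
)
# Matches the on-wire frame length on the Ethernet header line.
FRAME_RE = re.compile(r"ethertype IPv4 \(0x0800\), length (\d+)")

# Two entries copied from the capture in this report.
SAMPLE = """\
07:37:50.442037 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x8070 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292126707 ecr 0], length 0
07:37:50.442400 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe967 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838080763 ecr 1292126707,nop,wscale 9], length 0
"""


def summarize_handshake(capture_text):
    """Count SYN, SYN-ACK and all other TCP segments in the capture text."""
    counts = Counter()
    for match in FLOW_RE.finditer(capture_text):
        flags = match.group(5)
        if flags == "S":
            counts["syn"] += 1
        elif flags == "S.":
            counts["syn_ack"] += 1
        else:
            counts["other"] += 1
    return counts


def handshake_stalled(counts):
    """True when SYN-ACKs arrived but nothing beyond SYN/SYN-ACK was seen."""
    return counts["syn"] > 0 and counts["syn_ack"] > 0 and counts["other"] == 0


def max_frame_length(capture_text):
    """Largest on-wire frame length in the capture, or 0 if none is found."""
    lengths = [int(m.group(1)) for m in FRAME_RE.finditer(capture_text)]
    return max(lengths, default=0)
```

A stalled result here only says that the capture never shows the initiator progressing past SYN; it does not say where in the firewall's receive path the SYN-ACK is lost.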

  4. Could there be an intermittent issue w/ the network drivers for the ice card involved? It doesn't seem so. Also, if that were the case, shouldn't it then manifest across all VLANs? There are certain oddities in the ice messages, but these also seemed to be present in 23.x (if memory serves me right).
    Copyright (c) 1992-2021 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
    FreeBSD is a registered trademark of The FreeBSD Foundation.
    FreeBSD 13.2-RELEASE-p11 stable/24.1-n255023-99a14409566 SMP amd64
    FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
    VT(vga): resolution 640x480
    CPU: AMD EPYC 3251 8-Core Processor                  (2495.44-MHz K8-class CPU)
    Origin="AuthenticAMD"  Id=0x800f12  Family=0x17  Model=0x1  Stepping=2
    Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
    Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
    AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
    AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
    Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
    XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
    AMD Extended Feature Extensions ID EBX=0x1007<CLZERO,IRPerf,XSaveErPtr,IBPB>
    SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
    TSC: P-state invariant, performance statistics
    real memory  = 68717379584 (65534 MB)
    avail memory = 66675605504 (63586 MB)
    Event timer "LAPIC" quality 600
    ACPI APIC Table: <INSYDE WALLABY>
    FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
    FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s) x 2 hardware threads
    FreeBSD/SMP Online: 1 package(s) x 2 cache groups x 4 core(s)
    random: registering fast source Intel Secure Key RNG
    random: fast provider: "Intel Secure Key RNG"
    random: unblocking device.
    ioapic0: MADT APIC ID 128 != hw id 0
    ioapic1: MADT APIC ID 129 != hw id 0
    ioapic0 <Version 2.1> irqs 0-23
    ioapic1 <Version 2.1> irqs 24-55
    Launching APs: 7 5 1 3 2 6 4
    random: entropy device external interface
    wlan: mac acl policy registered
    kbd0 at kbdmux0
    WARNING: Device "spkr" is Giant locked and may be deleted before FreeBSD 14.0.
    vtvga0: <VT VGA driver>
    efirtc0: <EFI Realtime Clock>
    efirtc0: registered as a time-of-day clock, resolution 1.000000s
    smbios0: <System Management BIOS> at iomem 0x7945e000-0x7945e01e
    smbios0: Version: 3.0, BCD Revision: 3.0
    aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256>
    acpi0: <INSYDE WALLABY>
    acpi0: Power Button (fixed)
    cpu0: <ACPI CPU> on acpi0
    hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
    Timecounter "HPET" frequency 14318180 Hz quality 950
    Event timer "HPET" frequency 14318180 Hz quality 350
    Event timer "HPET1" frequency 14318180 Hz quality 350
    Event timer "HPET2" frequency 14318180 Hz quality 350
    atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
    atrtc0: registered as a time-of-day clock, resolution 1.000000s
    Event timer "RTC" frequency 32768 Hz quality 0
    attimer0: <AT timer> port 0x40-0x43 on acpi0
    Timecounter "i8254" frequency 1193182 Hz quality 0
    Event timer "i8254" frequency 1193182 Hz quality 100
    apei0: <ACPI Platform Error Interface> on acpi0
    Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
    acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
    acpi_button0: <Power Button> on acpi0
    pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    pci0: <ACPI PCI bus> on pcib0
    pci0: <base peripheral, IOMMU> at device 0.2 (no driver attached)
    pcib1: <ACPI PCI-PCI bridge> at device 1.3 on pci0
    pci1: <ACPI PCI bus> on pcib1
    nvme0: <Generic NVMe Device> mem 0x80900000-0x80903fff at device 0.0 on pci1
    pcib2: <ACPI PCI-PCI bridge> at device 1.4 on pci0
    pci2: <ACPI PCI bus> on pcib2
    igb0: <Intel(R) I210 Flashless (Copper)> port 0x5000-0x501f mem 0x80800000-0x8081ffff,0x80820000-0x80823fff at device 0.0 on pci2
    igb0: NVM V0.6 imgtype6
    igb0: Using 1024 TX descriptors and 1024 RX descriptors
    igb0: Using 4 RX queues 4 TX queues
    igb0: Using MSI-X interrupts with 5 vectors
    igb0: Ethernet address: f4:90:ea:00:a2:06
    igb0: netmap queues/slots: TX 4/1024, RX 4/1024
    pcib3: <ACPI PCI-PCI bridge> at device 1.5 on pci0
    pci3: <ACPI PCI bus> on pcib3
    igb1: <Intel(R) I210 Flashless (Copper)> port 0x4000-0x401f mem 0x80700000-0x8071ffff,0x80720000-0x80723fff at device 0.0 on pci3
    igb1: NVM V0.6 imgtype6
    igb1: Using 1024 TX descriptors and 1024 RX descriptors
    igb1: Using 4 RX queues 4 TX queues
    igb1: Using MSI-X interrupts with 5 vectors
    igb1: Ethernet address: f4:90:ea:00:a2:07
    igb1: netmap queues/slots: TX 4/1024, RX 4/1024
    pcib4: <ACPI PCI-PCI bridge> at device 1.6 on pci0
    pci4: <ACPI PCI bus> on pcib4
    igb2: <Intel(R) I210 Flashless (Copper)> port 0x3000-0x301f mem 0x80600000-0x8061ffff,0x80620000-0x80623fff at device 0.0 on pci4
    igb2: NVM V0.6 imgtype6
    igb2: Using 1024 TX descriptors and 1024 RX descriptors
    igb2: Using 4 RX queues 4 TX queues
    igb2: Using MSI-X interrupts with 5 vectors
    igb2: Ethernet address: f4:90:ea:00:a2:08
    igb2: netmap queues/slots: TX 4/1024, RX 4/1024
    pcib5: <ACPI PCI-PCI bridge> at device 1.7 on pci0
    pci5: <ACPI PCI bus> on pcib5
    igb3: <Intel(R) I210 Flashless (Copper)> port 0x2000-0x201f mem 0x80500000-0x8051ffff,0x80520000-0x80523fff at device 0.0 on pci5
    igb3: NVM V0.6 imgtype6
    igb3: Using 1024 TX descriptors and 1024 RX descriptors
    igb3: Using 4 RX queues 4 TX queues
    igb3: Using MSI-X interrupts with 5 vectors
    igb3: Ethernet address: f4:90:ea:00:a2:09
    igb3: netmap queues/slots: TX 4/1024, RX 4/1024
    pcib6: <ACPI PCI-PCI bridge> at device 3.1 on pci0
    pci6: <ACPI PCI bus> on pcib6
    ice0: <Intel(R) Ethernet Network Adapter E810-XXV-2 - 1.37.11-k> mem 0x7fcfc000000-0x7fcfdffffff,0x7fcfe010000-0x7fcfe01ffff at device 0.0 on pci6
    ice0: Loading the iflib ice driver
    ice0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.30.0, track id 0xc0000001.
    ice0: fw 6.2.9 api 1.7 nvm 3.20 etid 8000d853 netlist 3.20.5000-1.e.0.495c77bc oem 1.3146.0
    ice0: Using 8 Tx and Rx queues
    ice0: Reserving 8 MSI-X interrupts for iRDMA
    ice0: Using MSI-X interrupts with 17 vectors
    ice0: Using 1024 TX descriptors and 1024 RX descriptors
    ice0: Ethernet address: f4:90:ea:00:9f:72
    ice0: PCI Express Bus: Speed 8.0GT/s Width x8
    ice0: Firmware LLDP agent disabled
    ice0: link state changed to UP
    ice0: Link is up, 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: FC-FEC/BASE-R, Autoneg: False, Flow Control: None
    ice0: netmap queues/slots: TX 8/1024, RX 8/1024
    ice1: <Intel(R) Ethernet Network Adapter E810-XXV-2 - 1.37.11-k> mem 0x7fcfa000000-0x7fcfbffffff,0x7fcfe000000-0x7fcfe00ffff at device 0.1 on pci6
    ice1: Loading the iflib ice driver
    ice1: DDP package already present on device: ICE OS Default Package version 1.3.30.0, track id 0xc0000001.
    ice1: fw 6.2.9 api 1.7 nvm 3.20 etid 8000d853 netlist 3.20.5000-1.e.0.495c77bc oem 1.3146.0
    ice1: Using 8 Tx and Rx queues
    ice1: Reserving 8 MSI-X interrupts for iRDMA
    ice1: Using MSI-X interrupts with 17 vectors
    ice1: Using 1024 TX descriptors and 1024 RX descriptors
    ice1: Ethernet address: f4:90:ea:00:9f:73
    ice1: PCI Express Bus: Speed 8.0GT/s Width x8
    ice1: Firmware LLDP agent disabled
    ice1: link state changed to UP
    ice1: Link is up, 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: FC-FEC/BASE-R, Autoneg: False, Flow Control: None
    ice1: netmap queues/slots: TX 8/1024, RX 8/1024
    pcib7: <ACPI PCI-PCI bridge> at device 7.1 on pci0
    pci7: <ACPI PCI bus> on pcib7
    pci7: <unknown> at device 0.0 (no driver attached)
    pci7: <encrypt/decrypt> at device 0.2 (no driver attached)
    xhci0: <XHCI (generic) USB 3.0 controller> mem 0x80200000-0x802fffff at device 0.3 on pci7
    xhci0: 64 bytes context size, 64-bit DMA
    usbus0: waiting for BIOS to give up control
    ice1: Module is not present.
    ice1: Possible Solution 1: Check that the module is inserted correctly.
    ice1: Possible Solution 2: If the problem persists, use a cable/module that is found in the supported modules and cables list for this device.
    ice1: link state changed to DOWN
    usbus0 on xhci0
    usbus0: 5.0Gbps Super Speed USB v3.0
    pcib8: <ACPI PCI-PCI bridge> at device 8.1 on pci0
    pci8: <ACPI PCI bus> on pcib8
    pci8: <unknown> at device 0.0 (no driver attached)
    pci8: <encrypt/decrypt> at device 0.1 (no driver attached)
    hdac0: <AMD X370 HDA Controller> mem 0x80180000-0x80187fff at device 0.3 on pci8
    ax0: <AMD 10 Gigabit Ethernet Driver> mem 0x80160000-0x8017ffff,0x80140000-0x8015ffff,0x80188000-0x80189fff at device 0.4 on pci8
    ax0: Using 2048 TX descriptors and 2048 RX descriptors
    ax0: Using 8 RX queues 8 TX queues
    ax0: Using MSI-X interrupts with 12 vectors
    ax0: Ethernet address: f4:90:ea:00:a2:0a
    ax0: xgbe_config_sph_mode: SPH disabled in channel 0
    ax0: xgbe_config_sph_mode: SPH disabled in channel 1
    ax0: xgbe_config_sph_mode: SPH disabled in channel 2
    ax0: xgbe_config_sph_mode: SPH disabled in channel 3
    ax0: xgbe_config_sph_mode: SPH disabled in channel 4
    ax0: xgbe_config_sph_mode: SPH disabled in channel 5
    ax0: xgbe_config_sph_mode: SPH disabled in channel 6
    ax0: xgbe_config_sph_mode: SPH disabled in channel 7
    ax0: RSS Enabled
    ax0: Receive checksum offload Enabled
    ax0: VLAN filtering Enabled
    ax0: VLAN Stripping Enabled
    ax0: Checking GPIO expander validity
    ax0: GPIO configuration valid
    ax0: xgbe_phy_sfp_signals: port_sfp_inputs: 0x7
    ax0: xgbe_phy_sfp_detect: mod absent
    ax0: netmap queues/slots: TX 8/2048, RX 8/2048
    ax1: <AMD 10 Gigabit Ethernet Driver> mem 0x80120000-0x8013ffff,0x80100000-0x8011ffff,0x8018a000-0x8018bfff at device 0.5 on pci8
    ax1: Using 2048 TX descriptors and 2048 RX descriptors
    ax1: Using 8 RX queues 8 TX queues
    ax1: Using MSI-X interrupts with 12 vectors
    ax1: Ethernet address: f4:90:ea:00:a2:0b
    ax1: xgbe_config_sph_mode: SPH disabled in channel 0
    ax1: xgbe_config_sph_mode: SPH disabled in channel 1
    ax1: xgbe_config_sph_mode: SPH disabled in channel 2
    ax1: xgbe_config_sph_mode: SPH disabled in channel 3
    ax1: xgbe_config_sph_mode: SPH disabled in channel 4
    ax1: xgbe_config_sph_mode: SPH disabled in channel 5
    ax1: xgbe_config_sph_mode: SPH disabled in channel 6
    ax1: xgbe_config_sph_mode: SPH disabled in channel 7
    ax1: RSS Enabled
    ax1: Receive checksum offload Enabled
    ax1: VLAN filtering Enabled
    ax1: VLAN Stripping Enabled
    ax1: Checking GPIO expander validity
    ax1: GPIO configuration valid
    ax1: xgbe_phy_sfp_signals: port_sfp_inputs: 0x7
    ax1: xgbe_phy_sfp_detect: mod absent
    ax1: netmap queues/slots: TX 8/2048, RX 8/2048
    isab0: <PCI-ISA bridge> at device 20.3 on pci0
    isa0: <ISA bus> on isab0
    uart2: <16x50 with 256 byte FIFO> iomem 0xfedc9000-0xfedc9fff,0xfedc7000-0xfedc7fff irq 3 on acpi0
    uart2: console (115384,n,8,1)
    hwpstate0: <Cool`n'Quiet 2.0> on cpu0
    Timecounter "TSC-low" frequency 1247655967 Hz quality 1000
    Timecounters tick every 1.000 msec
    ZFS filesystem version: 5
    ZFS storage pool version: features support (5000)
    ugen0.1: <AMD XHCI root HUB> at usbus0
    uhub0 on usbus0
    uhub0: <AMD XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
    nvd0: <TS1TMTE662T2> NVMe namespace
    nvd0: 976762MB (2000409264 512 byte sectors)
    Trying to mount root from zfs:zroot/ROOT/default []...
    uhub0: 8 ports with 8 removable, self powered
    ice1: link state changed to UP
    ice1: Link is up, 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: FC-FEC/BASE-R, Autoneg: False, Flow Control: None
    igb1: link state changed to UP
    ice0: Module is not present.
    ice0: Possible Solution 1: Check that the module is inserted correctly.
    ice0: Possible Solution 2: If the problem persists, use a cable/module that is found in the supported modules and cables list for this device.
    ice0: Link is up, 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: FC-FEC/BASE-R, Autoneg: False, Flow Control: None
    intsmb0: <AMD FCH SMBus Controller> at device 20.0 on pci0
    smbus0: <System Management Bus> on intsmb0
    driver bug: Unable to set devclass (class: ppc devname: (unknown))
    ig4iic0: <Designware I2C Controller> iomem 0xfedc2000-0xfedc2fff irq 10 on acpi0
    iicbus0: <Philips I2C bus (ACPI-hinted)> on ig4iic0
    ig4iic1: <Designware I2C Controller> iomem 0xfedc3000-0xfedc3fff irq 11 on acpi0
    iicbus1: <Philips I2C bus (ACPI-hinted)> on ig4iic1
    ig4iic2: <Designware I2C Controller> iomem 0xfedc4000-0xfedc4fff irq 12 on acpi0
    iicbus2: <Philips I2C bus (ACPI-hinted)> on ig4iic2
    ig4iic3: <Designware I2C Controller> iomem 0xfedc5000-0xfedc5fff irq 6 on acpi0
    iicbus3: <Philips I2C bus (ACPI-hinted)> on ig4iic3
    ig4iic4: <Designware I2C Controller> iomem 0xfedc6000-0xfedc6fff irq 14 on acpi0
    iicbus4: <Philips I2C bus (ACPI-hinted)> on ig4iic4
    ig4iic5: <Designware I2C Controller> iomem 0xfedcb000-0xfedcbfff irq 15 on acpi0
    iicbus5: <Philips I2C bus (ACPI-hinted)> on ig4iic5
    lo0: link state changed to UP
    amdsmn0: <AMD Family 17h System Management Network> on hostb0
    amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
    pflog0: permanently promiscuous mode enabled
    lagg0: link state changed to UP
    vlan0: changing name to 'vlan0.1'
    vlan1: changing name to 'vlan0.100'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 100, status -14
    ice0: Failure adding VLAN 100 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 100, status -14
    ice1: Failure adding VLAN 100 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan2: changing name to 'vlan0.120'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 120, status -14
    ice0: Failure adding VLAN 120 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 120, status -14
    ice1: Failure adding VLAN 120 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan3: changing name to 'vlan0.121'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 121, status -14
    ice0: Failure adding VLAN 121 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 121, status -14
    ice1: Failure adding VLAN 121 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan4: changing name to 'vlan0.140'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 140, status -14
    ice0: Failure adding VLAN 140 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 140, status -14
    ice1: Failure adding VLAN 140 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan5: changing name to 'vlan0.2'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 2, status -14
    ice0: Failure adding VLAN 2 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 2, status -14
    ice1: Failure adding VLAN 2 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan6: changing name to 'vlan0.250'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 250, status -14
    ice0: Failure adding VLAN 250 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 250, status -14
    ice1: Failure adding VLAN 250 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan7: changing name to 'vlan0.3'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 3, status -14
    ice0: Failure adding VLAN 3 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 3, status -14
    ice1: Failure adding VLAN 3 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    igb1: link state changed to DOWN
    igb1: link state changed to UP
    wg0: changing name to 'wg1'
    wg1: link state changed to UP
    tun1: changing name to 'ovpns1'
    tun2: changing name to 'ovpns2'
    tun3: changing name to 'ovpns3'
    ovpns3: link state changed to UP
    WARNING: attempt to domain_add(netgraph) after domainfinalize()
    ovpns3: link state changed to DOWN
    Trying to mount root from zfs:zroot/ROOT/default []...
    lagg0: link state changed to DOWN
    vlan0.1: link state changed to DOWN
    vlan0.2: link state changed to DOWN
    vlan0.100: link state changed to DOWN
    vlan0.3: link state changed to DOWN
    vlan0.140: link state changed to DOWN
    vlan0.250: link state changed to DOWN
    vlan0.121: link state changed to DOWN
    vlan0.120: link state changed to DOWN
    vlan0: changing name to 'vlan0.1'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 1, status -14
    ice0: Failure adding VLAN 1 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 1, status -14
    ice1: Failure adding VLAN 1 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan1: changing name to 'vlan0.100'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 100, status -14
    ice0: Failure adding VLAN 100 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 100, status -14
    ice1: Failure adding VLAN 100 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan2: changing name to 'vlan0.120'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 120, status -14
    ice0: Failure adding VLAN 120 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 120, status -14
    ice1: Failure adding VLAN 120 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan3: changing name to 'vlan0.121'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 121, status -14
    ice0: Failure adding VLAN 121 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 121, status -14
    ice1: Failure adding VLAN 121 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan4: changing name to 'vlan0.140'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 140, status -14
    ice0: Failure adding VLAN 140 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 140, status -14
    ice1: Failure adding VLAN 140 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan5: changing name to 'vlan0.2'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 2, status -14
    ice0: Failure adding VLAN 2 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 2, status -14
    ice1: Failure adding VLAN 2 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan6: changing name to 'vlan0.250'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 250, status -14
    ice0: Failure adding VLAN 250 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 250, status -14
    ice1: Failure adding VLAN 250 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    vlan7: changing name to 'vlan0.3'
    ice0: Failed to add VLAN filters:
    ice0: - vlan 3, status -14
    ice0: Failure adding VLAN 3 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ice1: Failed to add VLAN filters:
    ice1: - vlan 3, status -14
    ice1: Failure adding VLAN 3 to main VSI, err ICE_ERR_ALREADY_EXISTS aq_err OK
    ovpns3: link state changed to UP
    lagg0: link state changed to UP
    vlan0.1: link state changed to UP
    vlan0.2: link state changed to UP
    vlan0.100: link state changed to UP
    vlan0.3: link state changed to UP
    vlan0.140: link state changed to UP
    vlan0.250: link state changed to UP
    vlan0.121: link state changed to UP
    vlan0.120: link state changed to UP
    igb1: link state changed to DOWN
    igb1: link state changed to UP
    igb1: link state changed to DOWN
    igb1: link state changed to UP
    ice0: promiscuous mode enabled
    ice1: promiscuous mode enabled
    lagg0: promiscuous mode enabled
    vlan0.100: promiscuous mode enabled
    ice0: promiscuous mode disabled
    ice1: promiscuous mode disabled
    lagg0: promiscuous mode disabled
    vlan0.100: promiscuous mode disabled

Environment

Version 24.4.1
Architecture amd64
Commit 77b950d6f
Mirror https://opnsense-update.deciso.com/${SUBSCRIPTION}/FreeBSD:13:amd64/24.4
CPU AMD EPYC 3251 8-Core Processor (8 cores, 8 threads)
HW DEC4040

AdSchellevis commented 1 month ago

If 23.10.x works as expected but 24.1 doesn't, the first question is what the difference is in the generated ruleset (/tmp/rules.debug). Would it be possible to collect both on the exact same configuration? Common issues in these cases relate to reply-to and route-to rules.
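
Once both copies of /tmp/rules.debug have been collected, the comparison could be sketched roughly as follows. This is only an illustration using Python's standard `difflib`; the function names and the default labels are hypothetical, and it assumes each file has already been read into a string.

```python
import difflib

# Keywords the maintainer singles out as common culprits.
KEYWORDS = ("route-to", "reply-to")


def policy_routing_lines(ruleset_text):
    """Return non-comment rules that mention route-to or reply-to."""
    return [
        line.strip()
        for line in ruleset_text.splitlines()
        if not line.lstrip().startswith("#")
        and any(keyword in line for keyword in KEYWORDS)
    ]


def diff_rulesets(old_text, new_text,
                  old_name="rules.debug (old)", new_name="rules.debug (new)"):
    """Unified diff of two generated rulesets, returned as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    ))
```

Comparing `policy_routing_lines()` for the two versions first keeps the output short; the full `diff_rulesets()` result is then useful if the policy-routing rules turn out to be identical.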

morikplay commented 1 month ago

@AdSchellevis Many thanks for the prompt reply!

firmware-->changelog history:

Version Date    
24.4.1 (installed)  2024-06-20  
24.4    2024-04-30  
23.10.3 2024-03-28  

Now I notice that the 23.x boot environment is no longer visible. I only see the base snapshot manually created by me back in 2022!

bectl list -a
BE/Dataset/Snapshot  Active Mountpoint Space Created

default
  zroot/ROOT/default NR     /          8.44G 2022-10-25 01:28

Perhaps the reboot after 24.x cleared previous boot environments?

What sequence of steps would now furnish you with the appropriate /tmp/rules.debug files?

  1. perform bectl create 24.4.1-buggy
  2. perform fresh install of 23.10 via console (nano image) or opnsense-revert -kr 23.10
  3. import current config
  4. save /tmp/rules.debug
  5. hopefully, if it's a working system at /1/ then follow normal upgrade system prompts via web GUI
  6. save /tmp/rules.debug
  7. compare and share here

AdSchellevis commented 1 month ago

A reinstall with 23.10 and update to the latest version in the 23.10 branch would be best indeed, then test if the issue is indeed not there and collect evidence.

If you want to exclude pf as the most likely cause of your issue before reinstalling, I sometimes temporarily disable pf using pfctl -d, check local connectivity, and then enable it again (pfctl -e). When local communication works without pf enabled, it's either a policy-based routing rule (route-to/reply-to) or a NAT issue.

morikplay commented 1 month ago

Thanks, @AdSchellevis. I'll be remote (to the FW in question) for the next few weeks, so a full-blown downgrade w/ console access might be a tall order until then.

The following didn't make a dent in the issue:

  1. Disable pf using pfctl -d
  2. curl failed
    root@MorikCage:~ # curl -vi --connect-timeout 5 http://homeassistant.esco.ghaar
    * Host homeassistant.esco.ghaar:80 was resolved.
    * IPv6: (none)
    * IPv4: 192.168.0.58
    *   Trying 192.168.0.58:80...
    * ipv4 connect timeout after 4999ms, move on!
    * Failed to connect to homeassistant.esco.ghaar port 80 after 5002 ms: Timeout was reached
    * Closing connection
    curl: (28) Failed to connect to homeassistant.esco.ghaar port 80 after 5002 ms: Timeout was reached
  3. Tried /2/ again after a few minutes. Same result, although I suspect state tables were still active?
  4. Enabled pf via pfctl -e

I could also attempt a step 2.5 with pfctl -F all to see whether that helps. But, I doubt it as my rules haven't changed between the two releases.
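
The connect test in step 2 could also be scripted so that repeated attempts are tallied rather than eyeballed. A minimal Python sketch, assuming a Python 3 interpreter is available on the firewall; `probe_tcp` is an illustrative helper name, not an OPNsense command:

```python
import socket


def probe_tcp(host, port, attempts=5, timeout=5.0):
    """Try a plain TCP connect `attempts` times.

    Returns a list of booleans, one per attempt: True if the three-way
    handshake completed within `timeout` seconds, False on timeout,
    refusal, or any other socket error.
    """
    results = []
    for _ in range(attempts):
        try:
            # create_connection performs the full connect() and raises
            # OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(True)
        except OSError:
            results.append(False)
    return results
```

For example, `probe_tcp("192.168.0.58", 80, attempts=10)` run once with pf enabled and once after pfctl -d would show whether the failure rate changes at all between the two states.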

Would there be any other files besides:

AdSchellevis commented 1 month ago

Well, if it doesn't work when pf is fully disabled (the state table doesn't matter in that case), the question is whether a valid default gateway is installed and whether the interface used has a valid netmask.

netstat -nr4
ifconfig

morikplay commented 1 month ago

# netstat -nr4
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            76.214.40.1        UGS        igb1
10.10.0.0/24       link#25            U        ovpns3
10.10.0.1          link#25            UHS         lo0
10.20.30.0/24      link#22            U           wg1
10.20.30.1         link#22            UHS         lo0
10.20.30.2         link#22            UHS         wg1
10.20.30.3         link#22            UHS         wg1
10.20.30.4         link#22            UHS         wg1
76.214.40.0/22     link#2             U          igb1
76.214.40.228      link#2             UHS         lo0
127.0.0.1          link#10            UH          lo0
192.168.0.0/23     link#21            U       vlan0.3
192.168.0.1        link#21            UHS         lo0
192.168.2.0/27     link#19            U       vlan0.2
192.168.2.1        link#19            UHS         lo0
192.168.98.0/24    link#13            U         lagg0
192.168.98.1       link#13            UHS         lo0
192.168.99.0/24    link#1             U          igb0
192.168.99.1       link#1             UHS         lo0
192.168.100.0/24   link#15            U      vlan0.10
192.168.100.1      link#15            UHS         lo0
192.168.120.0/24   link#16            U      vlan0.12
192.168.120.1      link#16            UHS         lo0
192.168.121.0/24   link#17            U      vlan0.12
192.168.121.1      link#17            UHS         lo0
192.168.140.0/24   link#18            U      vlan0.14
192.168.140.1      link#18            UHS         lo0
192.168.250.0/24   link#20            U      vlan0.25
192.168.250.1      link#20            UHS         lo0
# ping www.google.com
PING www.google.com (142.250.68.68): 56 data bytes
64 bytes from 142.250.68.68: icmp_seq=0 ttl=116 time=5.891 ms
64 bytes from 142.250.68.68: icmp_seq=1 ttl=116 time=7.310 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 5.891/6.601/7.310/0.709 ms
#ifconfig
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: LAN (lan)
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    inet 192.168.99.1 netmask 0xffffff00 broadcast 192.168.99.255
    groups: FG_ALL_VLANs FG_CRITICAL_LAN
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: Morik_WAN (wan)
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    inet 76.214.40.228 netmask 0xfffffc00 broadcast 76.214.43.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb2: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>

    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb3: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ice0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    media: Ethernet autoselect (25G-AUI <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ice1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    media: Ethernet autoselect (25G-AUI <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ax0: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ax1: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
    groups: enc
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=20100<PROMISC,PPROMISC> metric 0 mtu 33160
    groups: pflog
pfsync0: flags=0<> metric 0 mtu 1500
    syncpeer: 0.0.0.0 maxupd: 128 defer: off
    syncok: 1
    groups: pfsync
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    description: main_LAGG (opt1)
    options=4800028<VLAN_MTU,JUMBO_MTU,NOMAP>
    inet 192.168.98.1 netmask 0xffffff00 broadcast 192.168.98.255
    laggproto lacp lagghash l2,l3,l4
    laggport: ice0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: ice1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    groups: lagg FG_ALL_VLANs FG_CRITICAL_LAN
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    options=4000000<NOMAP>
    groups: vlan
    vlan: 1 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.100: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    description: Servers (opt6)
    options=4000000<NOMAP>
    inet 192.168.100.1 netmask 0xffffff00 broadcast 192.168.100.255
    groups: vlan FG_ALL_VLANs FG_CRITICAL_LAN
    vlan: 100 vlanproto: 802.1q vlanpcp: 7 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.120: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    description: Storage (opt3)
    options=4000000<NOMAP>
    ether f4:90:ea:00:9f:72
    inet 192.168.120.1 netmask 0xffffff00 broadcast 192.168.120.255
    groups: vlan FG_ALL_VLANs FG_CRITICAL_LAN
    vlan: 120 vlanproto: 802.1q vlanpcp: 2 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.121: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    description: Storage_Backup (opt9)
    options=4000000<NOMAP>
    inet 192.168.121.1 netmask 0xffffff00 broadcast 192.168.121.255
    groups: vlan FG_ALL_VLANs FG_CRITICAL_LAN
    vlan: 121 vlanproto: 802.1q vlanpcp: 2 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.140: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    description: Supervisor (opt4)
    options=4000000<NOMAP>
    inet 192.168.140.1 netmask 0xffffff00 broadcast 192.168.140.255
    groups: vlan FG_ALL_VLANs FG_CRITICAL_LAN
    vlan: 140 vlanproto: 802.1q vlanpcp: 2 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    description: vCamsTraffic (opt2)
    options=4000000<NOMAP>
    inet 192.168.2.1 netmask 0xffffffe0 broadcast 192.168.2.31
    groups: vlan FG_ALL_VLANs
    vlan: 2 vlanproto: 802.1q vlanpcp: 1 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.250: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    description: IoT (opt5)
    options=4000000<NOMAP>
    inet 192.168.250.1 netmask 0xffffff00 broadcast 192.168.250.255
    groups: vlan FG_ALL_VLANs
    vlan: 250 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan0.3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: main (opt7)
    options=4000000<NOMAP>
    inet 192.168.0.1 netmask 0xfffffe00 broadcast 192.168.1.255
    groups: vlan FG_ALL_VLANs FG_CRITICAL_LAN
    vlan: 3 vlanproto: 802.1q vlanpcp: 2 parent interface: lagg0
    media: Ethernet autoselect
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
    description: i_wireguard (opt10)
    options=80000<LINKSTATE>
    inet 10.20.30.1 netmask 0xffffff00
    groups: wg wireguard
    nd6 options=9<PERFORMNUD,IFDISABLED>
ovpns1: flags=8010<POINTOPOINT,MULTICAST> metric 0 mtu 1500
    options=80000<LINKSTATE>
    groups: tun openvpn
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ovpns2: flags=8010<POINTOPOINT,MULTICAST> metric 0 mtu 1500
    options=80000<LINKSTATE>
    groups: tun openvpn
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ovpns3: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> metric 0 mtu 1500
    options=80000<LINKSTATE>
    inet 10.10.0.1 netmask 0xffffff00 broadcast 10.10.0.255
    groups: tun openvpn
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    Opened by PID 65678

Pictorial view also attached for easier readability. Routing overview
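
The netmask question can be checked mechanically against the ifconfig output above. A minimal Python sketch (helper names are illustrative) that turns the hex netmask printed by ifconfig into a network and tests whether a target host is on-link for that interface:

```python
import ipaddress


def network_from_ifconfig(addr, hex_netmask):
    """Build an IPv4Network from an interface address and a hex netmask
    as printed by FreeBSD ifconfig (e.g. '0xfffffe00')."""
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    # strict=False because addr is a host address, not the network address.
    return ipaddress.IPv4Network(f"{addr}/{mask}", strict=False)


def is_on_link(target, addr, hex_netmask):
    """True if `target` falls inside the interface's directly connected network."""
    return ipaddress.IPv4Address(target) in network_from_ifconfig(addr, hex_netmask)
```

With the values shown for vlan0.3 (192.168.0.1, 0xfffffe00), the derived network is 192.168.0.0/23, which contains 192.168.0.58; likewise 192.168.100.1 with 0xffffff00 covers the hosts on VLAN 100. So the netmasks look consistent with the routing table.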

AdSchellevis commented 1 month ago

Just to be sure: do internet hosts also fail, or is it only access to 192.168.0.58?

You may remove sensitive data from the output by the way, for debugging we don't need mac addresses and exact netblocks.

morikplay commented 1 month ago

@AdSchellevis, thank you. I'll remove MAC addresses and such shortly.

Yes, access to internet (NAT) isn't a problem. The issue is only for traffic which originates from OpnSense e.g. syslog to a syslog server on lan, or Telegraf metrics destined for influxdb on lan, CrowdSec http notification to a LAPI on lan, simple curl requests to anywhere on any lan etc.

AdSchellevis commented 1 month ago

I wouldn't suspect the firewall to be honest, my next step would be to capture the traffic on both ends (packet capture on the firewall and on the target for the traffic between both hosts).

These types of issues usually relate to wrong gateways on the client, in which case traffic does not return to the expected host.

morikplay commented 1 month ago

Thank you for your continued guidance @AdSchellevis. Initially, I too suspected that I had caused this issue myself through network configuration changes. But a few aspects don't support that conclusion:

  1. Literally the only change was Opnsense 23.10-->24.4. Opnsense originated traffic towards VLAN was fine up to 23.x.
  2. Looking at the tcpdump captures at the Opnsense interface (opt6) from the initial message, one can see that opt6 (same goes for the other opt interfaces) does receive a SYN-ACK in response to its SYN. But the application logic (i.e. something above the TCP/IPv4 stack in opnsense) is never handed the SYN-ACKs. This causes the TCP/IPv4 stack on opnsense to repeat the SYN. Re-pasting for convenience:
    Servers
    vlan0.100   2024-06-28
    07:37:50.442037 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x8070 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292126707 ecr 0], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:50.442400 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe967 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838080763 ecr 1292126707,nop,wscale 9], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:51.442697 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x7c87 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292127708 ecr 0], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:51.443231 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe57e (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838081764 ecr 1292126707,nop,wscale 9], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:52.462713 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xe182 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838082784 ecr 1292126707,nop,wscale 9], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:53.642675 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x73ef (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292129908 ecr 0], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:53.643161 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xdce6 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838083964 ecr 1292126707,nop,wscale 9], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:55.662758 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xd502 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838085984 ecr 1292126707,nop,wscale 9], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:57.842474 f4:90:ea:00:9f:72   00:50:56:82:d8:b4   ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.1.31315 > 192.168.100.21.8080: Flags [S], cksum 0x6387 (correct), seq 445912424, win 65535, options [mss 8960,nop,wscale 12,sackOK,TS val 1292134108 ecr 0], length 0
    Servers
    vlan0.100   2024-06-28
    07:37:57.842885 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xcc7e (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838088164 ecr 1292126707,nop,wscale 9], length 0
    Servers
    vlan0.100   2024-06-28
    07:38:01.966765 00:50:56:82:d8:b4   f4:90:ea:00:9f:72   ethertype IPv4 (0x0800), length 74: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.100.21.8080 > 192.168.100.1.31315: Flags [S.], cksum 0xbc62 (correct), seq 3873949677, ack 445912425, win 43440, options [mss 1460,sackOK,TS val 3838092288 ecr 1292126707,nop,wscale 9], length 0
  3. Both curl and nc exhibit really odd behavior. The latter recognizes tcp/443 as udp/443.
  4. Intermittent success in establishing connections for FW-originated traffic from FW-->VLAN host(s) tells me it's something to do w/ pf and/or rules.
    #nc -4znvw 10 192.168.0.58 443
    Connection to 192.168.0.58 443 port [tcp/*] succeeded!
    <!-- immediately following which another series of requests fail -->
    ...
    #nc -4znvw 10 192.168.0.58 443
    nc: connect to 192.168.0.58 port 443 (tcp) failed: Operation timed out
    # nc -4znvw 10 192.168.0.58 443
    nc: connect to 192.168.0.58 port 443 (tcp) failed: Operation timed out
  5. If gateways were incorrectly configured on client(s) (pretty much all of the 150+ hosts in the network) then reachability issues would've manifested one way or another. Packet drop stats on those hosts are 0 when other LAN hosts / clients are involved.
  6. To match /2/, attached is a tcpdump from the peer (192.168.100.21:8080). I picked a different host (than .100.6) in VLAN-100 on a non-standard port to illustrate the point further. As can be seen, the host keeps receiving retransmitted SYNs from the FW (caused by curl -vi --connect-timeout 5 http://192.168.100.21:8080). At the FW interface, the NIC receives the packet, but the FW doesn't respond with an ACK.

from_host.zip

from_fw.pcap.zip
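
The pattern in the capture text (SYN repeated although a SYN-ACK arrived, and no final ACK) could be picked out automatically. A minimal Python sketch, assuming tcpdump's default text output as pasted above; the function names are illustrative:

```python
import re
from collections import defaultdict

# Matches e.g. "192.168.100.1.31315 > 192.168.100.21.8080: Flags [S],"
LINE_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): Flags \[([^\]]+)\]"
)


def summarize_handshakes(capture_text):
    """Count SYN, SYN-ACK and client ACK packets per connection attempt.

    Keys are (client_ip, client_port, server_ip, server_port), where the
    client is whichever side sent the SYN.
    """
    flows = defaultdict(lambda: {"syn": 0, "synack": 0, "ack": 0})
    for match in LINE_RE.finditer(capture_text):
        src, sport, dst, dport, flags = match.groups()
        if flags == "S":
            flows[(src, sport, dst, dport)]["syn"] += 1
        elif flags == "S.":
            # SYN-ACK travels server -> client, so reverse the key.
            flows[(dst, dport, src, sport)]["synack"] += 1
        elif flags == "." and (src, sport, dst, dport) in flows:
            # Pure ACK sent by the side that opened the connection.
            flows[(src, sport, dst, dport)]["ack"] += 1
    return dict(flows)


def stalled_flows(flows):
    """Flows where the SYN was retransmitted despite a SYN-ACK being seen
    and the client never sent an ACK."""
    return [
        key
        for key, c in flows.items()
        if c["syn"] > 1 and c["synack"] > 0 and c["ack"] == 0
    ]
```

Run against the firewall-side capture, a flow reported by `stalled_flows` means the SYN-ACK reached the capture point but the handshake still never completed, which is the behaviour described in /2/.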

AdSchellevis commented 1 month ago

I'm a bit out of clues I'm afraid, I suspect there is a logical explanation, but tracking this exceeds my current community support time.