Closed tomatotoast closed 4 years ago
In screenshot 1, both NICs have the same MAC address, which looks odd.
Does this also occur with LACP and no VLANs? Maybe there is a native VLAN mismatch on the switch?
It appears even if no switch is attached, so VLANs are not involved.
(no physical links) still errors
root@OPNsense:~ # netstat -i
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 0 0 0 0 5 0
lagg0 - fe80::%lagg0/ fe80::21b:21ff:fe 0 - - 2 - -
Both NICs have different MAC addresses by default:
igb2: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:1b:21:a7:5b:f2
    hwaddr 00:1b:21:a7:5b:f2
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect
    status: no carrier
igb3: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:1b:21:a7:5b:f3
    hwaddr 00:1b:21:a7:5b:f3
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect
    status: no carrier
While creating the lagg, both interfaces have different MAC addresses. Once I click edit after the lagg has been created, both interfaces share the same MAC address.
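For anyone who wants to check this on their own box, here is a minimal Python sketch that compares the `ether` and `hwaddr` lines per interface. It assumes, as far as I understand lagg(4), that FreeBSD gives every member port the MAC of the lagg while `hwaddr` keeps showing the burned-in address. The `SAMPLE` text is trimmed from the ifconfig output posted in this thread; the function names `parse_macs` and `overridden` are made up for this example.

```python
import re

# Trimmed from the ifconfig output posted in this thread (after lagg creation).
SAMPLE = """\
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:1b:21:a7:5b:f2
    hwaddr 00:1b:21:a7:5b:f2
igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:1b:21:a7:5b:f2
    hwaddr 00:1b:21:a7:5b:f3
"""

MAC = r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"


def parse_macs(text):
    """Return {interface: {"ether": mac, "hwaddr": mac}} from ifconfig-style text."""
    result = {}
    name = None
    for line in text.splitlines():
        header = re.match(r"(\S+): flags=", line)
        if header:
            name = header.group(1)
            result[name] = {}
            continue
        attr = re.match(r"\s+(ether|hwaddr) " + MAC, line)
        if attr and name is not None:
            result[name][attr.group(1)] = attr.group(2)
    return result


def overridden(text):
    """List interfaces whose active MAC (ether) differs from the burned-in one (hwaddr)."""
    return sorted(
        name
        for name, macs in parse_macs(text).items()
        if "ether" in macs and "hwaddr" in macs and macs["ether"] != macs["hwaddr"]
    )


print(overridden(SAMPLE))  # ['igb3']
```

If only the non-first member shows up here, the shared MAC is most likely the lagg's doing and the hardware address itself is unchanged.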
I don't think this (Oerrs) has anything to do with your problem. Can you please test LACP to the switch without VLANs and check whether you have a link after a reboot?
I have created a lagg interface and added an IP to it. Then I set up a DHCP server and range for this LAG link. On the switch I created an LACP group with an untagged VLAN on it, an access port so to speak. I added a VM in ESXi to this VLAN.
The VM did not receive an IP address. Rebooting the OPNsense machine did not change anything.
testlacp interface (opt6, lagg0)
Status: up
MAC address: 00:1b:21:a7:5b:f2 - Intel Corporate
MTU: 1500
IPv4 address: 192.168.160.1 / 24
IPv6 Link Local: fe80::21b:21ff:fea7:5bf2 / 64
Media: Ethernet autoselect
LAGG Protocol: lacp lagghash l2,l3,l4
LAGG Ports: igb2 igb3
In/out packets: 0 / 1 (0 bytes / 116 bytes)
In/out packets (pass): 0 / 1 (0 bytes / 116 bytes)
In/out packets (block): 0 / 0 (0 bytes / 0 bytes)
In/out errors: 0/4
Collisions: 0
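A side note on the addresses in this status output: the IPv6 link-local address looks like the usual modified EUI-64 form of the lagg MAC, which in turn is the MAC of igb2. A small Python sketch of that derivation, assuming the interface uses plain EUI-64 (no privacy or stable-random addresses); `mac_to_link_local` is a name chosen for this example only.

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive the fe80::/64 modified EUI-64 address from a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a MAC with six octets: %r" % mac)
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80") + bytes(6)  # fe80::/64
    return str(ipaddress.IPv6Address(prefix + interface_id))


print(mac_to_link_local("00:1b:21:a7:5b:f2"))  # fe80::21b:21ff:fea7:5bf2
```

The result matches the link-local address shown for lagg0, so the lagg really is running with the MAC of the first member port.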
So, you don't have carrier problems when rebooting without VLANs, correct?
The link is up, but it seems that no data can be transmitted (with or without VLANs). The client (VM) does not receive an IP from the DHCP server. The packet counter on the interface shows 1 out.
As far as I can tell, the problem exists regardless of VLANs on the LAGG interface. A reboot does not help either. A client connected to the network does not receive an IP from the DHCP server configured on the LACP interface.
Can you check with ping and the ARP cache on client and server whether the MAC address is learned? Also check on the switch.
Which hypervisor is running the VM? VMware usually uses its own HA rather than LACP.
I have no idea what went wrong yesterday. My ESXi does not use a multilink configuration. Before testing I rebooted everything, and pinging in both directions now works. But the issue with the rising error counter still persists. There are no errors on the switch ports.
The MAC address table from the switch: https://ibb.co/bB9zPdz
ifconfig OPNsense:
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
    ether 00:1b:21:a7:5b:f2
    hwaddr 00:1b:21:a7:5b:f2
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect
    status: no carrier
igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=400b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
    ether 00:1b:21:a7:5b:f2
    hwaddr 00:1b:21:a7:5b:f3
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect
    status: no carrier
ARP table OPNsense:
root@OPNsense:~ # arp -a | grep 160
? (192.168.160.1) at 00:1b:21:a7:5b:f2 on lagg0 permanent [ethernet]
? (192.168.160.100) at 00:0c:29:07:58:14 on lagg0 expires in 1177 seconds [ethernet]
ARP table client:
user@test:~$ arp -a
? (192.168.160.1) at 00:1b:21:a7:5b:f2 [ether] on ens160
Port errors:
root@OPNsense:~ # netstat -i
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
lagg0 1500 <Link#16> 00:1b:21:a7:5b:f2 407 0 0 1222 12 0
lagg0 - fe80::%lagg0/ fe80::21b:21ff:fe 0 - - 2 - -
lagg0 - 192.168.160.0 192.168.160.1 182 - - 51 - -
You have a learned MAC address on lagg0, but neither interface has a link?
Read again: I have no idea what went wrong yesterday. My ESXi does not use a multilink configuration. Before testing I rebooted everything, and pinging in both directions now works. But the issue with the rising error counter still persists. There are no errors on the switch ports.
This error happens even on a freshly installed system. Errors appear even without any cables attached.
Look, when you have a switch tagging packets on port 1 with VLAN 5 and the other side expects VLAN 4, you'll see interface errors. When you see the errors even without cables, it's just an error because it can't send a specific kind of traffic, for example when no LACP neighborship can be formed.
Please answer these four questions:
1.: Why do both physical interfaces share the same MAC address once the lagg is formed? I have not seen this behavior on Ubuntu Server 19.10, for example.
2.: Why does the error counter go up, even when the same VLANs are configured on both sides and both interfaces are connected to the LACP port group interfaces on the switch? The devices should easily exchange LACPDUs.
3.: Why does the interface error counter stay at a solid zero when running a LAG on Ubuntu Server 19.10 (LAG set up with netplan), with or without physical links up?
4.: If it is expected behavior that the counter goes up due to unanswered LACPDUs, where is it documented?
When the NICs are disconnected I get this output:
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
lagg0 1500 <Link#19> a8:5e:45:3d:ed:34 0 0 0 0 5 0
lagg0 - fe80::%lagg0/ fe80::aa5e:45ff:f 0 - - 2 - -
2 out packets and 5 out errors. This is so strange.
Another FreeBSD user also experienced this issue: https://forums.freebsd.org/threads/lagg-4-interface-output-errors.46022/
@mimugmail Since you are running an LACP setup: what does the error counter of your lagg interface look like?
Does it look the same? If so, it might be a FreeBSD issue.
My counters look similar, but they are in production, so there is no way to unplug a cable or move VLANs around.
lagg0 1500 <Link#13> ac:1f:6b:6c:95:2a 17981726252 0 0 7394162286 17 0
lagg0 - fe80::%lagg0/ fe80::ae1f:6bff:f 0 - - 1 - -
lagg1 1500 <Link#14> ac:1f:6b:6c:9b:34 7277851588 0 0 17991805018 108 0
lagg1 - fe80::%lagg1/ fe80::ae1f:6bff:f 0 - - 2 - -
lagg1 1500 <Link#15> ac:1f:6b:6c:9b:34 5 0 0 1006453 7 0
This issue is not caused by OPNsense. It can be closed.
The issue still persists after a fresh 20.7 installation. Maybe related to: #4235 https://github.com/opnsense/core/issues/4235
Didn't you state it's not related to OPNsense?
Another user also ran into this problem back in 2019: https://forum.opnsense.org/index.php?topic=15005.0
The forum post states that he can reach full throughput. Errors can come from anything: unknown packets, wrong checksums when running IPS together with offloading, and so on. If you don't encounter performance drops, just ignore them or report them to FreeBSD directly.
This issue has been automatically timed-out (after 180 days of inactivity).
For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.
If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.
Important notices Before you add a new report, we ask you kindly to acknowledge the following:
[X] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
[X] I have searched the existing issues and I'm convinced that mine is new.
Describe the bug
Creating and using an LACP LAGG interface causes errors on the LAG. After rebooting the machine, the LAG is not usable. The physical connections need to be replugged.
To Reproduce
Create the LAGG and check the error counter (no physical link attached)
root@OPNsense:~ # netstat -i
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 0 0 0 0 5 0
lagg0 - fe80::%lagg0/ fe80::21b:21ff:fe 0 - - 2 - -
Then I moved a VLAN interface to the lagg, sent some traffic over it, and plugged and unplugged the physical links one after another.
root@OPNsense:~ # netstat -i
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 9843 0 0 4232 34 0
lagg0 - fe80::%lagg0/ fe80::21b:21ff:fe 0 - - 2 - -
Then I rebooted the switch and the OPNsense and tested again. After the reboot I noticed that the link did not work. I had to unplug both physical cables and replug them.
This is what the error counters looked like after sending some traffic through again:
root@OPNsense:~ # netstat -i
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 12385 0 0 7016 82 0
lagg0 - fe80::%lagg0/ fe80::21b:21ff:fe 0 - - 2 - -
root@OPNsense:~ # netstat -i
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 13326 0 0 8134 135 0
lagg0 - fe80::%lagg0/ fe80::21b:21ff:fe 0 - - 2 - -
With or without VLAN hardware filtering the same thing happens.
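To put the counters above in proportion, here is a rough Python sketch that pulls Opkts and Oerrs from the `<Link#N>` row of each `netstat -i` snapshot and prints the change between consecutive snapshots. The rows are copied from the outputs in this report; the helper names `link_counters` and `deltas` are invented for this example, and the parsing assumes the ten-column layout shown in the header line.

```python
# <Link#N> rows copied from the three netstat -i snapshots with traffic in this report.
SNAPSHOTS = [
    "lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 9843 0 0 4232 34 0",
    "lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 12385 0 0 7016 82 0",
    "lagg0 1500 <Link#11> 00:1b:21:a7:5b:f2 13326 0 0 8134 135 0",
]


def link_counters(netstat_text, name):
    """Return (Opkts, Oerrs) from the <Link#N> row of the given interface."""
    for line in netstat_text.splitlines():
        fields = line.split()
        # Columns: Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
        if len(fields) == 10 and fields[0] == name and fields[2].startswith("<Link#"):
            return int(fields[7]), int(fields[8])
    raise ValueError("no <Link#N> row found for %s" % name)


def deltas(snapshots, name="lagg0"):
    """Return [(delta_opkts, delta_oerrs), ...] between consecutive snapshots."""
    counters = [link_counters(text, name) for text in snapshots]
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(counters, counters[1:])]


for d_pkts, d_errs in deltas(SNAPSHOTS):
    share = d_errs / d_pkts if d_pkts else float("nan")
    print(f"{d_errs} new output errors over {d_pkts} new output packets ({share:.2%})")
```

Feeding it full `netstat -i` output works as well, since the header, the prompt line, and the IPv4/IPv6 rows are skipped by the `<Link#` check.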
Expected behavior
An LACP link should be usable without any errors.
Screenshots
Creating LAGG: https://ibb.co/HqGd9s9
Configure vlan to lagg parent if: https://ibb.co/1JCj1jR
Interface settings: https://ibb.co/VWqg2hW
lacp switch configuration 1/2: https://ibb.co/Ptd7fJQ
lacp switch configuration 2/2: https://ibb.co/G0WwL6L
switch interface stats before connecting physical lacp links to opnsense: https://ibb.co/2hj65vR
switch interface stats after connecting physical lacp links to opnsense: https://ibb.co/dKZcyRN
Additional context
No errors appear on the switch. This error does not appear using LACP on Ubuntu Server 19.10 (kernel 5.3), so I guess it is not a hardware-related issue. It occurs even with completely different hardware (Sophos XG 105 rev2).
Environment
Versions
OPNsense 20.1-amd64
FreeBSD 11.2-RELEASE-p16-HBSD
OpenSSL 1.1.1d 10 Sep 2019
Intel Xeon E3-1220v6
Intel i340-T4 Gigabit NIC
Switch:
Device Information
Device Type: DGS-1210-26 Gigabit Ethernet Switch
Boot Version: 1.00.010
Firmware Version: 6.12.B006
Hardware Version: F1
NIC:
root@OPNsense:~ # pciconf -l -BbceVv
igb2@pci0:1:0:2: class=0x020000 card=0x12a18086 chip=0x150e8086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82580 Gigabit Network Connection'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xde180000, size 524288, enabled
    bar [1c] = type Memory, range 32, base 0xde304000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
        Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR NS
        link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 001b21ffffa75bf0
    ecap 0017[1a0] = TPH Requester 1
    PCI-e errors = Correctable Error Detected
        Unsupported Request Detected
    Corrected = Advisory Non-Fatal Error
igb3@pci0:1:0:3: class=0x020000 card=0x12a18086 chip=0x150e8086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82580 Gigabit Network Connection'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xde100000, size 524288, enabled
    bar [1c] = type Memory, range 32, base 0xde300000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
        Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR NS
        link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 001b21ffffa75bf0
    ecap 0017[1a0] = TPH Requester 1
    PCI-e errors = Correctable Error Detected
        Unsupported Request Detected
    Corrected = Advisory Non-Fatal Error