opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

ix0 no carrier #2591

Closed abplfab closed 5 years ago

abplfab commented 6 years ago

After upgrading to opnsense 18.7 the ix NIC (attached with a DAC to a switch) reports "media: no carrier". Setting the media to fixed 10Gbase-Twinax doesn't help...

fichtner commented 5 years ago

@Tsuroerusu Okay, listen. Tell me what you in very precise words want us to do and I'll objectively explain why that may or may not be feasible.

Tsuroerusu commented 5 years ago

> @Tsuroerusu Okay, listen. Tell me what you in very precise words want us to do and I'll objectively explain why that may or may not be feasible.

This is the heart of the matter, Franco. I did not, and still do not actually want you to do anything, and that is why I have been so amazed (in the negative sense) by your responses today (specifically).

The ONLY thing that I requested was a simple "yes/no" answer to this question: Will OPNsense 19.1 contain any backported Intel drivers?

enoch85 commented 5 years ago

@Tsuroerusu I understand that you're frustrated. Believe me, I am too! But instead of arguing, why not test the latest 19.1 release to see if it works? That would help all users far more IMHO.

Maybe I can save a few bucks on a new cable. :D

Tsuroerusu commented 5 years ago

> @Tsuroerusu I understand that you're frustrated. Believe me, I am too! But instead of arguing, why not test the latest 19.1 release to see if it works? That would help all users far more IMHO.
>
> Maybe I can save a few bucks on a new cable. :D

@enoch85 I actually agree with you on that, and that is why I was so disappointed that instead of a simple answer to the question I asked, I got absurd accusations thrown at me (not by you), which I then had to respond to.

Your suggestion of testing the 19.1 beta, I have no issue with considering that, it is a perfectly reasonable thing to ask of me. In fact, let me just go further and say, that the reason I asked about the drivers in 19.1 was precisely because I was interested in potentially testing it with my setup, however for that it would be useful to know which drivers I would actually be testing (Because earlier I got the vanilla FreeBSD 11.2 live media to work fine).

My setup is at production-level and because of that, I have to image it before testing things, and before spending an hour or two doing that, I simply needed some information as I have explained.

cvbkf commented 5 years ago

Some update from me, i tried the current 19.1 beta image via live cd, and both ports of the intel x520-da2 are online now.

the driver version is the same as before:

[1] ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xc020-0xc03f mem 0xdfe80000-0xdfefffff,0xdff04000-0xdff07fff irq 22 at device 0.0 on pci4

I'll install the beta now, hopefully config from 18.7 can be imported.

edit: performance is really bad, iperf3 measures around 1.2 gbit/s while hitting massive iowait.

fichtner commented 5 years ago

> Will OPNsense 19.1 contain any backported Intel drivers?

The answer is no, but that does not mean the drivers differ between 18.7 and 19.1: the 18.7 backport came from FreeBSD 11.2, and 19.1 is based on stock HardenedBSD/FreeBSD 11.2, so the same drivers are included.

I was under the impression that this had been communicated clearly in several places and I apologise if that wasn't actually the case.

Tsuroerusu commented 5 years ago

> Will OPNsense 19.1 contain any backported Intel drivers?

> The answer is no, but that does not mean the drivers differ between 18.7 and 19.1: the 18.7 backport came from FreeBSD 11.2, and 19.1 is based on stock HardenedBSD/FreeBSD 11.2, so the same drivers are included.

Thank you for answering my question. :)

> I was under the impression that this had been communicated clearly in several places and I apologise if that wasn't actually the case.

Earlier you linked to a forum post, which stated the following: "The key difference is the operating system switch from FreeBSD 11.1 to HardenedBSD 11.2."

However, for me, that did not exclude the possibility of any potential driver backports from 11-STABLE or from Intel directly, so that was why I sought clarification from you regarding that point.

The reason why I was interested in this is that shortly after I started experiencing problems with 18.7, I tried booting up the FreeBSD 11.2 live USB media, and configured my ix interfaces with VLANs and that worked in the way I expected them to. So my thinking was that if 19.1 did not contain any modifications to the ixgbe driver compared to the one in FreeBSD 11.2, perhaps it would fix my problem.

I hope this makes sense from my perspective of being a user/sysadmin, and not developer, just trying to use pattern matching and logical inferences to try to find solutions.

enoch85 commented 5 years ago

The cable won't show up until 2018-12-13, so I won't be able to test if it's a cabling issue or not until then.

Can someone else please confirm that 19.1 works?

cvbkf commented 5 years ago

19.1 (installed from ISO) works for me without changing the cables.

Interestingly, I switched back to 18.7 by doing a fresh install and found the following. Running in live CD mode, ix0 and ix1 are working. On the first boot of the fresh install from disk, ix0 and ix1 are online. But if I restore a backup, or change the interface settings via the GUI, ix0 "dies" after the reboot with the ominous "no carrier" status, while ix1 stays online.

mimugmail commented 5 years ago

Can you make a diff of config.xml and your backup?
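
For anyone unsure how to produce such a diff, here is a minimal Python sketch using the standard `difflib` module. The two XML excerpts and the `config_diff` helper are hypothetical stand-ins for illustration only; in practice the strings would be read from the running config.xml and the backup file, and the element names in a real config may differ.

```python
import difflib

# Hypothetical excerpt of the running config (assumption, not a real file).
RUNNING = """<opnsense>
  <interfaces>
    <lan>
      <if>ix0</if>
      <media>autoselect</media>
    </lan>
  </interfaces>
</opnsense>
"""

# Hypothetical excerpt of the restored backup (assumption, not a real file).
BACKUP = """<opnsense>
  <interfaces>
    <lan>
      <if>ix0</if>
      <media>10Gbase-Twinax</media>
    </lan>
  </interfaces>
</opnsense>
"""


def config_diff(current, backup):
    """Return unified-diff lines describing how to get from backup to current."""
    return list(
        difflib.unified_diff(
            backup.splitlines(),
            current.splitlines(),
            fromfile="backup.xml",
            tofile="config.xml",
            lineterm="",
        )
    )


diff_lines = config_diff(RUNNING, BACKUP)
print("\n".join(diff_lines))
```

Lines prefixed with `-` exist only in the backup and lines prefixed with `+` only in the running config, so any interface setting that changes on restore should stand out.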

enoch85 commented 5 years ago

@cvbkf

> Can you make a diff of config.xml and your backup?

Please

I will get the cable tomorrow, so I will test this weekend. :D

enoch85 commented 5 years ago

IT WORKS!

What I did:

  1. Replaced the cable and did a live boot with 18.7
     a) Got a warning that my WAN interface didn't work
     b) Renewed the IP --> Worked
     c) Noticed that I couldn't reach the servers sites from my WIFI net, but it was reachable from LTE (or outside of my own network)

  2. Figured I could live with not reaching the DNS for the sites I host (could just use network outside my own, or thought it might be a firewall rule issue) and went ahead and installed 18.7
     a) After installation I changed back to LibreSSL (which I had originally)
     b) Ran an update to reach 18.7.8

  3. Rebooted --> MISSION SUCCESS, everything now works as expected and I can reach my sites from my own network again. Everything regarding ix1 is "up" and so far I have no issues.

Conclusion

Maybe it was a combined error with my SFP+ cable and that something got fixed in the update to 18.7.6. Anyway, I'm happy again since I now can use the latest stable release. :D

cvbkf commented 5 years ago

I am on 19.1-rc1 for a few weeks now, until today both ix0 and ix1 were fine. After a reboot ix0 didn't come online -> the good old "no carrier" problem is back to haunt me. I tried a few things, but for now ix0 stays dead.

But I made an interesting discovery: if I reset the machine via the reset button, the link comes up at 10 Gbit/s and stays online until "Configuring LAN interface..." during the OPNsense bootup; then it goes down until the next reboot.

Maybe some invalid config is being applied? Are there any files or anything else I can provide?

Tsuroerusu commented 5 years ago

I just upgraded one of my firewalls to 19.1.5 from 18.7 (but using "kernel.old", i.e. the kernel from 18.1), which I had been running since August because of the issue I had with VLANs on my ix interfaces saying "no carrier", as described earlier in this thread. However, I am sad to say that the issue still persists despite the update to FreeBSD 11.2 in OPNsense, which is really depressing :-(

The strange thing for me is that, as I mentioned before, when 11.2 came out, I tried using the Live boot with my machines, and I could configure VLANs without any issue and run network traffic through them. Which makes this issue even more baffling to me.

And before anybody jumps in with this. Before I went to do the upgrade, the firewalls were working fine with the kernel from 18.1. So this is not a cabling issue, unless the newer Intel drivers have some change that in itself causes compatibility issues with the Supermicro cables that I am using.

At this point, I am millimeters from giving up and buying a different NIC, and hoping that I will not face the same issue, because I have been without security updates since August. Unfortunately, that will cost me 500 euros before I even know whether it will actually solve the problem.

cvbkf commented 5 years ago

I switched to a Mellanox Connect-X3 EN 2x SFP+ (used), which is working without any flaws.

You could try to compile a newer version of the intel driver (last post in this thread) https://forum.opnsense.org/index.php?topic=11384.0;topicseen

Tsuroerusu commented 5 years ago

> I switched to a Mellanox Connect-X3 EN 2x SFP+ (used), which is working without any flaws.
>
> You could try to compile a newer version of the intel driver (last post in this thread) https://forum.opnsense.org/index.php?topic=11384.0;topicseen

Thanks for the suggestion, I appreciate that. Unfortunately, the newer driver has the same issue. Are you using VLANs with that Mellanox card?

cvbkf commented 5 years ago

yes, i do, but only one (which led to the "no carrier" problem on the intel x550)

Tsuroerusu commented 5 years ago

> yes, i do, but only one (which led to the "no carrier" problem on the intel x550)

I must say, the ix driver sure is a wuss, 'ey? A single VLAN, in your case, and it melts down! It would be funny if it wasn't so annoying. I run something like 8 VLANs and I REALLY need them to work, so Mellanox it is for me then. Thanks for the recommendation, as I think I have found a good source for them! Hopefully this will solve my problem. :-)

DesruX commented 5 years ago

> I switched to a Mellanox Connect-X3 EN 2x SFP+ (used), which is working without any flaws.
>
> You could try to compile a newer version of the intel driver (last post in this thread) https://forum.opnsense.org/index.php?topic=11384.0;topicseen

Mention of no carrier bug - recommending the 3.3.6 driver https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235918

This may also be related https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221967

Talk of permanent allow unsupported SFP in driver, dated but similar principle. https://sourceforge.net/p/e1000/mailman/message/28694855/

Another patch for ix driver: http://www.grosbein.net/freebsd/patches/patch-if_ix.c

Would be great if no artificial restrictions were present in the driver and we only had to worry about actual hw compatibility.

abplfab commented 4 years ago

With 20.1 the problem is back. "no carrier" on the 10Ge I/F. :(

abplfab commented 4 years ago

Booting kernel.old (19.7) doesn't help.

mimugmail commented 4 years ago

Sounds intermittent. What happens when you unplug the cable and plug it in again?

abplfab commented 4 years ago

Doesn't help. Exactly the same behavior as in the beginning of this thread. Hardware unchanged, "only" updated to 20.1.

mimugmail commented 4 years ago

ifconfig -vvvvvv please

It worked with 19.7.10?

abplfab commented 4 years ago

Yes with 19.7.10 it worked.

    ix0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:97:ed:ce
        hwaddr 0c:c4:7a:97:ed:ce
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
        plugged: SFP/SFP+/SFP28 Unknown (Copper pigtail)
        vendor: Panduit Corp. PN: PSF1PXA3MBLLN SN: 15219315U-0017B DATE: 2017-01-10
        Class: (null)
        Length: (null)
        Tech: Passive Cable
        Media: (null)
        Speed: (null)

    SFF8472 DUMP (0xA0 0..127 range):
    03 04 21 00 00 00 00 00 04 00 00 00 67 00 00 00
    00 00 03 00 50 61 6E 64 75 69 74 20 43 6F 72 70
    2E 20 20 20 00 00 0F 9C 50 53 46 31 50 58 41 33
    4D 42 4C 4C 4E 20 20 20 34 20 20 20 01 00 00 F8
    00 00 00 00 31 35 32 31 39 33 31 35 55 2D 30 30
    31 37 42 20 31 37 30 31 31 30 00 00 00 00 00 71
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
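
As a cross-check of the dump above, the following Python sketch decodes a few identification fields from the first 96 bytes (the two all-zero rows are left out). The byte offsets follow the SFF-8472 A0h layout as I understand it (vendor name at 20..35, part number at 40..55, serial number at 68..83, date code at 84..91, copper length at 18, nominal rate at 12, CC_BASE checksum at 63); the helper names are hypothetical and this is not how ifconfig itself does the decoding.

```python
# Raw A0h page bytes copied from the ifconfig dump (first six rows).
DUMP = """
03 04 21 00 00 00 00 00 04 00 00 00 67 00 00 00
00 00 03 00 50 61 6E 64 75 69 74 20 43 6F 72 70
2E 20 20 20 00 00 0F 9C 50 53 46 31 50 58 41 33
4D 42 4C 4C 4E 20 20 20 34 20 20 20 01 00 00 F8
00 00 00 00 31 35 32 31 39 33 31 35 55 2D 30 30
31 37 42 20 31 37 30 31 31 30 00 00 00 00 00 71
"""


def parse_hex_dump(text):
    """Turn whitespace-separated hex byte tokens into a bytes object."""
    return bytes(int(token, 16) for token in text.split())


def ascii_field(data, start, end):
    """Decode an ASCII field, dropping space and NUL padding."""
    return data[start:end].decode("ascii").strip(" \x00")


def decode_sfp_id(data):
    """Decode a few identification fields (hypothetical helper)."""
    date = ascii_field(data, 84, 92)  # YYMMDD, optionally followed by a lot code
    return {
        "vendor": ascii_field(data, 20, 36),
        "part_number": ascii_field(data, 40, 56),
        "serial": ascii_field(data, 68, 84),
        "date": f"20{date[0:2]}-{date[2:4]}-{date[4:6]}",
        "copper_length_m": data[18],
        "nominal_rate_mbd": data[12] * 100,
        # CC_BASE (byte 63) is the low 8 bits of the sum of bytes 0..62.
        "checksum_ok": sum(data[0:63]) & 0xFF == data[63],
    }


info = decode_sfp_id(parse_hex_dump(DUMP))
for key, value in info.items():
    print(f"{key}: {value}")
```

The decoded vendor, part number, serial number and date agree with what ifconfig prints, and the base checksum matches, which suggests the cable's EEPROM is being read correctly. That alone says nothing about why the link stays down.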

fichtner commented 4 years ago

19.7.x OS = 20.1.x OS with no modifications. It does not look like this is a software issue if it works sometimes, but not always.

abplfab commented 4 years ago

19.7.x always stable. After upgrading to 20.1.x no chance to get it working.

fichtner commented 4 years ago

It’s probably a boot timing issue then. On 19.7 the netgraph drivers are loaded, on 20.1 not. See https://forum.opnsense.org/index.php?topic=15653.0

I can’t think of anything else that would make sense software-wise.

AdSchellevis commented 4 years ago

Other SFP modules (or cables) very often help fix these kinds of issues in our experience; unstable connections often point to issues there. The connector contains the transceiver, which is responsible for the connection (some cards even check for specific firmware in the transceiver).