opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

ix0 no carrier #2591

abplfab closed this issue 5 years ago

abplfab commented 6 years ago

After upgrading to OPNsense 18.7, the ix NIC (attached to a switch with a DAC) reports "media: no carrier". Setting the media to a fixed 10Gbase-Twinax doesn't help...

RehaagJ commented 6 years ago

Do I understand correctly that the new ix driver will still be available in 18.7.1? If so, it should be safe to update a Denverton system (having worked around the AHCI issue with BIOS settings)?

fichtner commented 6 years ago

Yes.

fichtner commented 6 years ago

This sounds promising: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221146

I'll provide a test kernel in a bit...

fichtner commented 6 years ago

Relevant src commit: https://github.com/opnsense/src/commit/57b43db5

How to install the test kernel from 18.7:

# opnsense-update -kr 18.7.1-ix

fichtner commented 6 years ago

And one more report of affected hardware, via Twitter: https://www.supermicro.com/products/motherboard/Xeon/D/X10SDV-4C_-TLN4F.cfm

enoch85 commented 6 years ago

Don't know if it helps, but here is some information on my failing ix NICs on 18.7:

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> mem 
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> mem 

I'm @techandme btw.

fichtner commented 6 years ago

No tests with the proposed kernel after all this discussion? ;)

enoch85 commented 6 years ago

I'd love to test, but since my system is in production and I don't have any test system, I'd prefer not to.

Would be nice if someone could confirm that it works. Right now I'm stuck with 18.1 because of this. :/

RehaagJ commented 6 years ago

I can try to test it over the weekend on a Denverton Atom system, but that cannot verify that the problem is solved (I never had the no carrier issue); it can only verify that the driver still works on Denverton. Is such a test useful for you?

fichtner commented 6 years ago

@Tsuroerusu and @abplfab were able to reproduce it, so their testing would be most useful, but confirming that it works either way (release and test kernel) is good to know too.

Thanks all. I'll be gone for two weeks so apologies for lack of responses.

Cheers, Franco

Tsuroerusu commented 6 years ago

@fichtner Over the weekend, I will see if I can image one of my firewall nodes and try this out on it. My situation is the same as @Uica's: my systems are in production, so I have limited scope to test things, but I can do so on my secondary box after imaging it.

RehaagJ commented 6 years ago

Tested: both release and test kernels work on a Denverton Atom C3558. But I repeat: this does not prove that the problem is fixed, only that I don't see any regression.

Tsuroerusu commented 6 years ago

Okay, this is the situation: both my primary and secondary production nodes have been running kernel.old for 17 days to work around the problem of VLAN interfaces on ix1 showing "no carrier" and thus not functioning. As I stated earlier, the ix0 interfaces on both nodes have no VLANs and come up fine, which means that there is WAN access. My ix interfaces are on a Supermicro AOC-STGN-i2S add-on card, which uses the Intel 82599ES controller.

Today, I performed the following on my secondary node in production:

  1. I took a bit-for-bit image of my secondary node, and shut down the machine.

  2. After once again powering on the machine, I let it boot directly into the 18.7 kernel, and the problem reappeared.

  3. I then did a reboot to make sure that it was not just a fluke, and the problem did indeed persist after rebooting.

  4. I then updated the secondary node to 18.7.1 and after rebooting the problem still persisted.

  5. I then logged into the system via the IPMI KVM console, and ran: opnsense-update -kr 18.7.1-ix

After it finished installing the kernel package, I immediately rebooted the machine. After rebooting, the problem was still present. :-/ A second reboot and a power cycle had the same result.

To conclude: As far as my problem of VLAN interfaces on ix1 showing "no carrier" is concerned, the "18.7.1-ix" package had no effect at all.

@fichtner @enoch85

Tsuroerusu commented 6 years ago

Does anybody need me to check anything before I restore my node to its earlier semi-working (i.e. kernel.old) state?

fichtner commented 6 years ago

Thanks for testing. No further poking needed. As said on Twitter, kernel.old will not work: 18.7.1 installed a newer kernel, which turned the 18.7 kernel into kernel.old. Please use the last hint in this thread to install the 18.1.11 kernel directly.


enoch85 commented 6 years ago

@Tsuroerusu Thanks for confirming!

abplfab commented 6 years ago

Just tried 18.7.1_3: still no carrier. Supermicro and NetGear are still investigating...

mimugmail commented 6 years ago

@enoch85 On my side it's

ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k>

enoch85 commented 6 years ago

@mimugmail Newer, and still not working?

mimugmail commented 6 years ago

To be clear, it has always worked for me, with every version.

mimugmail commented 6 years ago

Guys, can you try adding this (or create the file if it doesn't exist):

root@OPN:~ # cat /boot/loader.conf.local
hw.ix.allow_unsupported_sfp=1
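Since the suggestion is to add one line to /boot/loader.conf.local (creating the file if needed), here is a minimal POSIX sh sketch of doing that without duplicating the line on repeated runs. `add_tunable` is a hypothetical helper, not an OPNsense tool; it only touches the file you pass it, and settings in loader.conf files are read at boot, so a reboot is still needed afterwards.

```shell
#!/bin/sh
# Hypothetical helper (not part of OPNsense): append a loader tunable line to a
# loader.conf-style file unless that exact line is already present.
add_tunable() {
    file="$1"   # target file, e.g. /boot/loader.conf.local
    line="$2"   # tunable line in name=value form

    # Create the file if it does not exist yet.
    [ -f "$file" ] || : > "$file"

    # Nothing to do if the exact line is already there.
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi

    # Ensure the current last line ends with a newline before appending.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

Example, as root on the firewall: `add_tunable /boot/loader.conf.local 'hw.ix.allow_unsupported_sfp=1'`, then reboot.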

fichtner commented 6 years ago

PS: this can also be added via System: Settings: Tunables.

abplfab commented 6 years ago

In my case it doesn't help :(. I also tried hw.ix.allow_unsupported_sfp=1,1

Tsuroerusu commented 6 years ago

@mimugmail @fichtner The proposed change made no difference in regard to the problem I am seeing.

mimugmail commented 6 years ago

And are your driver versions 3.1.13 or 3.2.12?

Tsuroerusu commented 6 years ago

@mimugmail The one that is in OPNsense 18.7 / 18.7.1

mimugmail commented 6 years ago

And why is mine higher than others?

enoch85 commented 6 years ago

And why is mine higher than others?

Maybe because you upgraded with this?

abplfab commented 6 years ago

&%%+"ç"&&/"+%%?!*"/ç DAMN!

I've found the problem. I replaced the Delock SFP+ DAC (Art. 86234) with an Intel XDACBL3M and it works.

enoch85 commented 6 years ago

No need for hw.ix.allow_unsupported_sfp=1,1

So that will make the Intel NIC work?

abplfab commented 6 years ago

In my case it made no difference whether hw.ix.allow_unsupported_sfp=1,1 was set or not. That is: with the Delock cable it doesn't work with hw.ix.allow_unsupported_sfp=1,1 enabled or disabled (always no carrier). With the Intel cable I didn't try enabling it, as it works without it.

enoch85 commented 6 years ago

@abplfab So this could be a cabling issue as well?

@fichtner Any progress on this in the latest releases? I'm stuck on 18.1 because of this. :(

Tsuroerusu commented 6 years ago

@enoch85 My problem is most certainly not cable-related, because everything was working wonderfully for over a year before the 18.7 update. Like you, I am also stuck with the 18.1 kernel, which is a big issue as there have been several security updates since then.

As I stated earlier, when I ran vanilla FreeBSD 11.2 from the live USB, I was able to configure VLANs without an issue, so it seems like something is wrong with the backported driver. So I am crossing my fingers that 19.1 will be rebased on FreeBSD 11.2 and that this will fix the issue. (@fichtner)

abplfab commented 6 years ago

@enoch85 / @Tsuroerusu Definitely. With the old driver/kernel it worked well with the Delock cable. After upgrading to 18.7 it stopped working with the Delock cable. The Intel cable works with 18.1 and 18.7 without any modification.

enoch85 commented 6 years ago

@abplfab Do you think something like this would work?

Has anyone tried with the 19.1 release?

abplfab commented 6 years ago

@enoch85 Could be. But I wouldn't skimp on cables anymore... Take the original Intel cable (XDACBL3M for 3 m) if possible.

cvbkf commented 5 years ago

Hi there, I have the same problem with an Intel X520-DA2 with direct-attach cables. ix0 is offline with no carrier; ix1 is working. If I boot the system from an Ubuntu live CD, both ports work at 10 Gbit/s. The switch on the other end reports the link as up at 10 Gbit/s, but it does not receive any packets.

OPNsense is the latest version, with Intel driver 3.2.12-k. I found this related bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221146

Tsuroerusu commented 5 years ago

After this whole fiasco, which has left my perimeter firewalls without security updates for several months now, I am rather curious whether 19.1 will have any backports of Intel drivers or will use the ones in FreeBSD 11.2. Hey @fichtner, do you have any information regarding this?

fichtner commented 5 years ago

has left my perimeter firewalls without security updates for several months now

Sorry, that's a very one-sided approach. People have replaced hardware, cables, tried patches. There's also 19.1-BETA.

If you want to blame OPNsense for any of this, please do while presenting the hard evidence and the intent to leave you without security updates. Looking forward to it? :)

As for 19.1 we are proceeding as planned: January 2019. You can try the BETA from here:

https://forum.opnsense.org/index.php?topic=10135.0

Tsuroerusu commented 5 years ago

No offense, but WTF!? How in the world did you manage to interpret my remarks as somehow being aggressive and blaming OPNsense on a sort of personal level? I simply said that this problem has had the consequence of me being unable to update my firewalls for several months, because doing so would cripple my network setup. I never said anything suggesting that this was somehow a conspiracy. Frankly, given the amount of time I spent earlier in this discussion trying to help test things, I am slightly ... not offended, but let's say "disappointed" that you would interpret my concern in such a manner.

Also, with all due respect, I use a Supermicro network card (AOC-STGN-i2S) in a system with a Supermicro motherboard (A1SRM-2758F), connected using Supermicro SFP+ DAC cables (CBL-NTWK-0347). The notion that hardware is somehow the problem in my case is, I think, frankly absurd. As I have stated previously, it was all working brilliantly with 18.1 and earlier releases. This all started with 18.7. Since you mention hard evidence, well, there you go: 18.1 worked perfectly and 18.7 does not. Do you honestly expect me to consider hardware incompatibility a serious possibility when 1. it was working fine before 18.7, and 2. I am not mixing brands at all, but using hardware that is all verified to work together, as it is all from Supermicro?

I really do hope that you simply misunderstood what I meant to say, because I really think your remarks were rather unfortunate.

abplfab commented 5 years ago

@Tsuroerusu give the Intel DAC (XDACBL3M) a try. I'm 98.6% sure that this will solve your problems.

enoch85 commented 5 years ago

give the Intel DAC (XDACBL3M) a try. I'm 98.6% sure that this will solve your problems.

I just bought one, will be delivered next week. Let's hope it works!

fichtner commented 5 years ago

I really do hope that you simply misunderstood what I meant to say, because I really think your remarks were rather unfortunate.

Likewise.

Tsuroerusu commented 5 years ago

I really do hope that you simply misunderstood what I meant to say, because I really think your remarks were rather unfortunate.

Likewise.

I would be happy to find out that I misunderstood you; however, I sensed a good deal of contempt and arrogance in this remark of yours: "If you want to blame OPNsense for any of this, please do while presenting the hard evidence and the intent to leave you without security updates. Looking forward to it? :)"

I never blamed you or anybody else personally, and I have tested everything I described rather than just making blind claims. And please, why did you feel the need to end with a ":)"? No offense, but I do not think I can be faulted for interpreting that as contempt directed at me.

If I am mistaken, and you did not mean it like that, then let's just virtually shake hands and move on. As I said, I would really appreciate knowing whether driver backports are going to be part of 19.1, or whether it will just use the FreeBSD 11.2 drivers. I ask because I tried running the FreeBSD 11.2 live media on my firewalls, configured VLANs on the ix interfaces, and it worked as expected. So my suspicion is that something about the driver backport in 18.7 caused the problem, but I cannot confirm that, and I am crossing my fingers that 19.1 resolves the issue.

Tsuroerusu commented 5 years ago

@Tsuroerusu give the Intel DAC (XDACBL3M) a try. I'm 98.6% sure that this will solve your problems.

No offense, but why should it be necessary for me to spend several hundred euros on cables, when the ones I have (from my system manufacturer) were working perfectly fine for over a year? It was my upgrade to 18.7 that caused the problem, not the hardware.

mimugmail commented 5 years ago

@Tsuroerusu No, it wasn't the update of OPNsense, it was the update of the FreeBSD kernel. OPNsense is just the GUI orchestrating all services and configuration. Rest assured that many, many companies run 10G and DA cables just fine. The problem is that it's hard for developers to replicate, since they won't buy such hardware just to figure out the problem.

I'm sure you misunderstood Franco... let's wait for @enoch85 to report how it works with the new cable :)

I have many 10G cards, also 40G, and many GBICs and DA cables... I never had a problem.

Tsuroerusu commented 5 years ago

With all due respect, that is obviously what I meant! You can say that one specific component is "OPNsense", fine, but then why is the ISO file named "opnsense"? Obviously because it is, for lack of a better term, a "distribution". So when I say that the "update" to "OPNsense 18.7" caused the problem, I mean that the kernel provided in 18.7, which got installed as part of the update, caused the problem; thus the problem was caused by "18.7".

I am a user, not a developer. I will gladly help out to the degree that my skill set allows, but apparently I cannot simply describe the things I am experiencing from my perspective; I have to learn whatever jargon is used between developers before I am even welcome to say anything. No offense, but do you not feel that this is just a tiny bit absurd?

And I am very sympathetic to the fact that developers cannot test every single hardware combination! Why do you think I dared to risk system uptime to test things a few months ago (i.e. earlier in this discussion)? Precisely because I know that developers cannot go and buy specific hardware to test things.

Indeed, I expect that thousands of servers and firewalls use Supermicro's cables, and so did I; that was my entire point. But after upgrading to 18.7, "no carrier" was all I got when using VLANs.

fichtner commented 5 years ago

@Tsuroerusu

You have your own reasons for using the hardware you use. That's fine.

You don't agree with the stance that an open source project has regarding your interests? That's also fine.

But using this platform to justify your stance is suboptimal at best.

You are the sole person in charge of your hardware and software.

So I'm politely asking you to quit this discussion now.

Thank you for your understanding.

fichtner commented 5 years ago

Closing this as there were no negative updates on 19.1-BETA.

Tsuroerusu commented 5 years ago

@fichtner I am using the hardware that I have because it was working just fine with 18.1, and I also said that I fully understand why developers cannot test my particular setup. The only thing I insisted on is that I am not using incompatible hardware, as everything I am using is officially validated by Supermicro, WHICH WAS WORKING WITH 18.1. Thus my modest claim was that it was unreasonable for you guys to keep telling me that I just need to change my hardware and spend hundreds of euros on that.

Why do you keep putting words into my mouth? I never said anything about the "stance" of the OPNsense project, all I wanted to ask was about whether there were driver backports planned in 19.1, that was all!

I am not using this platform for anything! I simply asked a question about drivers, and then I ended up responding to the absurd accusations that you made against me. How is that unreasonable?

I fully accept that I am responsible for my hardware.

I am, frankly, a little bit shocked that this is how you choose to treat your users, who, 1. have reported a problem, 2. tried to help, and 3. just asked a question.

But rest assured, I promise you that I have no intention of participating in this discussion anymore, and I will not be reporting any problems in the future, given that I was insulted and shown contempt for simply asking a question.

I wish you and everybody else a pleasant Sunday. Good bye.