mcusim / freebsd-src

sys/dev/dpaa2 drivers work-in-progress

ten64: No dataflow on boot until cable replugged #21

Closed: mcbridematt closed this issue 8 months ago

mcbridematt commented 11 months ago

FYI: I am seeing this issue (as of a85d6c9ad5fe4de8cb3bc651253a1717fb28505c) as well. It may have been around for longer, but it is hard to tell apart from the other dataflow issues.

There is no dataflow (in the Ten64's 'managed' mode) until I remove and reconnect the Ethernet cables. If I have some time on the weekend, I will try to do some debugging.

> I'm not sure whether it's the same issue, but I can't do any traffic after booting until I replug the network cables. If I just unplug them and immediately plug in, traffic starts working.

Originally posted by @pkubaj in https://github.com/mcusim/freebsd-src/issues/18#issuecomment-1634525671

dsalychev commented 11 months ago

OK, I've reproduced it on my side as well.

snail59 commented 11 months ago

My 2 cents: I have the same problem.

pkubaj commented 11 months ago

I updated FreeBSD to have the fix for https://github.com/mcusim/freebsd-src/issues/19. Now the link is correctly detected, but replugging doesn't help - there's no traffic at all.

dsalychev commented 11 months ago

@pkubaj I see. It makes this bug a priority then :(

dsalychev commented 11 months ago

@mcbridematt @pkubaj @snail59 Could you give https://github.com/mcusim/freebsd-src/commit/c9a114330dd33ba25ca5aa9586b77290f8dbd9fc a try? EDIT: Please send me the dmesg output if it doesn't help.

mcbridematt commented 11 months ago

> @mcbridematt @pkubaj @snail59 Could you give c9a1143 a try? EDIT: Please send me the dmesg output if it doesn't help.

Did not fix the problem, sadly :(

The dmesg is here. dpni6 is the interface I am trying to use.

https://gist.github.com/mcbridematt/56ac5ab3d2d39627ce84d4c738526e62

dsalychev commented 11 months ago

@mcbridematt Negative result is a result as well :) Thanks for checking it.

pkubaj commented 11 months ago

I succeeded in getting a network connection by plugging the cable after booting. The connection itself seems stable - I was checking with parallel iperf3 (up to 8 datastreams).

dsalychev commented 11 months ago

Could you show me the output of cat /dev/fsl_mc_console as well?

pkubaj commented 11 months ago

[W, PLATFORM]  MC uses small memory footprint
[W, PLATFORM]  The UART Console is enabled and the commands take longer time. Set CONSOLE_MODE_OFF in DPC file to disable it!
[W, QBMAN]  Set FBPR size to 4Mb
[W, QBMAN]  Set PFDR size to 16Mb
Running MC app, waiting for events ...

dch commented 10 months ago

Same here, reboots ok but all physical ports need to be re-attached before coming up. Other than that, great progress thanks @dsalychev!

my console is the same:

# cat /dev/fsl_mc_console
[W, PLATFORM]  MC uses small memory footprint
[W, PLATFORM]  The UART Console is enabled and the commands take longer time. Set CONSOLE_MODE_OFF in DPC file to disable it!
[W, QBMAN]  Set FBPR size to 4Mb
[W, QBMAN]  Set PFDR size to 16Mb
Running MC app, waiting for events ...

dsalychev commented 10 months ago

@dch and everyone else,

I understand that the issue is irritating, but https://github.com/mcusim/freebsd-src/issues/19 is still a priority for me. Could you put your Ten64 boards under heavy load and double-check that 15.0-CURRENT can process all of the traffic without panics, please?

I'll try to find time for this issue with @bzfbd's help (hopefully!) later.

dsalychev commented 8 months ago

@dch, @pkubaj, @snail59, @mcbridematt Could you look for dpaa2_macX: dpaa2_mac_intr: status=0xXX with the latest dpaa2 branch? I'd like to understand which interrupt flags are raised for the DPMACs in a normal case and in case of no dataflow.

UPD: I forgot to mention that you'll have to boot the kernel in verbose mode.

snail59 commented 8 months ago

Hello Dmitry,

Just saw your request. I can't test this because commit 03ce101314a6b9f9f4ad2341be29c6a257f213e0 was reverted in this branch, and without it my Ten64 does not boot :-/ May I (or should I) re-revert this commit?

dsalychev commented 8 months ago

@snail59 hm, shouldn't be the case. Could you post dmesg?

snail59 commented 8 months ago

Maybe I confused it with issue #20. I will try, but I will wait until tomorrow. Otherwise, my family could get angry if a problem occurs :-D.

snail59 commented 8 months ago

@dsalychev Hello, I did not use your branch but releng/14.0, onto which I cherry-picked your commits (I wanted to stay as close as possible to a release). I used this network configuration:

ifconfig_dpni5="inet 192.168.5.132/24 up"

I tried with a vlan too.

== RESULTS ==

dmesg | grep dpaa2_mac_intr
dpaa2_mac1: dpaa2_mac_intr: status=0x4
dpaa2_mac6: dpaa2_mac_intr: status=0x4

After unplugging/replugging the cable, this line appears in dmesg and the network starts working:

dpaa2_ni5: dpaa2_ni_intr: status=0x1

dsalychev commented 8 months ago

@snail59 OK, this is what I expected. It looks like an IRQ for the DPMAC (DPMAC_IRQ_EVENT_LINK_UP_REQ) was generated initially, but the corresponding IRQ for the DPNI (DPNI_IRQ_EVENT_LINK_CHANGED) was not. I'll try to patch it then.
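
To make this comparison easier to repeat, here is a minimal shell sketch that pulls the interrupt messages out of dmesg text and names the flags. Only two bit values are taken from this thread (status=0x4 on a DPMAC read as DPMAC_IRQ_EVENT_LINK_UP_REQ, status=0x1 on a DPNI read as DPNI_IRQ_EVENT_LINK_CHANGED); the function name decode_dpaa2_irq is made up for illustration, and any other bits are printed undecoded rather than guessed.

```shell
#!/bin/sh
# Sketch: name the dpaa2 interrupt flags found in dmesg text read from stdin.
# Assumption: bit 0x4 on a DPMAC and bit 0x1 on a DPNI carry the meanings
# discussed in this thread; every other bit is left undecoded.
decode_dpaa2_irq() {
    # Keep only the "<device>: <handler>: status=<hex>" part of each line.
    grep -oE 'dpaa2_(mac|ni)[0-9]+: dpaa2_(mac|ni)_intr: status=0x[0-9a-fA-F]+' |
    while read -r dev func status; do
        dev=${dev%:}                    # "dpaa2_mac1:" -> "dpaa2_mac1"
        value=$((${status#status=}))    # "status=0x4"  -> 4
        case "$func" in
        dpaa2_mac_intr:) bit=4; name=DPMAC_IRQ_EVENT_LINK_UP_REQ ;;
        dpaa2_ni_intr:)  bit=1; name=DPNI_IRQ_EVENT_LINK_CHANGED ;;
        *) continue ;;
        esac
        if [ $((value & bit)) -ne 0 ]; then
            echo "$dev: $name"
        else
            echo "$dev: ${status#status=} (not decoded)"
        fi
    done
}
```

On a board booted in verbose mode, the function would be fed from the kernel log, for example: dmesg | decode_dpaa2_irq. A port that shows the DPMAC flag but never shows the DPNI flag matches the failure described in this comment.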

snail59 commented 8 months ago

> @snail59 OK, this is what I expected. It looks like an IRQ for the DPMAC (DPMAC_IRQ_EVENT_LINK_UP_REQ) was generated initially, but the corresponding IRQ for the DPNI (DPNI_IRQ_EVENT_LINK_CHANGED) was not. I'll try to patch it then.

Know that I trust you completely, but I must humbly admit that I do not fully understand the implication of your words :-D. Does REQ in DPMAC_IRQ_EVENT_LINK_UP_REQ mean REQUEST? I guess a signal indicating that the link came up was not detected or interpreted. Am I right? I would like to understand, out of curiosity.

dsalychev commented 8 months ago

@snail59 Could you try this patch: https://github.com/mcusim/freebsd-src/commit/efdc787dec2f408eac9d6d6bd8d6762aebb4a5ef ? Let me investigate it further and I'll explain the details when I have the full picture, OK?

snail59 commented 8 months ago

@dsalychev : hello, I think I have a problem because I did not receive any notification and did not see your request. Sorry for the delay but not my fault here :-).

I tried your patch and did not notice any difference. I still have to unplug/replug the cable for the network to work, and I did not see anything different in the log output.

If I was supposed to check something else, please tell me.

bzfbd commented 8 months ago

Just to ask, the problem only exists in "managed" mode?

snail59 commented 8 months ago

> Just to ask, the problem only exists in "managed" mode?

I can neither confirm nor deny it, because I don't know whether copper and SFP+ ports work the same way and whether I can extrapolate. What I do know is that my SFP+ ports are configured in fixed mode and don't show this problem.

bzfbd commented 8 months ago

I never noticed much as I netboot on dpni0 which just worked.

ifconfig dpni1 media none
ifconfig dpni1 media autoselect

Does that make the port work for you without touching the cable (adjust interface to what you are using)?
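
The workaround above can be wrapped in a small shell sketch for boards with several affected ports. The function name kick_links and the IFCONFIG override variable are hypothetical; only the two ifconfig invocations come from the comment above, and this is a stopgap for testing, not a fix.

```shell
#!/bin/sh
# Sketch: toggle the media setting on each listed interface so that the link
# state is re-evaluated without touching the cable.
# IFCONFIG can be overridden (for example for a dry run); it defaults to the
# real ifconfig.
IFCONFIG=${IFCONFIG:-ifconfig}

kick_links() {
    for ifn in "$@"; do
        "$IFCONFIG" "$ifn" media none
        "$IFCONFIG" "$ifn" media autoselect
    done
}
```

Usage would be along the lines of kick_links dpni1 dpni5, with the interface names adjusted to the ports actually in use.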

snail59 commented 8 months ago

> I never noticed much as I netboot on dpni0 which just worked.
>
> ifconfig dpni1 media none
> ifconfig dpni1 media autoselect
>
> Does that make the port work for you without touching the cable (adjust interface to what you are using)?

@bzfbd Just tried, and yes, it does!

bzfbd commented 8 months ago

Can you please try the patch from https://reviews.freebsd.org/D42643 (Download raw diff). It is relative to freebsd/main with minor other changes and seems to make the problem go away for the moment.

dch commented 8 months ago

D42643 works here. Only a single test so far, but I will do more this evening when the family is asleep and won't complain about missing internet 🥇

snail59 commented 8 months ago

> Can you please try the patch from https://reviews.freebsd.org/D42643 (Download raw diff). It is relative to freebsd/main with minor other changes and seems to make the problem go away for the moment.

I can confirm it works. Plus, as a regression test, I unplugged/replugged the cable and it still works.

pkubaj commented 8 months ago

Thanks, I couldn't test it because I don't want to reboot, but it's great to see it working. Since aarch64 is Tier 1 nowadays and this is a bugfix, can you request an MFS to releng/14.0?

Another question: I can see in this thread that SFP+ ports seem to work, but @dsalychev mentioned that they need a new driver, so which is it? I'm having some issues with getting a link to an ix(4) card.

bzfbd commented 8 months ago

Committed as https://cgit.freebsd.org/src/commit/?id=964b3408fa872178aacf58f2d84dc43564ec0aa7

mcbridematt commented 8 months ago

Well done @bzfbd and @dsalychev!

> Another question, I can see in this thread that SFP+ ports seem to work, but @dsalychev mentioned that they need a new driver, so which is it? I'm having some issues with getting a link to ix(4) card.

SFP ports can be driven either as 'fixed' links or as fully controlled "sff,sfp" devices. dsl has a driver for the latter here: https://reviews.freebsd.org/D41440, but it might require some more integration. Linux did not have fully working SFP support on DPAA2 until kernel 6.2!

Without sff,sfp support, on the Ten64, you will need to put the SFP ports into legacy mode (setenv sfpmode legacy in U-Boot); this is the firmware default on the Ten64. If you have a DAC (passive) SFP+, it should work without any further intervention. If it's a fiber optic SFP, you will need to activate the TXDISABLE pin (somehow). I think this can be done on FreeBSD with gpioctl? The Linux instructions are here: https://ten64doc.traverse.com.au/network/sfp/#manualadvanced-sfp-control-for-older-kernels-or-non-linux-managed-only

You could try forcing the TXDISABLE GPIO in U-Boot or in the recovery firmware, as those GPIOs will persist across reboots.

dsalychev commented 8 months ago

> https://reviews.freebsd.org/D41440, but it might require some more integration

It definitely does! The existing sff,sfp driver does nothing except parse some values from the FDT for future use.

@mcbridematt @pkubaj This is a gpioctl example on my Ten64 with an SFP+ module plugged into the upper cage:

# gpioctl -f /dev/gpioc4 -l
pin 00: 1       SFP_LOW_TX_FAULT<IN>
pin 01: 1       SFP_LOW_TX_DISABLE<IN>
pin 02: 1       SFP_LOW_NOT_PRESENT<IN>
pin 03: 1       SFP_LOW_LOST_SIGNAL<IN>
pin 04: 0       SFP_HIGH_TX_FAULT<IN>
pin 05: 1       SFP_HIGH_TX_DISABLE<IN>
pin 06: 0       SFP_HIGH_NOT_PRESENT<IN>
pin 07: 0       SFP_HIGH_LOST_SIGNAL<IN>
pin 08: 0       gpio_P10<IN>
pin 09: 0       gpio_P11<IN>
pin 10: 0       gpio_P12<IN>
pin 11: 0       gpio_P13<IN>
pin 12: 0       ADMIN_LED<OUT>
pin 13: 0       gpio_P15<OUT>
pin 14: 1       gpio_P16<IN>
pin 15: 0       gpio_P17<IN>

I've renamed the pins to better represent the states of the upper and lower cages. This is how DPMAC8 and DPMAC9 report their link types when setenv sfpmode legacy is used in U-Boot:

# dmesg | grep FIXED
dpaa2_mac8: max_rate=10000, eth_if=XFI, link_type=FIXED
dpaa2_mac9: max_rate=10000, eth_if=XFI, link_type=FIXED
dpaa2_ni8: connected DPMAC is in FIXED mode
dpaa2_ni9: connected DPMAC is in FIXED mode

bzfbd commented 8 months ago

Did you add media information as well (so ifconfig doesn't complain)? I just pasted my GPIO script from earlier this year onto my wiki page [1] in case it helps, as I had the same problem of constantly looking up or changing the GPIOs (does FreeBSD have a framework other than rc.local to do this automatically?). It is probably worth double-checking the settings.

[1] https://wiki.freebsd.org/BjoernZeeb/Ten64
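
As a rough idea of what such a GPIO script could look like (this is not the script from the wiki page), the sketch below switches a TX_DISABLE pin to output and drives it low. Assumptions: the controller device /dev/gpioc4 and the pin numbers 1 and 5 are taken from the gpioctl listing earlier in this thread and may differ on other boards or firmware; driving the pin low is assumed to enable the transmitter, following the usual SFP convention that TX_DISABLE is active high; the function name sfp_tx_enable and the GPIOCTL and GPIODEV variables are made up for illustration.

```shell
#!/bin/sh
# Sketch: enable the SFP transmitter by driving its TX_DISABLE GPIO low.
# GPIOCTL and GPIODEV can be overridden; the defaults follow the listing
# shown earlier in this thread and are assumptions for any other setup.
GPIOCTL=${GPIOCTL:-gpioctl}
GPIODEV=${GPIODEV:-/dev/gpioc4}

sfp_tx_enable() {
    pin=$1
    "$GPIOCTL" -f "$GPIODEV" -c "$pin" OUT   # reconfigure the pin as an output
    "$GPIOCTL" -f "$GPIODEV" "$pin" 0        # drive TX_DISABLE low
}
```

With the pin naming from that listing, sfp_tx_enable 5 would target the upper cage and sfp_tx_enable 1 the lower cage. Checking the result with gpioctl -f /dev/gpioc4 -l before relying on it seems prudent.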