rustbox / router


Far End Test Loop and MDIO #1

Open dougli1sqrd opened 1 month ago

dougli1sqrd commented 1 month ago

I'm attempting to follow the instructions written by @sethp in TEST.md to perform the Far End Test loop.

I have l2perf built locally. I ran the l2perf commands under the "Far End Test loop" heading, and I also ran the MDIO code under the jtag directory to set register 0x13 to 0b010_0_0000_0000_0001, which should put the PHY in "far end test loop" (FETL) mode.

The l2perf Rx test showed nothing arriving, although the orange LED blinks while the Tx test runs, which suggests the PHY is at least receiving the data.

Reading the GPY111 datasheet slightly more carefully, I saw that the PHY will put itself in FETL mode after it re-links with another device. Taking that to mean I should try unplugging the Ethernet cable and plugging it back in, I did that, but again to no avail.

Then I wanted to verify the contents of register 0x13. Reading it back, I see 0x8003, or "RJTL RJ45 connector test loop". Reading the register during the reset state yields 0xFFFF (expected). After reset is released (pulled high) and the GPY111 is running again, but before writing, the register reads 0x0003 (no test mode enabled).

This makes me think that the GPY111 is genuinely not in FETL mode when we run the l2perf tests above. That means either:

  1. our MDIO implementation is off in a subtle way
  2. some other GPY111 setting is perhaps overriding that value, so our write does not stick for some reason?

More likely something fishy is up with the MDIO implementation and we should investigate that.
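As a reference point for that investigation, here is a minimal sketch of the bits an IEEE 802.3 Clause 22 management frame puts on the wire. It is not the code in the jtag directory: the names `Op`, `clause22_frame`, and `frame_bits` are made up for illustration, and the PHY address `0` used in the usage note is an assumption, since the thread never states the board's strap address.

```rust
/// Clause 22 opcodes (the 2-bit OP field).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Read = 0b10,
    Write = 0b01,
}

/// Build the 32 bits that follow the preamble, MSB first:
/// ST(01) OP(2) PHYAD(5) REGAD(5) TA(2) DATA(16).
///
/// For a read, the host releases MDIO after REGAD and the PHY drives the
/// turnaround and data bits. Those positions are filled with ones here only
/// to represent a released, pulled-up line; `data` is ignored for reads.
pub fn clause22_frame(op: Op, phy_addr: u8, reg_addr: u8, data: u16) -> u32 {
    let st = 0b01u32;
    let op_bits = op as u32;
    let phy = (phy_addr & 0x1f) as u32;
    let reg = (reg_addr & 0x1f) as u32;
    let (ta, payload) = match op {
        Op::Write => (0b10u32, data as u32),
        Op::Read => (0b11u32, 0xffffu32),
    };
    (st << 30) | (op_bits << 28) | (phy << 23) | (reg << 18) | (ta << 16) | payload
}

/// Expand a frame into the full bit sequence to clock out:
/// a preamble of 32 ones, then the 32 frame bits, MSB first.
pub fn frame_bits(frame: u32) -> Vec<bool> {
    let preamble = std::iter::repeat(true).take(32);
    let body = (0..32).rev().map(move |i| (frame >> i) & 1 == 1);
    preamble.chain(body).collect()
}
```

With the assumed PHY address 0, `clause22_frame(Op::Write, 0, 0x13, 0x4001)` gives `0x504E_4001`. Comparing that bit for bit against a logic-analyzer capture of MDC/MDIO would be one way to rule the write path in or out.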

sethp commented 1 month ago

Thanks for the write up! I had noticed the same sentence in the datasheet and hoped that simply re-linking the device would "just work," but alas.

> after reset is allowed to get pulled high and the GPY111 chip runs again, before writing the value of the register is 0x0003 (Or no testing mode enabled).

Oh, really? The datasheet lists the reset value of that register as 0x0001, not 0x0003—could that be an off-by-one in the MDIO implementation?
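One way to make the off-by-one idea concrete is to model a read path that samples every bit one clock late. This is only a hypothesis, not a diagnosis: `read_one_bit_late` is a made-up helper, and it assumes that the final sample lands on a released, pulled-up MDIO line and therefore reads as a one.

```rust
/// Hypothetical model of a read that samples each bit one MDC cycle late:
/// the true bit 15 is missed, bits 14..0 land one position higher, and the
/// last sample sees the idle (pulled-up) line, i.e. a one.
pub fn read_one_bit_late(actual: u16) -> u16 {
    (actual << 1) | 1
}
```

Under that model the datasheet reset value 0x0001 would read back as 0x0003. The binary literal quoted in the first comment, 0b010_0_0000_0000_0001, evaluates to 0x4001, and the same model maps it to 0x8003, the value reported there. A written value of 0x5001 would instead map to 0xA003, so it is worth confirming which value actually went out on the wire before leaning on this.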

> Then I wanted to verify the contents of the register at 0x13, and reading the value I see 0x8003: or "RJTL RJ45 connector test loop".

"Hmm" intensifies: definitely seems sus that we'd write 0x5001 and get back 0x8003. Also "RL45 connector test loop" is something else, we want "FETLS Standalone Far-end test loop. No dependency on TX_CLK and RX_CLK on the (G)MII interface."

sethp commented 1 month ago

Ok, some ideas:

  1. We could try speeding things up, and/or adding a longer preamble: 100kHz is really slow compared to the nominal MDIO frequency of ~25MHz. If we're failing to lock some internal phase adjustment jobbie, either of those might help? [^1]
  2. Checking and/or tightening up the timing might help too? Right now we've got a lot of "do a thing, wait for 1/2 period ns" but that doesn't account for the time to do the thing. Better would be "sample clock & add half period ns, do a thing, wait for marked point to pass"
  3. It looks like you're sampling while the clock is high; have you tried sampling while the clock is low (i.e. "rising edge triggered" with 1/2 period of setup time and just a few negative ns of hold time)?
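To illustrate the timing idea in (2), here is a rough sketch of pacing against absolute deadlines, so the time spent "doing the thing" is absorbed into the half period instead of being added to it. Everything here is hypothetical: `HalfPeriodPacer` and `clock_out_bits` are invented names, the pin setters are plain closures, and it uses `std::time::Instant` for the clock, which the real target may not have.

```rust
use std::time::{Duration, Instant};

/// Paces bit-bang steps against absolute deadlines rather than
/// sleeping a fixed amount after each step.
pub struct HalfPeriodPacer {
    half_period: Duration,
    deadline: Instant,
}

impl HalfPeriodPacer {
    /// Mark the first deadline one half period from now.
    pub fn new(half_period: Duration) -> Self {
        Self { half_period, deadline: Instant::now() + half_period }
    }

    /// Busy-wait until the marked point has passed, then mark the next one.
    pub fn wait(&mut self) {
        while Instant::now() < self.deadline {
            std::hint::spin_loop();
        }
        self.deadline += self.half_period;
    }
}

/// Clock bits out on MDIO: change data while MDC is low, then raise MDC.
/// Clause 22 PHYs sample MDIO on the rising edge of MDC, so the low half
/// period acts as setup time and the high half period as hold time.
pub fn clock_out_bits(
    bits: &[bool],
    pacer: &mut HalfPeriodPacer,
    mut set_mdc: impl FnMut(bool),
    mut set_mdio: impl FnMut(bool),
) {
    for &bit in bits {
        set_mdc(false); // falling edge
        set_mdio(bit); // data changes while the clock is low
        pacer.wait(); // rest of the low half period
        set_mdc(true); // rising edge: the PHY samples here
        pacer.wait(); // rest of the high half period
    }
}
```

If a step overruns its deadline, the next `wait` returns immediately and the schedule catches up, so the average period stays on target even when individual edges jitter.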

[^1]: yes, I know 100kHz is in the datasheet, but it reads to me like they're talking about frequency domain switching and getting undefined behavior if we have transactions closer together in time than one every 10µs, not about clock speed.

sethp commented 1 month ago

I think the current state of this issue is that:

  1. We have not yet successfully completed a Far-End Loopback Test, and
  2. We have both run a pretty loose collection of ad-hoc experiments to validate our MDIO implementation, and have both gotten "mixed" results. Notably, we still can't reliably "read our writes," that is, write to a register and read back the bits we expect.
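A more systematic version of the "read our writes" experiment could look like the sketch below. It assumes a hypothetical `Mdio` trait as the register interface (the real code may be shaped quite differently) and uses an in-memory `FakePhy` as a stand-in, so it says nothing about how the GPY111 itself behaves. On real hardware the `mask` should cover only bits known to be plain read/write, since many PHY register bits are read-only or self-clearing, and writing arbitrary patterns to a control register can change the link state.

```rust
/// Hypothetical register-access interface, for illustration only.
pub trait Mdio {
    fn read(&mut self, reg: u8) -> u16;
    fn write(&mut self, reg: u8, value: u16);
}

/// One failed write/read-back pair.
#[derive(Debug, PartialEq)]
pub struct Mismatch {
    pub written: u16,
    pub read_back: u16,
    pub differing_bits: u16,
}

/// Sixteen patterns with exactly one bit set each.
pub fn walking_ones() -> Vec<u16> {
    (0..16).map(|i| 1u16 << i).collect()
}

/// Write each pattern (restricted to `mask`), read it back, and collect
/// every case where the masked read differs from the masked write.
pub fn check_read_after_write<M: Mdio>(
    bus: &mut M,
    reg: u8,
    patterns: &[u16],
    mask: u16,
) -> Vec<Mismatch> {
    let mut mismatches = Vec::new();
    for &pattern in patterns {
        let written = pattern & mask;
        bus.write(reg, written);
        let read_back = bus.read(reg) & mask;
        if read_back != written {
            let differing_bits = written ^ read_back;
            mismatches.push(Mismatch { written, read_back, differing_bits });
        }
    }
    mismatches
}

/// In-memory stand-in for a PHY. `read_path` lets a test inject a
/// faulty read side while writes are stored faithfully.
pub struct FakePhy {
    regs: [u16; 32],
    read_path: fn(u16) -> u16,
}

impl FakePhy {
    pub fn new(read_path: fn(u16) -> u16) -> Self {
        Self { regs: [0; 32], read_path }
    }
}

impl Mdio for FakePhy {
    fn read(&mut self, reg: u8) -> u16 {
        (self.read_path)(self.regs[(reg & 0x1f) as usize])
    }
    fn write(&mut self, reg: u8, value: u16) {
        self.regs[(reg & 0x1f) as usize] = value;
    }
}
```

The `differing_bits` column is the useful part: a consistent pattern across walking-ones inputs (for example, every bit showing up one position higher) points at framing or sampling, while scattered differences point more toward register semantics.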

But what we have done is:

  1. Written to the LED bits in one of the control registers and seen the lights respond,
  2. Successfully received packets sent by the "TPG" on the device (albeit not the ones the datasheet claims it would send, but hey, what's a test mode framing error and outdated OUI for a defunct manufacturer between friends?)

So as a result, we're fairly confident in our "write" implementation, that the read-back problem in (2) lies on the "read" side, and that the failed loopback in (1) is due to an undocumented constraint or prerequisite of the "far end test loop." Does that all sound right to you?