RGMII Ethernet + MiSoC core does not work on Sayma

sbourdeauducq commented 6 years ago

So, I will try other things first.

jbqubit commented 6 years ago

Doesn't HK have flea markets selling things like JTAG cables? Please go get a cable.

whitequark commented 6 years ago

These things aren't sold on flea markets, or at least you or I wouldn't know where to find or how to use them.

gkasprow commented 6 years ago

You can buy one on AliExpress or eBay for just $10 apiece.

gkasprow commented 6 years ago

Sorry, $25 with shipping :)

jordens commented 6 years ago

Just to summarize the options, there are a couple of more or less viable paths to get experiments loaded onto Sayma dynamically:

Continue debugging RGMII mode, would potentially benefit from a chipscope trace of Greg's bitstream
Try MII mode, use MII PHY, needs reflashing of MMC firmware, xmodem or some windows tool
Use Sayma SFP, port 1000-BaseX PHY, write new transceiver layer, debug
Implement loading experiments over serial port (SLIP, PPP, custom)

sbourdeauducq commented 6 years ago

There are also RGMII FMC cards. The chipscope trace with Greg's bitstream is only marginally useful, it merely checks that the hardware is working. Better look at the signals at different points in our PHY.

gkasprow commented 6 years ago

@sbourdeauducq can't you attach a chipscope to your design? You have Verilog as the design entry for the synthesizer, so it should be relatively easy to add a chipscope to look at the signals in your core.

sbourdeauducq commented 6 years ago

If so, can you share a minimal Vivado project?

Your project is 416MB and contains loads of random files. Where should I look?

sbourdeauducq commented 6 years ago

This mess even contains at least 2 copies of the RGMII pin definitions, which is something I wanted to double check. Which one did you use?

gkasprow commented 6 years ago

Do you want to modify it or just load the bit file and run chipscope? In the latter case, just use the bit file in \Sayma_ETH\kc705_delay_ipbus.runs\impl_2\sayma_amc_tester.bit. You also need the Chipscope debug probe file: Sayma_ETH/kc705_delay_ipbus.runs/impl_2/debug_nets.ltx

Sources are here: \Sayma_ETH\kc705_delay_ipbus.srcs\sources_1\imports\kc705_delay_ipbus\src
The top entity is delay_tester_ipbus.vhd. It instantiates:

j1env - FORTH processor that I use to control I2C, IO, SPI, etc.
2 chipscopes
clocks
ipbus MAC + protocol layer that talks over Ethernet and broadcasts messages once every few seconds
slaves - IO registers controlled by ipbus. Originally they were used to control iodelay in another project; that's where the name of the project comes from.

I simply moved the KC705 project to Sayma because it had all I needed to debug Sayma. The RGMII converter and chipscope are in the top entity file:

line 491 - uart mux that lets me insert a uart loopback
line 540, 548 - chipscope instantiation
line 615 - GMII to RGMII converter, TX path
line 666 - RGMII to GMII converter, Rx path
line 717 - chipscope probe connections

There are also 2 DDR controllers, but they are commented out to shorten compilation time.

sbourdeauducq commented 6 years ago

Again, Chipscope is of limited usefulness and like all Xilinx software is a pain to set up, so I'm not going to use it for now. I want to compare your design with mine.

gkasprow commented 6 years ago

It is painful, that's why I instantiate it in my code. That simply works and I never had problems with it. If you attach it to a compiled design, then the real pain starts ;)

sbourdeauducq commented 6 years ago

@gkasprow Is it acceptable to use the RX clock as the TX clock? Back in August your reference code did that, but you changed that in this version. Why?

gkasprow commented 6 years ago

It should work. I changed it because I had problems with the ipbus core and tried many things to fix them. I eventually found the problem, which was not caused by the clock connection.

sbourdeauducq commented 6 years ago

Hmm, it's not clear to me what is the reference clock in that case. It's like putting two transceivers with loop timing back-to-back. Or does the PHY chip include appropriate clock correction?

sbourdeauducq commented 6 years ago

Or does the PHY chip include appropriate clock correction?

Judging from the very relaxed GTXCLK Period specification in the PHY datasheet (7.2 to 8.8ns), it would seem so. The chip contains a very large buffer...
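
To put that range in perspective, here is a quick sketch of the arithmetic. The 7.2 and 8.8 ns limits are the datasheet figures quoted above, and the 8 ns nominal period follows from the 125 MHz RGMII clock; the function names are made up for illustration.

```python
NOMINAL_PERIOD_NS = 8.0   # 125 MHz RGMII clock at 1 Gbit/s
GTXCLK_MIN_NS = 7.2       # datasheet minimum GTXCLK period (quoted above)
GTXCLK_MAX_NS = 8.8       # datasheet maximum GTXCLK period (quoted above)


def period_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def deviation_percent(period_ns, nominal_ns=NOMINAL_PERIOD_NS):
    """Relative deviation of a period from the nominal period, in percent."""
    return (period_ns - nominal_ns) / nominal_ns * 100.0


if __name__ == "__main__":
    for period in (GTXCLK_MIN_NS, NOMINAL_PERIOD_NS, GTXCLK_MAX_NS):
        print(period, "ns:", round(period_to_mhz(period), 2), "MHz,",
              round(deviation_percent(period), 1), "% off nominal")
```

That is roughly 10% either way around 125 MHz, far looser than the ±100 ppm usually required of an Ethernet reference clock, which is why it looks like the PHY retimes the data internally.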

gkasprow commented 6 years ago

As I said, it is connected due to historical reasons :)

sbourdeauducq commented 6 years ago

@gkasprow I don't understand this part of your code (why there is an additional pipeline register on gmii_ctl_falling):

 if  gmii_gtx_clk'event and  gmii_gtx_clk = '1' then  -- rising clock edge
      gmii_ctl_rising1 <= gmii_tx_en;
      gmii_ctl_rising2 <= gmii_tx_en xor gmii_tx_er ;
      gmii_ctl_falling <= gmii_ctl_rising2;

Shouldn't that be simply:

 if  gmii_gtx_clk'event and  gmii_gtx_clk = '1' then  -- rising clock edge
      gmii_ctl_rising1 <= gmii_tx_en;
      gmii_ctl_falling <= gmii_tx_en xor gmii_tx_er ;

sbourdeauducq commented 6 years ago

The standard says:

During normal frame transmission, the [TX_CTL] signal stays at a logic 
high for both edges of TXC and during the period between frames
where no errors are to be indicated, the signal stays low for both edges.
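
A small behavioural model makes the effect of the extra register visible. This is only a sketch in Python, not the VHDL itself: it assumes the RGMII convention that TX_CTL carries TX_EN on the rising edge and TX_EN xor TX_ER on the falling edge, ignores the one-cycle delay common to both paths, and uses hypothetical function names.

```python
def tx_ctl_edges(tx_en, tx_er, extra_register=False):
    """Return (rising, falling) TX_CTL values, one pair per GTX clock cycle.

    tx_en, tx_er: sequences of 0/1, one sample per clock cycle.
    extra_register: model the additional pipeline stage on the
    falling-edge value (as with gmii_ctl_rising2 -> gmii_ctl_falling).
    """
    rising, falling = [], []
    delayed = 0  # modelled intermediate register, assumed to reset to 0
    for en, er in zip(tx_en, tx_er):
        ctl = en ^ er
        rising.append(en)
        falling.append(delayed if extra_register else ctl)
        delayed = ctl
    return rising, falling


def decode_tx_er(rising, falling):
    """What the receiver of TX_CTL reconstructs as TX_ER each cycle."""
    return [r ^ f for r, f in zip(rising, falling)]


if __name__ == "__main__":
    tx_en = [0, 1, 1, 1, 0, 0]   # one short error-free frame
    tx_er = [0, 0, 0, 0, 0, 0]
    print(decode_tx_er(*tx_ctl_edges(tx_en, tx_er)))
    print(decode_tx_er(*tx_ctl_edges(tx_en, tx_er, extra_register=True)))
```

In this model the extra register makes the two edges disagree on the first cycle of the frame and on the first cycle after it, so the PHY would see a spurious error indication at both frame boundaries even though TX_ER was never asserted.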

gkasprow commented 6 years ago

@sbourdeauducq it seems you are right.

sbourdeauducq commented 6 years ago

Why did you clock all the IDDRE1 with the inverted clock gmii_rx_clk_b? The registers that follow it are clocked by the non-inverted clock gmii_rx_clk. This is an unusual thing to do and it results in unnecessarily short timing paths inside the FPGA. Does it work when clocking IDDRE1 with the non-inverted clock and swapping q1 and q2?

Also, those primitives are called IDDRE1 in Ultrascale. Why are the instance names IDDRE2_inst?

sbourdeauducq commented 6 years ago

Now ARTIQ receives frames, but the data is corrupted (no SFD is found)...

sbourdeauducq commented 6 years ago

Why did you clock all the IDDRE1 with the inverted clock gmii_rx_clk_b? Does it work when clocking IDDRE1 with the non-inverted clock and swapping q1 and q2?

@gkasprow did you try that?

gkasprow commented 6 years ago

I wanted to swap nibbles. I didn't try that since the existing solution works. If it really helps I can check it, but I'd have to assemble the setup.

sbourdeauducq commented 6 years ago

I tried that on my design, and the data corruption is different depending on whether I use the inverted clock with swapped nibbles, or the non-inverted clock with the original ordering.

It seems that a lot of the problems we are seeing are due to poor timing at the I/O interface, which is compounded by the RX clock not going to a clock capable pin. The latter makes timing non-deterministic between Vivado runs depending on the rest of the design, and I suspect Vivado might not miss that golden opportunity to exhibit further bugs (ISE+Spartan6 does this very well, e.g. PLL may become unstable around the design if you use too many of those non-clock-capable paths).

To work around the problem, I've been trying to route the clock through the Si5324 but the latter will not lock correctly with a simple differential buffer (OBUFDS) with input on the RX clock pin and output to the Si5324. Replacing the input to the OBUFDS with the 125MHz system clock results in a stable lock, so it is not a problem with the Si5324. Either the RX clock sent by the Ethernet PHY chip is flaky (seems unlikely) or the Xilinx crap is not working correctly and the route (through regular FPGA interconnect) between rgmii_rx_clk and the differential output buffer is dysfunctional or overly noisy.

I think switching to MII, which has a tenth of the data rate, would provide better timing margins and may enable functionality despite poor clock routing and timing.
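
As a back-of-the-envelope check of that margin argument, the sketch below compares the nominal per-pin bit window of the two interfaces. It assumes the usual figures (RGMII: 4 data pins, 125 MHz, DDR; MII: 4 data pins, 25 MHz, SDR) and hypothetical function names; real margins also depend on setup/hold times and board skew, which are not modelled here.

```python
def throughput_mbps(clock_mhz, data_pins, ddr):
    """Aggregate data rate of a parallel interface in Mbit/s."""
    edges_per_cycle = 2 if ddr else 1
    return clock_mhz * data_pins * edges_per_cycle


def unit_interval_ns(clock_mhz, ddr):
    """Nominal time each bit is present on a single pin, in nanoseconds."""
    edges_per_cycle = 2 if ddr else 1
    return 1000.0 / (clock_mhz * edges_per_cycle)


if __name__ == "__main__":
    # (name, clock in MHz, data pins, DDR?)
    for name, clk, pins, ddr in (("RGMII", 125, 4, True), ("MII", 25, 4, False)):
        print(name, throughput_mbps(clk, pins, ddr), "Mbit/s,",
              unit_interval_ns(clk, ddr), "ns per bit per pin")
```

With these numbers each bit stays on a pin for 4 ns in RGMII and 40 ns in MII, so a few nanoseconds of uncontrolled clock routing delay consumes most of the RGMII window but only a small fraction of the MII one.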

sbourdeauducq commented 6 years ago

@gkasprow Did you have to use those constraints to get the design working?

set_input_delay -clock [get_clocks gmii_rx_clk] -min 0.100 [get_ports {gmii_rxd[*]}]
set_input_delay -clock [get_clocks gmii_rx_clk] -max 4.000 [get_ports {gmii_rxd[*]}]
set_input_delay -clock [get_clocks gmii_rx_clk] -min 0.100 [get_ports gmii_rx_dv]
set_input_delay -clock [get_clocks gmii_rx_clk] -max 4.000 [get_ports gmii_rx_dv]
set_output_delay -clock [get_clocks -of_objects [get_nets *clk125]] -min 0.100 [get_ports {gmii_txd[*]}]
set_output_delay -clock [get_clocks -of_objects [get_nets *clk125]] -max 3.500 [get_ports {gmii_txd[*]}]
set_output_delay -clock [get_clocks -of_objects [get_nets *clk125]] -min 0.100 [get_ports gmii_tx_en]
set_output_delay -clock [get_clocks -of_objects [get_nets *clk125]] -max 3.500 [get_ports gmii_tx_en]

gkasprow commented 6 years ago

Initially I used these constraints, but the compilation took >10 h. So I relaxed them; compilation now takes much less time, but this may cause timing issues.

gkasprow commented 6 years ago

@sbourdeauducq I can route the clock with a piece of wire to the clock pin in another bank.

sbourdeauducq commented 6 years ago

How hard is that rework?

gkasprow commented 6 years ago

It would not be that hard. Just cut one trace on the PHY side and route the signal to GPIO0, which is a global clock on FPGA pin AG11. You can try to do it without cutting the trace, but the trace is quite long and will cause ringing. This method will add something like 5 cm to the clock line but will skip the mux, so it may compensate to some extent. The Rx CLK trace is highlighted: (image)

gkasprow commented 6 years ago

DIO0 is highlighted here. This is the bottom view. (image)

sbourdeauducq commented 6 years ago

OK, thanks. I will try both options: rework and/or MII. Please remember to send me the MII MMC firmware.

sbourdeauducq commented 6 years ago

@gkasprow in MII mode, the pins CRS and RX_ER are not accessible to the FPGA, is that correct?

sbourdeauducq commented 6 years ago

@gkasprow I would like to do the rework on sayma3, on which I have already flashed the MMC and which is going to be the guinea pig for risky experiments. But currently the MMC image I have flashed (lpc_1776_ethernet.axf, which you sent me a while ago) breaks FPGA JTAG. Please send me updated MMC images.

sbourdeauducq commented 6 years ago

NB: there is no RTM on Sayma3, and the FPGA JTAG bug is likely what we have seen before - and you had proposed a fix.

sbourdeauducq commented 6 years ago

In MII mode, the TX clock is also not routed to a clock-capable pin...

gkasprow commented 6 years ago

The TX clock is an FPGA output, so it can be driven from an ordinary IO.

sbourdeauducq commented 6 years ago

Not with MII.

Besides, MII doesn't work at all. The DV pin is never asserted. Can you double-check the PHY configuration?

sbourdeauducq commented 6 years ago

Additionally: no link detected by the media converter. So this is not an FPGA problem.

gkasprow commented 6 years ago

Are you sure you have a 1 Gbit link on the other side? Do you generate TXCLK = 25 MHz?

sbourdeauducq commented 6 years ago

Do you generate TXCLK = 25 MHz?

Again: with MII that clock is supposed to be generated by the PHY.

sbourdeauducq commented 6 years ago

Are you sure you have a 1 Gbit link on the other side?

Yes. That media converter detected a link with RGMII and RXCTL was asserted (but with data corruption). I did not touch it.

gkasprow commented 6 years ago

TXCLK direction can be in or out depending on the PHY setting (DTE vs DCE). I did not change the direction setting so it expects the clock there.

sbourdeauducq commented 6 years ago

Is that standards-compliant?

sbourdeauducq commented 6 years ago

According to this document http://ieeexplore.ieee.org/document/485301/ it is not. "TX_CLK is a continuous clock that provides the timing reference for TXD<3:0>, TX_EN, and TX_ER. The TX_CLK frequency is 25% of the nominal data transmission rate, or 25 MHz for 100 Mbps operation and 2.5 MHz for 10 Mbps operation. TX_CLK is driven by the PHY."

gkasprow commented 6 years ago

OK, it depends on whether it is a DCE or DTE device. It can drive another PHY; that's why the clock direction can be programmed.

gkasprow commented 6 years ago

Do you need both Rx clk and Tx clk as outputs?

sbourdeauducq commented 6 years ago

Yes, that's what standard MII is... it's not very difficult to patch the FPGA code so that it sends TX_CLK, and that would avoid another clock input on a non-clock pin, but it's better if we can follow the standard.

gkasprow commented 6 years ago

OK, I will then switch this register in the MMC code. This can be done by shorting the config pin, but doing it in the MMC code is easier :)

m-labs / artiq

RGMII Ethernet + MiSoC core does not work on Sayma #854