m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

RGMII Ethernet + MiSoC core does not work on Sayma #854

Closed sbourdeauducq closed 6 years ago

jbqubit commented 6 years ago

I've puzzled over how this Sayma subsystem is set up. @gkasprow please correct me if any of this is wrong.

There was discussion of white wire modifications in a Sinara Issue. https://github.com/m-labs/sinara/issues/327

gkasprow commented 6 years ago

I've puzzled over how this Sayma subsystem is set up. @gkasprow please correct me if any of this is wrong. There is a single outward-facing Ethernet interface on Sayma_AMC.

  • 1000BaseX over copper traces on microTCA Port 0 of the AMC backplane.

exactly


  • Ways of connecting to Port 0 are
    • a card edge adapter from WUT that provides power and RJ45 breakout

It provides a SATA connector.

  • microTCA Carrier Hub Ethernet switch on NAT MCHBasic v3.5 http://www.nateurope.com/manuals/nat_mch_base_v3x_man_hw.pdf
    • Broadcom BCM5396 Gigabit Ethernet Switch
    • 1000BaseX interfaces for fabric A consisting of 12 Port0's for up to 12 AMC modules
    • RJ45 1000BaseT connector on the front panel labeled GbE1

There is yet another way: there is a SATA connector close to the edge connector. It is connected in parallel to the link. It is not very elegant, but for 1 Gbit it's not an issue, and it lets you test the board outside the crate.

MAX24287 https://www.microsemi.com/product-directory/ieee-1588-plls-and-software/4668-max24287 interfaces between outward-facing 1000BaseX and inward-facing MII or RGMII.

  • for 1000BaseX: RDP, RDN, TDP, TDN are connected to AMC Port0.
    • inward-facing interface can be exposed to either MMC (LPC1776 ARM uP) or the FPGA (Ultrascale).
    • A digital mux (SN74CB3Q32245ZKE) selects which of the two chips is connected to the MAX24287 under the control of trace SEL_RGMII driven by the MMC.
    • The LPC1776 uses a parallel MII type interface.
    • The FPGA uses a parallel RGMII-1000 type interface.

Exactly. By default the PHY is routed to the FPGA. If you have trouble, I'll have another look, but it was working.

jbqubit commented 6 years ago

@gkasprow Please edit your previous response to add additional line breaks between quote carets '>'.

a card edge adapter from WUT that provides power and RJ45 breakout

it provides SATA connector

OK. The MAX24287 1000BaseX serial interface is connected to both AMC Port0 and SATA connector J10. But 1000BaseX and SATA are not easy to use; most Ethernet hubs are 1000BaseT with an RJ45 plug. So to test Ethernet on the bench top a different adapter is needed.

Exactly. By default the PHY is routed to the FPGA

When you say "by default", do you mean a combination of a) what is defined by components on the PCB and b) what is defined by the MMC after it has booted? Do you have a list of the power-on steps taken by the MMC and the state of the various lines? Or is there an annotated source file that I should read?

@gkasprow some Q&A that might help with trouble M-Labs is having.

Reset-based Reconfiguration

Toggling MAX24287 from MMC to FPGA is pretty complicated as it involves resetting the chip. Table 6.1 of the datasheet describes the value of the 15 configuration pins. These pins have different roles after the reset phase is finished, so many are driven by multiple sources.

I see some odd looking things comparing Table 6.1 and sheet 6 of the Sayma_AMC layout for the MAX24287. Here's a comparison for the two modes. Odd things are noted.

select MII for use with MMC

select RGMII for use with FPGA

jbqubit commented 6 years ago

@sbourdeauducq What do you mean by "RGMII Ethernet + MiSoC core does not work on Sayma"? What have you tried?

gkasprow commented 6 years ago

Guys, I will work on it very soon, hopefully tomorrow.

sbourdeauducq commented 6 years ago

No packet can be transmitted or received. When the PHY is clocked, and my cable is not broken (the SATA hack is very fragile), then autonegotiation succeeds.

sbourdeauducq commented 6 years ago

@gkasprow Any findings? Now that the clocking and DACs are mostly working, Ethernet seems to be the major blocker to get RF output using ARTIQ.

gkasprow commented 6 years ago

@sbourdeauducq Today I built a setup to test Ethernet and there is partial success: it does not work at all. I'm quite happy with that, because in this case I can find and solve the problem. Moreover, I used the same board that was used to test Ethernet before. So it seems something has changed since then, and probably the same issue emerged on other boards.

jbqubit commented 6 years ago

@gkasprow Glad there's now something tangible that looks wrong on your side too. Progress comes in many colors. :)

jbqubit commented 6 years ago

Debugging this is top priority. The Sayma hardware and lots of M-Labs gateware are ready to test. Getting Ethernet up and running is the bottleneck to forward progress right now.

gkasprow commented 6 years ago

I think I know where the problem is. I implemented a simple condition in the MMC firmware that resets the PHY chip after the FPGA gets configured and DONE goes low. But in fact the PHY is held in reset while DONE is low, which is wrong. The corrected piece of code is here:

```
// check if FPGA is programmed
// DONE line is high - FPGA not ready
if (LPC_GPIO0->PIN & (1UL << 5))
{
    // RESET PHY: drive P0.23 low
    LPC_GPIO0->DIR |= (1UL << 23);
    LPC_GPIO0->CLR = (1UL << 23);
    // ETH LED OFF: drive P0.31 low
    LPC_GPIO0->DIR |= (1UL << 31);
    LPC_GPIO0->CLR = (1UL << 31);
}
else
{
    // un-RESET PHY: drive P0.23 high
    LPC_GPIO0->DIR |= (1UL << 23);
    LPC_GPIO0->SET = (1UL << 23);
    // ETH LED ON: drive P0.31 high
    LPC_GPIO0->DIR |= (1UL << 31);
    LPC_GPIO0->SET = (1UL << 31);
}
```

gkasprow commented 6 years ago

Here is the binary file:

lpc1776_ethernet_I2C.zip

gkasprow commented 6 years ago

I will test it on Monday.

sbourdeauducq commented 6 years ago

Why do I still get autonegotiation to work, then? Is that PHY chip still doing autonegotiation while in reset?

gkasprow commented 6 years ago

In my media converter it shows LINK state when I plug in the SFP, even with the AMC power supply off. Is there any form of autonegotiation in 1Gbit Ethernet over SFP? There is only link state when valid symbols are decoded. The reset line also disables the PHY clock generator, so it's impossible to have any activity. The PHY also needs the Tx clock from the FPGA to send anything, which is why I release the reset after the FPGA gets configured.

sbourdeauducq commented 6 years ago

In my media converter it shows LINK state when I plug SFP

Yes, that was a problem with one of my media converters too. Some of those just use (and require) the EEPROM and/or the LOS signal, which was one of the problems with the cable you gave me, since you had removed its chips entirely. Some other media converters show the status of the autonegotiation instead. https://ssl.serverraum.org/lists-archive/artiq/2017-November/001165.html

Is there any form of autonegotiation in 1Gbit Ethernet over SFP?

Yes, see section 36.2.5.2.7 "Auto-negotiation process" of IEEE 802.3-2008. The autonegotiation is optional and can be disabled with a switch on some media converters.

There is only link state when valid symbols are decoded.

No, there is more (optionally).

The reset line also disables the PHY clock generator, so it's impossible to have any activity.

In this case this is not the problem on my boards, since another of my media converters is sensitive to whether the SATA side of the cable is plugged or not.

gkasprow commented 6 years ago

@sbourdeauducq I don't have access to AMC board right now, I simply found this issue looking at the code. University is closed right now. I will test it on Monday. Is the MII LED on?

sbourdeauducq commented 6 years ago

What is the MII LED?

gkasprow commented 6 years ago

It is an LED on the front panel which is connected to the CPU. Its original role was to signal who is talking to the PHY chip; at the moment it shows whether the PHY is in reset state or not. So when the LED is lit, the PHY operates normally.

sbourdeauducq commented 6 years ago

@gkasprow Have you been able to get RGMII Ethernet to work again with your demonstration code? If so, can you share a minimal Vivado project?

gkasprow commented 6 years ago

I just noticed that I was wrong, DONE pin when high indicates correct FPGA configuration. So the MMC code you have is correct.
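
A minimal sketch of the behaviour described here, written as a pure function so it can be checked off-target. It is not the MMC firmware: the pin numbers (P0.5 for DONE, P0.23 for PHY reset, P0.31 for the LED) come from the snippet posted earlier in this thread, the reset line is assumed to be active-low as that snippet implies, and the names `phy_ctrl_t` and `phy_ctrl_from_port0` are made up for illustration.

```c
#include <stdint.h>

#define DONE_PIN      5u   /* P0.5: FPGA DONE, high once the FPGA is configured */
#define PHY_RESET_PIN 23u  /* P0.23: PHY reset, assumed active-low */
#define ETH_LED_PIN   31u  /* P0.31: front-panel LED */

/* Hypothetical helper type: which port-0 bits to drive high or low. */
typedef struct {
    uint32_t set_mask;
    uint32_t clr_mask;
} phy_ctrl_t;

/* Decide PHY reset and LED state from a snapshot of the port-0 pin register. */
phy_ctrl_t phy_ctrl_from_port0(uint32_t port0_pins)
{
    const uint32_t outputs = (UINT32_C(1) << PHY_RESET_PIN) |
                             (UINT32_C(1) << ETH_LED_PIN);
    phy_ctrl_t ctrl = {0u, 0u};

    if (port0_pins & (UINT32_C(1) << DONE_PIN)) {
        ctrl.set_mask = outputs;  /* FPGA configured: release reset, LED on */
    } else {
        ctrl.clr_mask = outputs;  /* FPGA not ready: hold reset, LED off */
    }
    return ctrl;
}
```

On the LPC1776 the two masks would then be written to `LPC_GPIO0->SET` and `LPC_GPIO0->CLR` respectively.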

gkasprow commented 6 years ago

So far no success. I tested 3-pin mode and 15-pin configuration mode. I observe PHY transferring data and Rx data on PHY pin. But there is no activity on DV line at all. I will access MDIO registers to see what's really going on.
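
For readers unfamiliar with MDIO: an IEEE 802.3 Clause 22 register access is a fixed-format frame that follows a 32-bit preamble of ones. The sketch below builds only the 14 header bits that the master drives before the turnaround. The function name `mdio_c22_header` is hypothetical, and the PHY address used on Sayma is not stated in this thread, so it is left as a parameter.

```c
#include <stdint.h>

/* IEEE 802.3 Clause 22 management frame fields. */
#define MDIO_ST_C22   0x1u  /* start of frame: 01 */
#define MDIO_OP_WRITE 0x1u  /* opcode: 01 */
#define MDIO_OP_READ  0x2u  /* opcode: 10 */

/* Build ST(2) | OP(2) | PHYAD(5) | REGAD(5); the top bit is sent first. */
uint16_t mdio_c22_header(unsigned op, unsigned phy_addr, unsigned reg_addr)
{
    return (uint16_t)((MDIO_ST_C22 << 12) |
                      ((op & 0x3u) << 10) |
                      ((phy_addr & 0x1Fu) << 5) |
                      (reg_addr & 0x1Fu));
}
```

Reading BMSR (register 1) from a PHY at address 0, for example, would use `mdio_c22_header(MDIO_OP_READ, 0, 1)`.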

gkasprow commented 6 years ago

@sbourdeauducq @jbqubit I found it! The PHY works in SGMII mode instead of 1000BASE-X. The media converter that I used before works in both modes and detects the mode automatically. Now I use a media converter that works only in 1000BASE-X mode.

gkasprow commented 6 years ago

Funny thing, I wrote a little piece of code that dumps the PHY registers:

    Register  ADDR  DATA
    BMCR      0     0x1000
    BMSR      1     0x7969
    ID1       2     0x0
    ID2       3     0x0
    AN_ADV    4     0x20
    AN_RX     5     0x41a0
    AN_EXP    6     0x2
    EXT_STAT  15    0x8000

And the value 0x20 in the AN_ADV register means that we operate in 1000BASE-X mode! So the PHY setting is OK. [image] So it seems my media converter is simply broken.
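
The dump above shows 0x20 at address 4 (AN_ADV) and 0x41a0 at address 5 (AN_RX). A rough way to read such values is sketched below. The bit positions follow the Clause 37 base page of IEEE 802.3. The SGMII test (bit 0 set) is a heuristic based on my understanding of the SGMII convention, and the function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clause 37 (1000BASE-X) base page bits. */
#define C37_FULL_DUPLEX (1u << 5)
#define C37_HALF_DUPLEX (1u << 6)
#define C37_PAUSE       (1u << 7)
#define C37_ASYM_PAUSE  (1u << 8)
#define C37_ACK         (1u << 14)
#define C37_NEXT_PAGE   (1u << 15)

/* Heuristic: SGMII config words set bit 0, Clause 37 keeps bits 4:0 zero. */
bool looks_like_sgmii(uint16_t config_word)
{
    return (config_word & 0x0001u) != 0u;
}

/* True when the word is a Clause 37 page advertising full duplex. */
bool advertises_1000basex_fd(uint16_t config_word)
{
    return !looks_like_sgmii(config_word) &&
           (config_word & C37_FULL_DUPLEX) != 0u;
}
```

With these definitions, 0x20 decodes as full duplex only, and 0x41a0 as full duplex plus both pause bits with ACK set.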

gkasprow commented 6 years ago

With another media converter I get reasonable data on Rx lines and observe them with chipscope

gkasprow commented 6 years ago

There could be yet another issue which is dependent on the particular chip. The datasheet says: [image] I will add it to the MMC and see what happens.

gkasprow commented 6 years ago

We have revision B of the chip

gkasprow commented 6 years ago

Still don't know why it works with one media converter and doesn't with another. For the working media converter the register content is below:

    Register  ADDR  DATA
    BMCR      0     0x1000
    BMSR      1     0x796d
    ID1       2     0x0
    ID2       3     0x0
    AN_ADV    4     0x20
    AN_RX     5     0x41a0
    AN_EXP    6     0x0
    EXT_STAT  15    0x8000
    page0
    JIT_DIAG  16    0x0
    PCSCR     17    0x11
    GMIICR    18    0x8c80
    CR        19    0x0
    IR        20    0x10
    page1
    ID        16    0x1ee0
    GPIOCR1   17    0x6c00
    GPIOCR2   18    0x0
    GPIOSR    19    0x80c
    PTPCR1    20    0x0
    page2
    PTPCR1    16    0x4000

All seems to be configured right. I get Rx data: [image]

With the non-working converter I have these settings:

    Register  ADDR  DATA
    BMCR      0     0x1000
    BMSR      1     0x7949
    ID1       2     0x0
    ID2       3     0x0
    AN_ADV    4     0x20
    AN_RX     5     0x41a0
    AN_EXP    6     0x0
    EXT_STAT  15    0x8000
    page0
    JIT_DIAG  16    0x0
    PCSCR     17    0x11
    GMIICR    18    0x8c80
    CR        19    0x0
    IR        20    0x0
    page1
    ID        16    0x1ee0
    GPIOCR1   17    0x6c00
    GPIOCR2   18    0x0
    GPIOSR    19    0x80c
    PTPCR1    20    0x0
    page2
    PTPCR1    16    0x4000

gkasprow commented 6 years ago

The only difference, apart from the IR register (0x10 versus 0x0), is in the BMSR register: 0x796d versus 0x7949, which simply means link down.
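
To make the BMSR comparison concrete, the two readings can be XORed and the differing bits checked against the Clause 22 status register layout. This is just an illustrative sketch; the helper names are not from any firmware in this thread.

```c
#include <stdint.h>

/* IEEE 802.3 Clause 22 BMSR (register 1) bits relevant here. */
#define BMSR_LINK_STATUS (1u << 2)
#define BMSR_AN_COMPLETE (1u << 5)

/* Bits that differ between two BMSR readings. */
uint16_t bmsr_diff(uint16_t a, uint16_t b)
{
    return (uint16_t)(a ^ b);
}

/* Returns 1 when the link status bit is set. */
int bmsr_link_up(uint16_t bmsr)
{
    return (bmsr & BMSR_LINK_STATUS) != 0u;
}

/* Returns 1 when auto-negotiation is reported complete. */
int bmsr_an_complete(uint16_t bmsr)
{
    return (bmsr & BMSR_AN_COMPLETE) != 0u;
}
```

For 0x796d versus 0x7949 the XOR is 0x0024, that is, the link status and auto-negotiation complete bits. Link status is a latching-low bit in Clause 22, so a single read can also report a link drop that happened earlier.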

gkasprow commented 6 years ago

And the front panel LED indicating LINK UP is on.

gkasprow commented 6 years ago

The funny thing is that a few months ago this TPLINK media converter was working... I have 2 pieces, they are brand new.

gkasprow commented 6 years ago

@sbourdeauducq are you sure you produce a valid TXCLK of 125 MHz? You said that the link is established, so is the LINK LED on the front panel on? On my board I observe packets in and out. I modified the MMC code so that it additionally initializes the PHY chip (errata fix) and prints the chip configuration when the front panel button is pressed. edit: But the initial configuration is the same as on your board. And the TPLINK media converter used to work before and now does not, which I'm trying to understand. edit: I used the same Vivado design that I published on Dropbox.

sbourdeauducq commented 6 years ago

Which led is the link led?

gkasprow commented 6 years ago

It's LD6, third counting from the USB connector.

jbqubit commented 6 years ago

Any update on this today?

gkasprow commented 6 years ago

It simply works in my case. I'm preparing a design for M-Labs to precisely diagnose Ethernet using ChipScope and Vivado. It is just compiling... it takes so much time because I use an ordinary IO as the RGMII clock input :)

sbourdeauducq commented 6 years ago

What clock buffer are you using?

gkasprow commented 6 years ago

I use BUFG.

gkasprow commented 6 years ago

OK, I have a design that implements an Ethernet core that sends requests once every few seconds. I see them using Wireshark. There is a ChipScope instance connected to the GMII output, before the GMII-to-RGMII converter. On the Rx side I use RGMII-to-GMII logic with ChipScope connected to it. I can trigger on TX_EN and RX_DV respectively. I see the same frame structure in ChipScope and Wireshark. The code is messy; it was a quick setup based on another design to test the Ethernet functionality. The bit file is here: Sayma_ETH/kc705_delay_ipbus.runs/impl_2/sayma_amc_tester.bit To run ChipScope you also need the probe definition file: Sayma_ETH/kc705_delay_ipbus.runs/impl_2/debug_nets.ltx The ChipScope signal assignment is below.

    probe1(7 downto 0)   <= gmii_rxd(7 downto 0);
    probe1(8)            <= gmii_rx_dv;
    probe1(9)            <= gmii_rx_er;
    probe1(15 downto 10) <= (others => '0');
    probe0(7 downto 0)   <= gmii_txd(7 downto 0);
    probe0(8)            <= gmii_tx_en;
    probe0(9)            <= gmii_tx_er;
    probe0(15 downto 10) <= (others => '0');

sbourdeauducq commented 6 years ago

I use BUFG.

Then I wonder why Vivado takes more time to compile just because the clock input is on a non-clock-capable pin. Is it trying to move that BUFG around the FPGA to try to meet some timing constraints between clock and data IO pins? If that's the case, maybe adding a location constraint on the BUFG would improve this.

Or lock the route between the clock input pin and the BUFG - this was awful to do with ISE's bug-infested tools, I don't know the procedure for Vivado, which may or may not have been improved.

sbourdeauducq commented 6 years ago

Sayma_ETH/kc705_delay_ipbus.runs/impl_2/sayma_amc_tester.bit

Where is that? Can you put all those files in some public place, e.g. create a new repository on GitHub, host them on some web server, or zip + attach to issues?

gkasprow commented 6 years ago

@sbourdeauducq It is in my dropbox account I shared some time ago. The files are also here:

impl_2.zip

jbqubit commented 6 years ago

@sbourdeauducq Can you reproduce what @gkasprow sees with WireShark and chipscope?

jbqubit commented 6 years ago

@sbourdeauducq ping

gkasprow commented 6 years ago

It should look like this: [image]

sbourdeauducq commented 6 years ago

Does the built-in JTAG work with vivado?

gkasprow commented 6 years ago

I didn't even try. I use an external JTAG cable.

gkasprow commented 6 years ago

@sbourdeauducq I found a small issue with the design and bits I published. The nibbles on the Rx side were swapped, which could cause confusion. I fixed it and now it's fine. The files are in the Dropbox directory I shared. This is how it should look. The MSB is the rx_dv line and I use it as a trigger. [image]
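
To illustrate why swapped nibbles are easy to miss: RGMII carries each GMII byte as two 4-bit halves, bits 3:0 on the rising clock edge and bits 7:4 on the falling edge. The sketch below assumes that "swapped" means the two halves were exchanged when reassembling the byte; the function names are invented for the example.

```c
#include <stdint.h>

/* RGMII sends bits 3:0 on the rising edge and bits 7:4 on the falling edge. */
uint8_t rgmii_to_gmii(uint8_t rising_nibble, uint8_t falling_nibble)
{
    return (uint8_t)(((falling_nibble & 0x0Fu) << 4) | (rising_nibble & 0x0Fu));
}

/* The same capture with the two nibbles exchanged (the assumed fault). */
uint8_t rgmii_to_gmii_swapped(uint8_t rising_nibble, uint8_t falling_nibble)
{
    return rgmii_to_gmii(falling_nibble, rising_nibble);
}
```

Preamble bytes (0x55) read the same either way, so a swapped capture still shows a plausible preamble, but the start-of-frame delimiter 0xD5 comes out as 0x5D and every later byte is scrambled.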

jbqubit commented 6 years ago

@sbourdeauducq Any luck with this using external JTAG?

sbourdeauducq commented 6 years ago

I don't have a cable and I'd rather not install the crappy Xilinx drivers. They never work and typically waste a few hours on yak-shaving every time.