mwrnd / innova2_flex_xcku15p_notes

Nvidia/Mellanox Innova-2 Flex Open Programmable SmartNIC Setup and Usage Notes for XCKU15P FPGA Development
BSD 2-Clause "Simplified" License

Add Bitstreams and Files for the 4GB Version of the MLNX Innova 2 Flex #3

Open CuteSC2 opened 2 years ago

CuteSC2 commented 2 years ago

There exists a 4GB version of the Innova 2 Flex with MT40A512M16 DDR4 ICs, so bitstreams, documentation, and project files should be provided for it.

mwrnd commented 2 years ago

Ideally, Nvidia/Mellanox would provide better support for their product.

My board is labeled MNV303212A-ADLT Rev:A3 and seems to work with the Constraints File released by Nvidia.

I am aware of the 4GB models with MT40A512M16 DDR4 ICs (D9TBK FBGA Code), as I spotted the MNV303212A-ADIT for sale on eBay. Note ADIT versus ADLT. The ADIT does not currently show up in searches of the Nvidia Networking Forum.

I can add a variant of the project's TCL file and bitstreams for the 4GB model but do not feel comfortable including untested designs in the repository.

If you or anyone who comes across this issue has one of these 4GB models and is willing to help with testing, please join this comment thread.

It would be useful to have information on your setup:

mwrnd commented 2 years ago

I have added a test project for the MNV303212A-ADIT. I recompiled the DDR4 8Bit Byte-Lane-0 test project for the MT40A512M16LY-075 which I believe to be closest to the MT40A512M16JY-083E:B with D9TBK FBGA Code.

Applepi commented 1 year ago

It looks like I have three of the 4GB non-crypto version of the card coming in the mail from eBay. When I get home I'll set up an environment to test them. I plan to use them with an external TB3 PCIe riser on macOS, since Mellanox drivers were added in Ventura, but I'll set up the base environment first to kick the tires.

mwrnd commented 1 year ago

I got one of the 4GB MNV303212A-ADIT boards, Rev:A2, which is the same PCB revision as my 8GB board. The 1-byte wide DDR4 test project works with MT40A512M16JY-075 as the DDR4 component! I will create a design to test the full 4GB.

However, there are other issues with the 4GB ADIT boards compared to the 8GB ADLT. The PSID is MT_0000000142 for the 4GB ADIT. The 8GB ADLT has PSID: MT_0000000158. I tested every version of mlxup from 4.10.0 to 4.24.0 and none have any firmware update for the 4GB ADIT. I saved and then erased the ConnectX-5 Firmware Flash IC and programmed it with the 8GB ADLT MORSE_FW firmware. The FPGA+DDR4 and the PCIe switch within the ConnectX-5 seem to be working after some testing. I am unable to get the 25GbE interfaces working.

Firmware 16.22.1002 was the original firmware on my board. According to Nvidia/Mellanox's EoL Notification, this is the final firmware released.

MNV303212A-ADIT_EOL_LCR-000437

Only one of the 25GbE interfaces shows up in sudo lspci -vnn while running the original firmware. Both show up when it is programmed with the 8GB ADLT firmware.

03:00.0 Class [2000]: Mellanox Technologies Innova-2 Flex Burn image [15b3:0264]
    Subsystem: Mellanox Technologies Innova-2 Flex Burn image [15b3:0264]
    Flags: fast devsel
    Memory at 4016000000 (64-bit, prefetchable) [size=2M]
    Capabilities: [40] Power Management version 3
    Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [70] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [180] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [1c0] Secondary PCI Express
    Kernel driver in use: mlx_fpga_bope

04:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
    Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:0043]
    Flags: fast devsel
    Memory at 4010000000 (64-bit, prefetchable) [size=32M]
    Capabilities: [60] Express Endpoint, MSI 00
    Capabilities: [48] Vital Product Data
    Capabilities: [9c] MSI-X: Enable- Count=64 Masked-
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [40] Power Management version 3
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [180] Single Root I/O Virtualization (SR-IOV)
    Capabilities: [1c0] Secondary PCI Express
    Kernel modules: mlx5_core

Applepi commented 1 year ago

I can also confirm that when I flash the 8GB ADLT firmware both interfaces show up, but I'm unable to get the card to detect optics on either firmware. On my 12900K machine, it crashes and reboots when it tries to boot Linux.

Applepi commented 1 year ago

I think these cards may be using a different chip entirely from the other ADLT models. I found a document referring to this card as an Innova-2 Flex P^2 programmable pipeline. It also described it as an EN card, so maybe this card is using a different chip? At the same time, the document states it should support OFED. But I've tried MLNX EN, OFED, and inbox drivers and I've not been able to get a link up on any OS with these cards. I only have a few more days before my return window on these cards closes, and I may just return them: if I cannot use the NIC side of these cards, they are kind of useless to me. I have REV3 Engineering sample cards that show up as MT_0000000141 MNV303212A-ADAT_C18_Ax Innova-2 Programmable EN Adapter; dual SFP port; 25GbE; PCIe3/4 x8; HHHL; active heat sink; tall bracket; ROHS R6. There is no firmware available for them through mlxup. The firmware they shipped with was 16.22.0264.

I also tried flashing the Lenovo firmware, because Lenovo had referred to the cards as Innova-2 Flex EN in some cases, but it just put a signed firmware on the card with the same issues lol.

https://indico.psi.ch/event/6393/sessions/3484/attachments/11629/14919/Mellanox_PSI.pdf

https://manualzz.com/doc/33751280/innova%E2%84%A2-2-programmable-adapter-card

mwrnd commented 1 year ago

The main ICs on my 4GB ADIT board are the same as on the 8GB ADLT version, and the two appear to use the same PCB layout. The main difference I have spotted is the DDR4 ICs. The ConnectX-5 IC is the MT27808A0-FCCF-EV, which is also on the 8GB variant.

MNV303212A-ADIT_Main_IC_Closeup

MT_0000000141, MNV303212A-ADAT_C18_Ax, 16.22.0264

Yet another variant. OK.

I've not been able to get a link up on any OS with these cards

With the original 16.22.0264 firmware or the 8GB ADLT firmware?

I was unable to get a Direct-Attach Cable (DAC), an Active Optical Cable (AOC), or 1GbE optical SFP modules to show up under mlxcables using either firmware. They all work fine on the 8GB ADLT variant.

Update: The following does NOT work: The 4GB ADIT variant is listed as having Crypto. Disabling the crypto engine is worth trying. This post is helpful but I am guessing at these commands. Will try and report back.

sudo mst start
sudo mst status
sudo mlxreg -d /dev/mst/mt4119_pciconf0 --show_regs | cut -d " " -f 1 | grep "^[A-Z]" | grep -v "Available" | xargs -t -L 1 -I{} sudo mlxreg -d /dev/mst/mt4119_pciconf0 --show_reg {}
sudo mlxreg -d /dev/mst/mt4119_pciconf0 --reg_name CRYPTO_OPERATIONAL --get
sudo mlxreg -d /dev/mst/mt4119_pciconf0 --reg_name CRYPTO_OPERATIONAL --set "kek_size=0x00000000, wrapped_crypto_going_to_commissioning=0x00000000, wrapped_crypto_operational=0x00000000"
sudo mlxreg -d /dev/mst/mt4119_pciconf0 --reg_name CRYPTO_OPERATIONAL --get

I only have a few more days before my return window ... cannot use the NIC side

If your goal was a Dual 25GbE ConnectX-5 adapter then it does not look good. However, if your goal was to use the FPGA to process ethernet packets, be aware that is a significant project. The board is designed to use PCIe Peer-to-Peer for communication between the ConnectX-5 and the FPGA. I cannot promise anything but I am working on an OpenCAPI to QSFP module. I have a PCIe x8 Breakout and OpenCAPI SlimSAS 8x Breakout to test signals and pinouts and a QSFP Breakout to attempt 100GbE QSFP.

Applepi commented 1 year ago

I was able to get the cards working on non-12th-gen, non-Z690 machines: my Z87 and X570 boards worked just fine on the 16.22.0264 firmware. Only one link worked. It established a link at 40Gbps (a bug, maybe), but the max speed in iperf3 was only 25Gbps.

I only needed the dual 25GbE on my Z690 NAS PC, so I'll likely just swap in a quad-port Broadcom NIC there. I will likely keep the cards even though they only establish a single link, since the FPGA side is super useful. I think there is some sort of issue between the 12th-gen PCIe controller and the Mellanox NIC, because both my Z690 12900K and Z690 12700K machines cannot properly initialize the NICs. I've not been able to find anyone else with this issue, so it may be an issue between the old firmware and modern UEFIs.

mwrnd commented 1 year ago

@Applepi thanks for finding the 4GB MNV303212A-ADAT Product Brief(manualzz.com). The following diagram from it shows a direct connection between an SFP cage and the FPGA. I have seen this diagram before but thought it was a logical description of how the board can be used as it does not describe the 8GB ADLT boards. However, it may be a physical connection diagram for the ADAT boards. That would explain why my 4GB ADIT board with its original firmware only has one Ethernet interface. The second connects to the FPGA.

MNV303212A-ADAT_Product_Brief_Diagram

From the Mellanox presentation (Mellanox_PSI.pdf), the ADAT board is designed for pipeline packet processing. First the FPGA processes packets, then the ConnectX-5.

MNV303212A-ADAT_mention_in_Mellanox_PSI

The 8GB MNV303212A-ADLT Product Brief shows both 25GbE connected to the ConnectX-5.

MNV303212A-ADLT_Product_Brief_Diagram

The 4GB MNV303212A-ADAT Product Brief also mentions it specifically supports "bump-in-the-wire" packet processing. It also states a Board Support Package (BSP) for the FPGA is available but I have little hope of finding it.

Where_is_the_MNV303212A-ADAT_Board_Support_Package_BSP

Seeing as how the ADAT/ADIT EoL Notification lumps the two boards together and given all the above, it is worthwhile to check if there is a direct connection between an SFP cage and the FPGA on the 4GB ADIT boards. If yes, it should be possible to port corundum to it which would turn it into a single-port 25GbE PCIe Ethernet Adapter. The ADAT board looks just like the ADIT board.

@Applepi, do you happen to have an SFP Loopback(DigiKey) and a Xilinx-Compatible JTAG Adapter or clone? We could try using the FPGA's GT Transceiver built-in test interface to look for the pins connected to the SFP Cage.

mwrnd commented 1 year ago

I was able to get the cards working on non-12th Z690 machines

It fails on my Intel 300-Series motherboard.

Trying to disable cryptographic functions on the 4GB ADIT did not work. It does NOT have any related registers. If curious, here are the mlxreg register listings for the MNV303212A-ADIT and the MNV303212A-ADLT.

The 16.22.1002 firmware does NOT work with innova2_flex_app from Innova_2_Flex_Open_18_12.tar.gz. I needed to update the firmware to the 8GB ADLT firmware to make use of the FPGA.

sudo ~/Innova_2_Flex_Open_18_12/app/innova2_flex_app -v

===============================================
 Verbosity:        1
 BOPE device:      None
 ConnectX device:  None
Cannot find appropriate ConnectX device

The PCIe Device Tree shows only one 25GbE interface sitting on the ConnectX-5's PCIe switch.

sudo lspci -vnn -t

-[0000:00]-+-00.0  Intel Corporation Device [8086:3e0f]
           +-1d.0-[01-04]----00.0-[02-04]--+-08.0-[03]----00.0  Mellanox Technologies Innova-2 Flex Burn image [15b3:0264]
           |                               \-10.0-[04]----00.0  Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]

sudo lspci -vvvnnxxx -d 15b3: log for the MNV303212A-ADIT running 16.22.1002 firmware. Its Product Name is Innova-2 IPsec EN Adapter.

mwrnd commented 1 year ago

I have REV3 Engineering sample cards that show up as MT_0000000141 MNV303212A-ADAT_C18_Ax Innova-2 Programmable EN Adapter

@Applepi do you have the MNV303212A-ADAT variant or is it just running that firmware?

Applepi commented 1 year ago

I have MNV303212A-ADAT REV A3 cards running the firmware I found on them. All the firmware I've flashed on them causes the links to not detect anything that is plugged in. They are all marked as ES.

I just ordered a loopback adapter for testing.

I have a clone JTAG SMT2 that I can hook up.

We can likely take Alex's Ethernet implementation once we figure out which transceiver the SFP28 is wired to.

I can also confirm the BOPE driver was not working for my cards. It finds the NIC but not the FPGA.

I have a 4 port Broadcom BCM57504 to have a reference interface to test against.

Applepi commented 1 year ago

A thought just occurred to me: if the ADIT / ADAT cards are advertised as bump-in-the-wire / crypto cards, they might have the 2nd port of the Mellanox NIC connected directly to the FPGA. I'm not sure how we would verify that, though, as we haven't been able to get any dual-port firmware to make the CX-5 device establish a link.

mwrnd commented 1 year ago

ADIT / ADAT cards are advertised as a bump in the wire / crypto cards that they might have the 2nd port of the mellanox NIC connected directly to the FPGA

Yes, the ADAT Architecture Diagram from the Product Brief(manualzz.com) may be accurate. Gigabit Transceiver signals require inline DC-block coupling capacitors so the signals must be exposed somewhere on the PCB. If the ConnectX-5 could be forced to output on the TX line without an ethernet cable being detected it should be possible to find the coupling capacitors. Could also try corundum on every possible MGTREFCLK and transceiver pair on the FPGA.

Comparing the front and back of the 4GB ADIT and 8GB ADLT boards I found two oscillators populated on the 4GB ADIT board that are not on the ADLT. To minimize noise and jitter these are almost certainly placed as close as possible to their transceiver. @Applepi please confirm whether you see the same on the ADAT boards.

ADLT_vs_ADIT_Extra_Clock_Sources

These oscillators are on the Pin1 edge of the FFVE1517 package and the closest MGTs are the GTH Banks of the XCKU15P. That means quite a few MGTREFCLK banks will need to be tested.

XCKU15P_GTH_Banks

Applepi commented 1 year ago

Yes, I have confirmed that my ADAT card has those oscillators as well. Looking at the DIFF pairs going between the FPGA and the CX-5 Device it looks as though there may be 16 DIFF pairs. So maybe the link between the devices is bigger than 25Gbps.

mwrnd commented 1 year ago

The ADIT oscillators are labeled J161.13 DFS 7L. 161.13MHz is a standard 10G/25G Ethernet reference clock.
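
As a rough check of that figure, here is a minimal Python sketch. It assumes the "161.13" marking is shorthand for the usual nominal 161.1328125MHz, which is not printed on the part, and the function and constant names are only illustrative. Under that assumption, the 10G and 25G Ethernet serial line rates are integer multiples of the oscillator frequency.

```python
from fractions import Fraction

# Assumption: "161.13" on the package is shorthand for 161.1328125 MHz.
REFCLK_HZ = Fraction(1611328125, 10)  # 161,132,812.5 Hz

# Serial line rates of 10GBASE-R and 25GBASE-R (64b/66b encoded).
LINE_RATE_10G = Fraction(10_312_500_000)
LINE_RATE_25G = Fraction(25_781_250_000)

def refclk_multiple(line_rate_hz, refclk_hz=REFCLK_HZ):
    """Return line rate divided by reference clock as an exact fraction."""
    return line_rate_hz / refclk_hz

def period_ns(freq_hz):
    """Clock period in nanoseconds."""
    return float(Fraction(10**9) / freq_hz)

print(refclk_multiple(LINE_RATE_10G))   # 64
print(refclk_multiple(LINE_RATE_25G))   # 160
print(round(period_ns(REFCLK_HZ), 3))   # 6.206
```

The 6.206ns period is the value a timing constraint for this clock would use.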

ADIT_Oscillators_J161 13_DFS_7L

CX-5 Device it looks as though there may be 16 DIFF pairs

I can confirm that both the ADLT and ADIT have PCIe x8 connections to the ConnectX-5 which means 8 differential pairs for both TX and RX or 16 total.

Here is a close-up of the ConnectX-5 IC on the Innova2 8GB ADLT. There are 8 differential pairs leading to the FPGA on the right, a 156.25MHz oscillator at top, but notice no obvious coupling capacitors on the left side toward the SFP connectors.

ADLT_ConnectX5_Front-Coupling_Capacitors_and_OSC

Here is a close-up of the ConnectX-5 IC on the Innova2 4GB ADIT. There are 8 differential pairs leading to the FPGA on the right, a 156.25MHz oscillator at top, but notice the 4 differential pairs leading to the SFP connectors on the left.

ADIT_ConnectX5_Front-Coupling_Capacitors_and_OSC

Similar situation for the underside of the ConnectX-5 IC on the Innova2 8GB ADLT. 8 differential pairs on the left going from the CX5 to the FPGA but none toward the SFP connectors on the right.

ADLT_ConnectX5_Back-Coupling_Capacitors

The underside of the ConnectX-5 IC on the Innova2 4GB ADIT has 8 differential pairs on the left going from the CX5 to the FPGA but also 4 diff pairs toward the SFP connectors on the right.

ADIT_ConnectX5_Back-Coupling_Capacitors

@Applepi please check whether the ADAT has such capacitor connections to the SFP connectors.

From the SFP Specification (INF-8074i), there is just one RX and one TX per module. There is a dual channel SFP-DD Spec but it uses a different connector. InfiniBand uses QSFP.

mwrnd commented 1 year ago

I just noticed the Innova2 8GB ADLT uses the MT28808A0-FCCF-EV while the Innova2 4GB ADIT uses the MT27808A0-FCCF-EV. The MT28808A0-FCCF-EV and MT27808A0-FCCF-EV ICs may use different coupling interfaces.

mwrnd commented 10 months ago

By testing unused Quads I found that the 161.13MHz oscillators are connected to pins Y27 (Quad 129) and V27 (Quad 130).

XCKU15P_FFVE1517_Banks

# MNV303212A-ADAT and MNV303212A-ADIT Only:
# SFP - CMAC X0Y0 and X0Y1
#
# GTY Quad 129 == Quad_X0Y2
# Y36=MGTYRXP0=CHANNEL_X0Y8,  W38=MGTYRXP1=CHANNEL_X0Y9,
# V36=MGTYRXP2=CHANNEL_X0Y10, U38=MGTYRXP3=CHANNEL_X0Y11
# Y27=MGTREFCLK0_P, W29=MGTREFCLK1_P
#
# GTY Quad 130 == Quad_X0Y3
# T36=MGTYRXP0=CHANNEL_X0Y12, R38=MGTYRXP1=CHANNEL_X0Y13,
# P36=MGTYRXP2=CHANNEL_X0Y14, N38=MGTYRXP3=CHANNEL_X0Y15
# V27=MGTREFCLK0_P, U29=MGTREFCLK1_P

# Transceiver Pins Currently Unknown, only Clock Pins known

#set_property PACKAGE_PIN Y27 [get_ports diff_clock_rtl_2_clk_p]
#create_clock -period 6.206 -name gt_clk2 [get_ports diff_clock_rtl_2_clk_p]

#set_property PACKAGE_PIN V27 [get_ports diff_clock_rtl_3_clk_p]
#create_clock -period 6.206 -name gt_clk3 [get_ports diff_clock_rtl_3_clk_p]

I plan to test Channels 0 and 3 of the Quads to see if any work with the following loopback.

Using the pinout of the Small Formfactor Pluggable SFP Transceiver INF-8074i Standard:

Small_Formfactor_Pluggable_SFP_Transceiver_INF-8074i_Pinout

I created a simple loopback with an SFP Connector and a Direct-Attach Cable. Only two short 30AWG wires(alt) were needed, RXp-to-TXp and RXn-to-TXn, as all GNDs are connected together in a module or cable.

SFP_DIY_LoopBack

There are ~10nF=0.01uF capacitors in-line at both ends of a Direct-Attach Cable for about 5nF total.
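
The 5nF figure follows from the two coupling capacitors being in series along each signal line. A minimal Python sketch of that arithmetic, using the approximate 10nF values above (the function name is only illustrative):

```python
def series_capacitance(*caps_farads):
    """Equivalent capacitance of capacitors in series: 1/C = sum(1/Ci)."""
    return 1.0 / sum(1.0 / c for c in caps_farads)

# One ~10nF capacitor at each end of the cable, in series on the same line.
c_total = series_capacitance(10e-9, 10e-9)
print(f"{c_total * 1e9:.1f} nF")  # 5.0 nF
```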

SFP_In-Line_Coupling_Capacitor_Check

mwrnd commented 8 months ago

I got DDR4 working on my Innova2 4GB ADIT board.

@Applepi please test your ADAT board with the innova2_4gb_adit_xdma_ddr4_demo project when you have some spare time.

mwrnd commented 6 months ago

Using the DIY SFP Loopback and the xcku15p_ffve1517_GTY_IBERT project I was able to determine that the second SFP Port on the MNV303212A-ADIT is connected to Quad 129 Channel X0Y10.
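
To relate that channel site back to package pins, here is a small Python lookup built only from the constraint comments earlier in this thread. It is a sketch, not a verified pinout: it covers only Quads 129 and 130, and it assumes the channel Y index equals 4 times the quad Y index plus the lane number, which is the pattern those comments show. The names `GTY_QUADS` and `describe_channel` are made up for illustration.

```python
# RX-P pins per GTY quad in lane order 0..3, copied from the constraint
# comments earlier in this thread. Only these two quads are covered.
GTY_QUADS = {
    129: {"site_y": 2, "rxp_pins": ["Y36", "W38", "V36", "U38"], "refclk0_p": "Y27"},
    130: {"site_y": 3, "rxp_pins": ["T36", "R38", "P36", "N38"], "refclk0_p": "V27"},
}

def describe_channel(channel_site):
    """Map a channel site such as 'X0Y10' to its quad, lane, and pins.

    Assumption: channel Y index = 4 * quad Y index + lane, which matches
    the eight channels listed in the comments.
    """
    column, y_text = channel_site.split("Y")
    if column != "X0":
        raise ValueError(f"only column X0 is covered: {channel_site}")
    quad_y, lane = divmod(int(y_text), 4)
    for quad, info in GTY_QUADS.items():
        if info["site_y"] == quad_y:
            return {
                "quad": quad,
                "lane": lane,
                "rxp_pin": info["rxp_pins"][lane],
                "refclk0_p": info["refclk0_p"],
            }
    raise ValueError(f"channel not in Quad 129 or 130: {channel_site}")

print(describe_channel("X0Y10"))
# {'quad': 129, 'lane': 2, 'rxp_pin': 'V36', 'refclk0_p': 'Y27'}
```

Under these assumptions, X0Y10 is lane 2 of Quad 129, with RX-P on V36 and the Quad 129 reference clock on Y27.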

SFP_DIY_LoopBack

With the SFP Loopback connected to SFP Port 2 (closest to PCIe card edge):

Innova2 4GB MNV303212A-ADIT SFP Port2

IBERT is successful:

Innova2 4GB MNV303212A-ADIT IBERT SFP Port2 X0Y10 has Link

With the SFP Loopback connected to SFP Port 1:

Innova2 4GB MNV303212A-ADIT SFP Port1

There are no signal links:

Innova2 4GB MNV303212A-ADIT IBERT SFP Port1 No Links