ttsiodras opened this issue 1 year ago (status: Open)
This is a valid issue I have been exploring as well.
Under Linux, the Innova-2 shows up as a PCIe bridge connecting two Ethernet controllers and the FPGA to the host.
The Innova-2 Product Brief specifically states it has a PCIe switch.
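A quick way to see that layout on a given machine (the bus numbers and grep patterns below are only illustrative):

```
# Show the PCIe tree: the Innova-2 appears as a Mellanox bridge with the two
# ConnectX-5 Ethernet functions and the Xilinx FPGA behind it.
lspci -tv

# List just the relevant functions with vendor:device IDs
# (15b3 = Mellanox, 10ee = Xilinx).
lspci -nn | grep -Ei "mellanox|xilinx|15b3|10ee"
```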
A team associated with Nvidia Networking has the FlexDriver project, which can supposedly do direct NIC-to-FPGA Bump-in-the-Wire processing. However, to make use of it you must "acquire FlexDriver IP (`src/flc.dcp`) from NVIDIA Networking", so it appears to be a proprietary feature. I looked through the two FlexDriver demos, the flexdriver-zuc encryption demo and the flexdriver-iot-auth packet-processing offload demo, but the code that communicates with the ConnectX-5 seems to live in `src/flc.dcp`.
It is possible to disconnect the Innova-2 from PCIe using `setpci`:

`sudo setpci -s 02:08.0 0x70.w=0x50`

Then reconnect it with `sudo setpci -s 02:08.0 0x70.w=0x40`. Some combination of bits that connects the FPGA to the ConnectX-5 probably exists.
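If those values decode the usual way, offset 0x70 on that bridge function would be the Link Control register of its PCI Express capability, and the bit being toggled (0x10) would be Link Disable, while the 0x40 that stays set would be Common Clock Configuration. That is an assumption based only on the 0x50/0x40 difference. A sketch using setpci's CAP_EXP name instead of the hard-coded offset, with the same BDF as above:

```
BDF=02:08.0                                # downstream port in front of the FPGA (yours may differ)

sudo setpci -s $BDF CAP_EXP+0x10.w         # read Link Control first
sudo setpci -s $BDF CAP_EXP+0x10.w=0x50    # set bit 4 (Link Disable) -> link goes down
sudo setpci -s $BDF CAP_EXP+0x10.w=0x40    # clear it -> link retrains
```

Note that writing the whole word also overwrites the other Link Control bits, so if the read shows anything besides 0x40/0x50 a read-modify-write is safer.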
I took a break from playing with networking to get RISC-V working on the Innova-2.
We can try coordinating messages to the Nvidia Networking Forums. Or, if you have an in at Nvidia, please post anything you discover and are allowed to forward.
Thanks for your feedback, much appreciated!
If you post a question about this in any NVIDIA forum, ping me here and I will join in.
In the meantime I will continue looking - and keep you posted if I find anything.
I have asked about ConnectX-5 Ethernet to FPGA Direct Communication on the Nvidia SoC and SmartNIC Forum.
I have also created GitHub Issues on the flexdriver-iot-auth and flexdriver-zuc projects.
Seems unlikely Nvidia will provide any support for this but I did some research and came across some hopeful notes.
- FlexDriver uses PCIe Peer-to-Peer.
- PCIe Peer-to-Peer is supported by the ConnectX-5.
- Xilinx supports PCIe Peer-to-Peer via XRT; check out the Vitis p2p_bandwidth and p2p_simple demos.
- Mellanox demonstrated RDMA-to-PCIe and has the nv_peer_memory project for GPU-to-InfiniBand Peer-to-Peer.
Unless there is some secret register that needs to be set to enable PCIe Peer-to-Peer in the ConnectX-5 PCIe switch and Ethernet Controllers, recreating FlexDriver should be possible. Or, it may be that the root complex or motherboard PCIe switch needs to support Peer-to-Peer. In any case, this seems like a multi-month project.
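One thing to check before blaming the motherboard: when an IOMMU is enabled, ACS on any bridge between the two endpoints can redirect or block Peer-to-Peer TLPs and force them up through the root complex. A rough survey (output format varies slightly between pciutils versions):

```
# Print the ACS control word for every device that exposes the capability.
for dev in $(lspci | awk '{print $1}'); do
  acs=$(sudo lspci -s "$dev" -vvv | grep ACSCtl)
  [ -n "$acs" ] && echo "$dev $acs"
done

# SrcValid+/ReqRedir+ on the bridges above the ConnectX-5 and FPGA means P2P
# requests are being validated/redirected rather than routed directly.
```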
More references: XDMA Peer-to-Peer question, PCIe Performance Tuning, some ConnectX PCIe Registers, Ethernet to NUMA Node Mapping, Linux PCIe Peer-to-Peer DMA Support.
If you need a solution sooner rather than later there are boards with direct connections between an FPGA and SFP28/QSFP modules. The Alveo U25N/U25 or U200 (DigiKey/eBay) or U55C (DigiKey) are most similar to the Innova-2 depending on your RAM requirements. If you require an OpenCAPI connector, look into the Alpha-Data ADM-PCIE-9V3. The corundum project supports the U200 and ADM-PCIE-9V3. The Innova-2 at around $200 is great value if you want to experiment with PCIe and FPGAs but Nvidia does not seem too keen to support it. It is based on Another coMpany's proDuct and they seem focused on their BlueField DPU accelerators.
The main MT28800 PCIe switch hosts two PCIe bridges/switches, one for the FPGA and the other for the dual MT27800 Ethernet Controllers.
None of the Mellanox-specific registers seem related to PCIe Peer-to-Peer. The following dumps all of them:
sudo mst start
sudo mst status
sudo mlxreg -d /dev/mst/mt4119_pciconf0 --show_regs | cut -d " " -f 1 | grep "^[A-Z]" | grep -v "Available" | xargs -t -L 1 -I{} sudo mlxreg -d /dev/mst/mt4119_pciconf0 --show_reg {}
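To spot whether any of those registers react when the FPGA link is toggled, the same one-liner can be wrapped so two dumps can be diffed (a sketch, assuming the same /dev/mst device name):

```
dump_regs () {
  sudo mlxreg -d /dev/mst/mt4119_pciconf0 --show_regs | \
    cut -d " " -f 1 | grep "^[A-Z]" | grep -v "Available" | \
    xargs -L 1 -I{} sudo mlxreg -d /dev/mst/mt4119_pciconf0 --show_reg {} > "$1"
}

dump_regs before.txt
# ...toggle the FPGA link with setpci, change a setting, etc...
dump_regs after.txt
diff before.txt after.txt
```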
Many thanks for your valuable input! I will investigate and see what I can do.
PCIe Peer-to-Peer DMA should work on any motherboard with Resizable BAR and Above-4G Memory Decoding capabilities. However, it is supposedly buggy.
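Two host-side sanity checks worth running first (the config path and BDF below are examples):

```
# Was the running kernel built with Peer-to-Peer DMA support?
grep CONFIG_PCI_P2PDMA /boot/config-$(uname -r)

# Above-4G decoding / Resizable BAR show up as large 64-bit prefetchable BARs;
# check what the endpoint actually exposes (replace with your device's BDF).
FPGA_BDF=04:00.0
sudo lspci -s $FPGA_BDF -vv  | grep -i "Memory at"
sudo lspci -s $FPGA_BDF -vvv | grep -i "Resizable BAR"
```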
GPUDirect is supported on Kepler or later Tesla and Quadro GPUs using GPU driver version R470 or later.
Testing the nv_peer_memory project would give some confidence that this is at all possible but the PCIe Peer-to-Peer communication would occur within the motherboard's Root Complex or a PCIe switch.
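Before building nv_peer_memory it is easy to check what the GPU driver thinks of the topology:

```
# PIX/PXB = GPU and NIC share PCIe bridges, PHB = traffic goes through the
# root complex (host bridge), SYS = it crosses the CPU interconnect.
nvidia-smi topo -m

# After building and loading nv_peer_memory, confirm the module is present.
lsmod | grep nv_peer_mem
```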
As I understand the FlexDriver project, PCIe Peer-to-Peer occurs within the ConnectX-5's bridge/switch.
That means the bigger hurdle is enabling PCIe Peer-to-Peer within the FPGA, which requires a Vitis DFX Shell. The Alveo U200 is known to support PCIe P2P. It would be great if I could just try re-targeting the U200 Deployment Target Platform, but unfortunately the Development Target Platform comes as an XSA and will require some effort to use.
I have found two Vitis shells with some source code: Xilinx's open-nic-shell and U200 to AWS-F1. They may support PCIe P2P.
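For what it is worth, on the official Alveo shells XRT requires P2P to be switched on explicitly, because enabling it maps the card's DDR into a large BAR. The exact xbutil syntax has changed between XRT releases, so treat the lines below as an assumption to check against the installed version:

```
# Older XRT releases (per the Alveo P2P documentation):
sudo /opt/xilinx/xrt/bin/xbutil p2p --enable
# Newer XRT moved this under `xbutil configure`; check `xbutil --help` on the
# installed release for the current spelling. A warm reboot may be needed for
# the larger BAR to be assigned.
```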
Another avenue to explore is PCIe P2P between GPU and FPGA which seems more mature but there is no official support.
Again, if you are working on a near-term project, the Alveo U200 (DigiKey/eBay) has two QSFP28 (100GbE) cages connected directly to an XCU200 (equivalent to an XCVU9P) FPGA. You could then use 100G QSFP28 to 4x25G SFP28 Direct Attach Breakout Cables to get up to eight 25GbE interfaces communicating through Programmable Logic.
Follow-Up: The 100GbE CMAC block does not appear to support port splitting, so turning two QSFPs into eight SFPs may not be possible.
The open-nic-shell project uses pins M10/M11 and T10/T11 for the QSFP clocks on the U200, which are on Banks 232 and 230. The two QSFP connectors are therefore on a single Super Logic Region (SLR). I believe you cannot cross an SLR boundary to connect a CMAC block to gigabit transceivers, which implies a maximum of three integrated Ethernet interfaces on the U200. Eight 25GbE interfaces would require a DIY MAC layer in the programmable logic.
I have made progress figuring out what doesn't work. I got a Tesla K80 GPU (SM_3.7) and have been trying to demonstrate P2PDMA. The K80 is no longer under active development so I am stuck in a version and dependency pit. My setup notes.
For example, simpleP2P officially requires SM_5+. `gdscheck.py` is part of CUDA 11.4 (the last version to officially support the K80) and it asks for CUDA Compute Capability >= 6, which is Pascal or later. DPDK only officially supports Ampere, Turing, and Volta (SM_7+).
Motherboard, GPU, NIC, and software all have to work together, and thus far I am only convinced the GPU I am using supports PCIe P2P (the K80 is two GPUs connected via a PEX 8747). I intend to spend some more time with the software. A known-working server or PCIe Switch Card is too much right now. I am using an Intel B360 motherboard which supposedly supports P2P (search for `peer` in the datasheet).
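Some baseline checks using the samples that ship with the CUDA 11.4 toolkit (the install paths below are the defaults and may differ; simpleP2P may simply refuse the K80's SM 3.7, consistent with the SM_5+ requirement above):

```
# deviceQuery reports the compute capability (the K80 should show 3.7).
cd /usr/local/cuda-11.4/samples/1_Utilities/deviceQuery && make && ./deviceQuery

# simpleP2P tests P2P between the K80's two dies (they sit behind its PEX 8747).
cd /usr/local/cuda-11.4/samples/0_Simple/simpleP2P && make && ./simpleP2P
```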
I managed to get open-nic-shell to synthesize and implement but it does not show up on the PCIe bus. I believe this is an issue with Vivado 2021.2 as I have been unable to get even the simplest QDMA demo to work with the XCKU15P. I will try again.
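One thing worth ruling out when a freshly programmed design does not enumerate: reprogramming over JTAG does not make the host rescan the bus, so the setpci link toggle described earlier plus a manual remove/rescan can stand in for a full reboot (the BDF below is a placeholder):

```
# Remove the stale FPGA function, if one is present, then force a rescan.
echo 1 | sudo tee /sys/bus/pci/devices/0000:02:00.0/remove
echo 1 | sudo tee /sys/bus/pci/rescan

# Look for link-training or BAR-assignment complaints.
sudo dmesg | grep -iE "pci|xdma|qdma" | tail -n 50
```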
aws-fpga is unfortunately usable only with the AWS F1 instances and the Alveo U200, as its core shell comes as an encrypted Vivado checkpoint. However, it may still be useful to spin up an F1 instance and run the cl_dram_dma demo to see how P2PDMA looks and functions on a known-working setup.
There are two CMAC 100G MAC blocks adjacent to the OpenCAPI GTY Quads 131 and 132, so it should be possible to implement dual QSFP modules using the SlimSAS OpenCAPI connector. Refer to Figure 1-99 on page 134 of UG575, UltraScale+ Packaging and Pinouts.
Unfortunately I am having problems getting my PCIe-to-SlimSAS Adapter working. The OpenPower AAC specification I used has a different pinout from the ADM-PCIe-9V5, which is the only OpenCAPI pinout I have seen.
I mangled and reworked my PCIe-to-SlimSAS Adapter but am unable to get PCIe to work even at Gen1 x1.
Clock and Reset work well and I can successfully run IBERT and see a decent eye diagram. I believe the TX pins are way off on the adapter. I intend to design a SlimSAS breakout to figure out the signal locations. That or my board has broken OpenCAPI transceivers.
If I manage to get the PCIe Adapter working I will go ahead and complete a SlimSAS-to-QSFP Adapter. It may even be possible to power it from a PCIe x1 slot.
Is there any news about sending/receiving packets directly from the FPGA? I don't understand why it's called flexnic when there isn't a way to use the network card :) I've bought 12 of these boards :)
@ciclonite which version of the board do you have (what is the DDR4 FBGA Code)? The six boards that I've unpacked have these chips:
Tomorrow I'll look at the others.
Great! You are doing immense work! I wrote to Nvidia about the FlexDriver IP but they haven't replied.
Hi.
I saw your interesting projects on the Innova2 and wanted to ask you: have you used the Innova2 FPGA to control the network chip's (ConnectX-5) ingress/egress packets?
I want to hack something that allows me to send packets directly from the FPGA, but I can't find any example of how to do this.
Sorry for opening a non-issue here - but I couldn't find any other way to communicate with you.