ufrisk / pcileech-fpga

FPGA modules used together with the PCILeech Direct Memory Access (DMA) Attack Software

Advice Request: Ported project to new board and 1Gbit Ethernet, how can I make it faster? #160

Closed · Gbps closed this 8 months ago

Gbps commented 9 months ago

Hi Ulf!

Thanks for spending all the time supporting this project! It's super awesome.

I ported the NeTV2 codebase to one of my FPGA development boards, the Alinx AX7A035, found here: https://alinx.com/en/detail/496. It is also an XC7A35T board.

Since this board doesn't have an FT601, I ported the NeTV2 Ethernet code from RMII to RGMII. I ran the benchmark and got this:

gbps@testbench:~/pcileech$ sudo ./pcileech -device rawudp://ip=10.0.0.64 benchmark

================ PCILEECH BENCHMARK START ================
READ   8 B       958 reads/s     7 kB/s
READ 128 B       958 reads/s   119 kB/s
READ 512 B       958 reads/s   479 kB/s
READ   4 kB      944 reads/s     3 MB/s
READ  64 kB      519 reads/s    32 MB/s
READ   1 MB       31 reads/s    31 MB/s
READ  16 MB        2 reads/s    35 MB/s
================ PCILEECH BENCHMARK FINISH ================
BENCHMARK: WARNING! Read speed is slow.
           USB connection most likely at USB2 speed.
           Check port/cable/connection for issues.

This is naturally about 5x faster than the NeTV2's 7 MB/s. I just finished porting it a moment ago, so I haven't begun to optimize anything yet. Since you would likely know best, do you happen to have any ideas of where to look first for bottlenecks? Perhaps the FIFO sizes for the Ethernet path?

Thanks! -Gbps

ufrisk commented 9 months ago

The issue may be in your FPGA design, but more likely there are also issues in the leechcore.dll/.so, where timeouts and delays are tuned for the slower NeTV2.

Of interest are the functions named DeviceFPGA_UDP_* and the NeTV2 profile. I don't know if this is your issue, but it's where I'd start looking at least.
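
Purely as an illustration of the kind of tuning meant here, a hypothetical per-device profile with timing constants sized for the slower NeTV2 link might look like the sketch below. The names and values are invented for this example and do not reflect the actual leechcore source:

```c
/* Hypothetical example only: field names and values are invented for
 * illustration and are not the actual leechcore data structures.       */
typedef struct tdEXAMPLE_PROFILE {
    const char *szName;         /* device profile name                     */
    unsigned    cbMaxRead;      /* max bytes requested per read burst      */
    unsigned    usDelayProbe;   /* delay (us) between memory probe polls   */
    unsigned    msTimeoutRead;  /* wait (ms) for outstanding completions   */
} EXAMPLE_PROFILE;

/* Constants sized for a ~7 MB/s NeTV2 link will throttle a 1 GbE port;   */
/* a faster device would want larger bursts and shorter waits.            */
static const EXAMPLE_PROFILE g_profileNeTV2 = { "netv2_rawudp",   0x10000, 500, 500 };
static const EXAMPLE_PROFILE g_profileRGMII = { "ax7a035_rawudp", 0x40000, 100, 100 };
```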

Gbps commented 8 months ago

Hey, as an update to this: I played around with the leechcore delays and sizes to no real effect. I then dumped some UDP traffic and analyzed the throughput; the UDP stream was achieving nearly 100% of the 1 GbE line rate.

This led me to do some overhead calculations to see what the actual maximum throughput of the system could be. For my target system I was only getting 64 bytes per CplD, so after accounting for the TLP and Ethernet/IP/UDP framing overhead, I see that the maximum memory throughput is essentially already being achieved here.
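
A rough sketch of that kind of back-of-envelope calculation is below. The per-CplD wire overhead is an assumed figure for illustration only; the real number depends on the FPGA's framing and how many CplDs land in each UDP datagram:

```c
#include <stdio.h>

int main(void)
{
    /* All figures below are illustrative assumptions, not measurements.   */
    const double line_rate_Bps = 1e9 / 8.0; /* 1 Gbit/s Ethernet, bytes/s  */
    const double data_per_cpl  = 64.0;      /* memory payload per CplD     */
    const double wire_per_cpl  = 220.0;     /* assumed on-wire bytes per   */
                                            /* CplD: TLP header + FPGA/UDP */
                                            /* framing + Eth/IP overhead   */

    double efficiency = data_per_cpl / wire_per_cpl;      /* ~0.29 here */
    double max_MBps   = line_rate_Bps * efficiency / 1e6; /* ~36 MB/s   */

    printf("payload efficiency: %.0f%%\n", efficiency * 100.0);
    printf("max memory read throughput: %.0f MB/s\n", max_MBps);
    return 0;
}
```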

Therefore, the only way to improve on this would be to upgrade to 2.5G or 10G Ethernet.

Thanks again!

Gbps commented 8 months ago

Also, would you be interested in taking a PR for this Alinx development board? It uses the RGMII core that was recently published by fpga-cores.

ufrisk commented 8 months ago

I'd be very happy to include a link to your repo from the main project page. I could add it as a "community supported device" (similar to the legacy supported device table, but separate), or you could submit a PR adding the link if you prefer.

Accepting a PR with the actual FPGA project into the main project would also increase my maintenance burden with each new release and such, and I'm not too keen on that.

Gbps commented 8 months ago

@ufrisk Yeah that's completely reasonable. I'll continue doing some testing on it and get back to you.