phdussud / pico-dirtyJtag

MIT License
303 stars 44 forks

Processing SVF files directly #23

Open gsteiert opened 6 months ago

gsteiert commented 6 months ago

Loading devices from SVF files with openFPGALoader takes forever. How difficult would it be to take the SVF parsing from openFPGALoader and run it locally on the RP2040? It would be nice to be able to send an .SVF file into the RP2040 through a virtual UART and process it locally. Could we use the SVF parsing from openFPGALoader, or should we look for one that already runs on an MCU?
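
To make the "send the file through a virtual UART" idea concrete, here is a minimal sketch of the first stage such a port would need: cutting an incoming text stream into SVF statements as chunks arrive. It relies only on two properties of the SVF format (statements end with `;`, and `!` or `//` start a comment that runs to end of line). The class name `SvfStatementSplitter` and its `feed` method are hypothetical, it is written in Python for readability rather than as RP2040 firmware, and it is not how openFPGALoader does it.

```python
class SvfStatementSplitter:
    """Incrementally split a stream of SVF text into ';'-terminated statements.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._pending = ""  # tail of the stream not yet terminated by a newline
        self._tokens = []   # words of the statement currently being assembled

    def feed(self, chunk):
        """Accept one chunk of text (e.g. one UART read); return finished statements."""
        done = []
        self._pending += chunk
        # Only complete lines are processed; the unfinished tail waits for more data.
        *lines, self._pending = self._pending.split("\n")
        for line in lines:
            # Drop '!' and '//' comments, which run to the end of the line.
            for marker in ("!", "//"):
                pos = line.find(marker)
                if pos != -1:
                    line = line[:pos]
            # A line may hold several statements, or only part of one.
            parts = line.split(";")
            for i, part in enumerate(parts):
                self._tokens.extend(part.split())
                if i < len(parts) - 1:  # a ';' followed this part
                    if self._tokens:
                        done.append(" ".join(self._tokens))
                    self._tokens = []
        return done


splitter = SvfStatementSplitter()
statements = splitter.feed("! header comment\nSIR 8 TDI")
statements += splitter.feed(" (06);\nSDR 16 TDI (AB\n CD) TDO (0000) MASK (FFFF); // check\n")
for s in statements:
    print(s)
```

This only finds statement boundaries. A real port would still have to interpret each statement, strip the whitespace inside hex fields, and flush a final line that lacks a trailing newline.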

phdussud commented 6 months ago

Can you provide some concrete data? How does it compare with bin programming? What JTAG frequency do you use? Do you know where the bottleneck is? Can you share your SVF file? It turns out that the author of openFPGALoader wasn't all that positive about enabling SVF support for every FPGA target, because SVF is inherently slow compared to bin programming. The reason is that delays are typically inserted in places to cover programming delays imposed by the hardware. In bin mode, openFPGALoader loops on a status register read until the device is ready. SVF does not allow this, so a conservative delay is typically introduced, which is always slower than the polling loop.
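
The difference between the two waiting strategies can be sketched with a toy timing model. The function names and all the numbers below are made up for illustration; nothing here is taken from openFPGALoader's code. The fixed delay stands for what an SVF file encodes (for example a `RUNTEST` wait sized for the worst case), while the polled wait stands for re-reading a status register until the device reports ready.

```python
def fixed_delay_wait(actual_busy_us, worst_case_us):
    """SVF style: always wait the worst-case time, however fast the device was."""
    return worst_case_us


def polled_wait(actual_busy_us, poll_interval_us):
    """Bin style: poll a status register until the device is no longer busy."""
    waited = 0
    while waited < actual_busy_us:  # device still busy at this poll
        waited += poll_interval_us
    return waited


# Hypothetical numbers: the operation really takes 3.2 ms, the SVF generator
# budgets 10 ms to be safe, and one status poll costs 1 ms.
print(fixed_delay_wait(3200, 10000))
print(polled_wait(3200, 1000))
```

In this model the polled wait overshoots the real busy time by at most one poll interval, while the fixed delay pays the full safety margin on every operation.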

gsteiert commented 6 months ago

The board I am using (MAX10 10M08 Evaluation Kit) does not have bin support for comparison, but the SVF programming takes several minutes for a small device. I was running at 16 Mbps, the fastest rate supported by pico-dirtyJtag. There may be some potential optimizations, like skipping verification of the image. The way SVF works, it has to check the TDO data for each transaction. I expect it would be quicker to check the data on the MCU than to send the TDO data back to openFPGALoader for processing. The other advantage could be the elimination of openFPGALoader. You could send the SVF file directly to the UART and do all the processing on the pico board.
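
For reference, the per-transaction TDO check is small enough to run on the adapter. In SVF, a statement such as `SDR 16 TDI (ABCD) TDO (1234) MASK (FF00)` gives the expected TDO value and a mask in which a 1 bit means "compare this bit". A minimal sketch of that comparison follows; the helper name `tdo_matches` is hypothetical and the values are arbitrary.

```python
def tdo_matches(captured, expected, mask):
    """Return True if captured TDO equals expected on every bit where mask is 1."""
    return ((captured ^ expected) & mask) == 0


# Arbitrary example values: only the upper byte is checked (mask 0xFF00).
print(tdo_matches(0x12AB, 0x1234, 0xFF00))  # upper bytes agree
print(tdo_matches(0x13AB, 0x1234, 0xFF00))  # upper bytes differ
```

If the check runs on the Pico, only a pass/fail result has to travel back over USB instead of the full captured TDO stream.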

I may attempt this myself. I am mostly wondering if you think the openFPGALoader SVF parsing could be ported to the RP2040 since you seem to be familiar with both. Or would you look for another SVF implementation to port?

phdussud commented 6 months ago

I am guessing you are right. SVF checking TDO may be the reason it is slow. However, there must be a generation mode where the generated SVF does not do this. For sure, this shortened Lattice file does not, and it loads 300 KB in a couple of seconds: bitstream-shortened.zip

As far as SVF parsers go, I only know the one in openFPGALoader, and it is C++ based. You will have to add the right C++ support libraries. It isn't anything that I would be interested in merging back into this project, because it is too specific a case to be interesting to most users. I would be glad to direct people who could benefit from it to your repo. Good luck!

luyi1888 commented 5 months ago

I don't use FPGAs at all, but I am facing the same situation. I'm using OpenOCD. In my case, every 4-byte memory read needs at least 2 command queues sent to the Pico. I guess it is one for TAP_STATE_MOVE + write IR and one for TAP_STATE_MOVE + read DR. OpenOCD does not submit read commands in bulk.

In the screenshot, you can see the Pico needs about 4 ms to process each command queue. That should explain why the dump speed is 0.15 kb/s. Of course, I am using VMware and the code on the Pico is not optimized. But it would not help much even if that problem were solved, because USB sends a frame every 1 ms. 4 ms -> 1 ms, 0.15 kb/s -> 0.6 kb/s: not enough.
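
The estimate at the end of this comment assumes the dump rate is inversely proportional to the time spent per command queue. A small sketch of that arithmetic, using only the figures quoted above (the function name `scaled_rate` is made up for illustration):

```python
def scaled_rate(measured_rate, old_latency_ms, new_latency_ms):
    """Rate expected if throughput is inversely proportional to per-queue latency."""
    return measured_rate * old_latency_ms / new_latency_ms


# 0.15 kb/s measured at ~4 ms per queue; best case is one 1 ms USB frame per queue.
best_case = scaled_rate(0.15, old_latency_ms=4, new_latency_ms=1)
print(f"{best_case:.2f} kb/s")
```

Under that assumption, removing all of the Pico-side processing time still only buys a 4x speedup.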

[Screenshot: cjtag]

luyi1888 commented 5 months ago

Besides making openFPGALoader/OpenOCD run on the Pico, I think the most generic way is to use an MPU for this. Then there is no need to modify OpenOCD to really submit commands in bulk; you just write a JTAG adapter driver as usual. The ARM core runs Linux and OpenOCD, and the MCU core acts as an I/O processor. Shared memory is used to exchange data between the ARM core and the MCU. The MCU uses GPIO bit-banging and SPI to do JTAG, like the original dirtyJTAG. I think the shared memory will eliminate the processing/transfer time of every command queue.

I have already done some tests on the Rockchip RV1106: the MCU can bit-bang at 1 MHz. Compared to ST/TI parts, the board is Pico-sized, cheap, and easy to get.

luyi1888 commented 5 months ago

How do J-Link and other commercial debuggers handle this? If I use OpenOCD + J-Link, will I get the same result as with the Pico? And with the J-Link GDB Server, a huge improvement? I guess they really submit commands in bulk.

phdussud commented 5 months ago

About J-Link and the J-Link GDB Server: the microprocessor in the J-Link device handles all of the chatty traffic of the debug protocol. It only sends the results to the GDB server over the USB line. The reason you get such poor USB utilization is that you need a USB turnaround (send, then receive) for every 64 bits of communication between OpenOCD and the adapter. openFPGALoader sends binary data without needing a turnaround, and I see that the USB bus (FS) is almost totally used up when I use pico-dirtyJtag.
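
A rough upper bound shows how much the turnaround costs. The sketch below assumes each turnaround takes at least one 1 ms full-speed USB frame (the frame interval mentioned earlier in the thread) and carries 64 bits; both the function name and the 1 ms figure are assumptions for illustration, not measurements.

```python
FULL_SPEED_USB_BITS_PER_S = 12_000_000  # raw signalling rate of full-speed USB


def round_trip_limited_rate(bits_per_turnaround, turnaround_ms):
    """Upper bound in bytes/s when every batch of bits costs one full round trip."""
    return (bits_per_turnaround / 8) * (1000 / turnaround_ms)


limited = round_trip_limited_rate(bits_per_turnaround=64, turnaround_ms=1)
raw = FULL_SPEED_USB_BITS_PER_S / 8
print(f"turnaround-limited: {limited:.0f} bytes/s")
print(f"raw bus rate:       {raw:.0f} bytes/s")
```

With these assumptions the turnaround-limited rate is a tiny fraction of the raw bus rate, which is consistent with a streaming tool being able to fill the bus while a request/response debugger cannot.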

luyi1888 commented 5 months ago

Got it. So things like the Black Magic Probe will be the fastest. Thank you for your explanation.