pc2 / Aurora-HLS

Ready-to-link, packaged Aurora IP on four QSFP28 lanes, providing 100Gb/s throughput
Apache License 2.0

Software-controlled reset #13

Closed by michaellass 4 months ago

michaellass commented 6 months ago

Start working on #10 by implementing the missing write logic in the AXI control interface.

Most of the code is taken from https://github.com/Xilinx/Vitis-Tutorials/blob/dfc55741ecdb27e488cc5c82bdf1280c00463b98/Hardware_Acceleration/Design_Tutorials/05-bottom_up_rtl_kernel/krnl_aes/rtl/krnl_aes_axi_ctrl_slave.v, which matches the read FSM that we already use here.

For now, introduce some crude testing by writing and reading a test register 1000 times. This can likely be removed in later commits.
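A write/read-back test of this kind could look roughly like the sketch below. It is not the test code from this PR: `MockControlRegs` is a hypothetical in-memory stand-in for the AXI control registers, and the register index and bit pattern are made up for illustration. On hardware, `write()` and `read()` would be replaced by the runtime's register access calls.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the AXI control register file (assumption,
// not the interface used in this repository).
struct MockControlRegs {
    std::array<uint32_t, 16> regs{};

    void write(std::size_t index, uint32_t value) {
        assert(index < regs.size());
        regs[index] = value;
    }

    uint32_t read(std::size_t index) const {
        assert(index < regs.size());
        return regs[index];
    }
};

// Writes a changing pattern to one register and reads it back `iterations`
// times. Returns the number of mismatches; 0 means every read returned the
// value that was just written.
template <typename Regs>
int write_read_test(Regs& regs, std::size_t test_reg, int iterations = 1000) {
    int mismatches = 0;
    for (int i = 0; i < iterations; ++i) {
        const uint32_t pattern = 0xA5A50000u ^ static_cast<uint32_t>(i);
        regs.write(test_reg, pattern);
        if (regs.read(test_reg) != pattern) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Using a different value in every iteration means a write that is silently dropped shows up as a mismatch instead of passing by accident.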

michaellass commented 5 months ago

[Screenshot, 2024-04-05 17:10:46]

It seems to do what it's supposed to do. However, there is still no way to reset the issue kernel in our example code. So, if data is stuck in there, it immediately fills the tx buffers again:

5: FIFO status before reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog full, 
4: FIFO status before reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog full, 
2: FIFO status before reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog full, 
1: FIFO status before reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog full, 
0: FIFO status before reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog full, 
3: FIFO status before reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog full, 
0: FIFO status after reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog empty, FIFO rx almost empty, 
1: FIFO status after reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog empty, FIFO rx almost empty, 
2: FIFO status after reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog empty, FIFO rx almost empty, 
3: FIFO status after reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog empty, FIFO rx almost empty, 
4: FIFO status after reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog empty, FIFO rx almost empty, 
5: FIFO status after reset: FIFO tx prog full, FIFO tx almost full, FIFO rx prog empty, FIFO rx almost empty, 

I'll keep experimenting a bit. Maybe an optional draining of the tx data source during reset makes sense.
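The behaviour in the log above, and what an optional drain would change, can be illustrated with a small software model. This is only a sketch under assumptions: `TxPathModel`, its FIFO depth, and the `drain_source` flag are hypothetical and do not correspond to the RTL or kernel code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical software model of the tx path (assumption, not the real design).
struct TxPathModel {
    std::queue<uint64_t> source;   // data still held upstream, e.g. by the issue kernel
    std::queue<uint64_t> tx_fifo;  // tx FIFO inside the core
    std::size_t tx_capacity = 4;   // illustrative depth

    static void clear(std::queue<uint64_t>& q) {
        while (!q.empty()) q.pop();
    }

    // The source pushes into the tx FIFO whenever there is room.
    void refill() {
        while (!source.empty() && tx_fifo.size() < tx_capacity) {
            tx_fifo.push(source.front());
            source.pop();
        }
    }

    bool tx_full() const { return tx_fifo.size() == tx_capacity; }

    // Reset clears the tx FIFO. With drain enabled, data stuck in the
    // source is discarded as well, so nothing can refill the FIFO.
    void reset(bool drain_source) {
        clear(tx_fifo);
        if (drain_source) clear(source);
        refill();  // once reset is released, the source resumes pushing
        assert(tx_fifo.size() <= tx_capacity);
    }
};
```

In this model, `reset(false)` leaves the tx FIFO full again as long as the source still holds enough data, while `reset(true)` leaves both queues empty.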

papeg commented 5 months ago

I have added a test script that should mimic the typical need for the software reset.

I thought that just stopping or aborting the issue kernel would solve the mentioned problem. But doing this results in the following error messages, which I don't really understand yet:

using abort(): what(): failed to launch hw ctx execution buffer: Invalid argument
using stop(): what(): Support for auto restart counters have not been implemented: Function not implemented

So the test script fails with wrong data, caused by the still-running issue kernel. Even if stopping the issue kernel were possible, it would probably still be a problem when FIFOs are involved that were configured in the link script between issue and aurora.

michaellass commented 4 months ago

I found some time to further work on this. I pushed a proof of concept that is able to run your test case successfully. However, the current solution to reset the dump kernel and to drain the data stuck in any AXI FIFOs is quite ugly. So this is just to document the current state.

michaellass commented 4 months ago

I rebased and cleaned up the commits a bit. Feedback is welcome.

papeg commented 4 months ago

The reset of the Aurora core, and the ability to now also pass runtime settings in general, is a really nice improvement.

Regarding the need for draining the HLS kernels, I am not sure whether this is something a user would implement, or whether it will remain more convenient to just reset the FPGAs. But this is probably the best we can get with the tooling right now.

Would it be possible to move the reset on the dump side into a separate part, so that it doesn't need to be integrated into the user logic? I need to take a deeper look to fully understand the drain logic there.

Is the manual burst functionally necessary here or is it just for performance reasons?

michaellass commented 4 months ago

Yes, we came to a similar conclusion here. I will split this up into the general reset feature of the Aurora core, and add a separate example design that utilizes the interrupt and drain logic in the dump kernel, so that the default example design is not made more complex than needed.

The manual burst writes were required after changing the AXI stream reads into non-blocking reads. Without burst writes, throughput was reduced down to around 70 Gbps.
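The combination of non-blocking stream reads and manual burst writes can be sketched in plain C++ as follows. This is a software model rather than the HLS kernel from this PR: `MockStream`, `kBurstWords`, and `dump_with_bursts` are hypothetical names, and the `read_nb()` method only loosely mirrors the non-blocking read style of HLS stream interfaces.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for an AXI stream with a non-blocking read.
struct MockStream {
    std::queue<uint64_t> data;

    // Returns false immediately if no word is available.
    bool read_nb(uint64_t& out) {
        if (data.empty()) return false;
        out = data.front();
        data.pop();
        return true;
    }
};

constexpr std::size_t kBurstWords = 16;  // illustrative burst length

// Polls the stream up to `max_polls` times, collects words in a local buffer
// and appends them to `memory` one full burst at a time. A partial burst is
// flushed at the end. Returns the number of burst writes issued.
std::size_t dump_with_bursts(MockStream& in, std::vector<uint64_t>& memory,
                             std::size_t max_polls) {
    std::array<uint64_t, kBurstWords> burst{};
    std::size_t fill = 0;
    std::size_t bursts = 0;

    auto flush = [&] {
        memory.insert(memory.end(), burst.begin(), burst.begin() + fill);
        fill = 0;
        ++bursts;
    };

    for (std::size_t poll = 0; poll < max_polls; ++poll) {
        uint64_t word = 0;
        if (!in.read_nb(word)) continue;  // nothing available, keep polling
        burst[fill++] = word;
        if (fill == kBurstWords) flush();
    }
    if (fill > 0) flush();
    assert(fill == 0);
    return bursts;
}
```

Because a non-blocking read may return no data in a given iteration, the loop cannot write one word per iteration at a fixed address pattern. Collecting words locally and writing them out in fixed-size groups keeps the memory side working in bursts regardless of how the reads arrive.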

michaellass commented 4 months ago

I reduced this PR down to the three essential commits that allow resetting Aurora and optionally draining any FIFOs on the input side.

In this form it does not reset the parts behind Aurora, i.e., the dump kernel, so I think it also makes no sense to reset Aurora as part of the example design. A different design will be used to demonstrate this functionality.