goossens-springer / goossens-book-ip-projects

This repository contains all the IP projects presented in the HLS/RISC-V/Computer Architecture book written by Goossens and published by Springer.

Question: Why do you use #pragma HLS PIPELINE II=7? #6

Closed ultraTester333 closed 2 weeks ago

ultraTester333 commented 2 weeks ago

I have a question about the non-pipelined version: Why do you use 7 FPGA cycles? Because when I run the design on my FPGA (Ultra96-V2), I get an interval of 5. I can use #pragma HLS PIPELINE II=4 instead of #pragma HLS PIPELINE II=7, and it works, but I'm confused about why. For example, #pragma HLS PIPELINE II=2 is not possible, as the synthesis report says the interval is 4. Could you please explain this? I want to understand what happens when I make the interval shorter. I am using Vitis HLS 2022.1.

goossens-springer commented 2 weeks ago

Hello,

This is probably related to your FPGA, which is different from mine: the Pynq boards carry an xc7z020 FPGA, whereas the Ultra96-V2 has a ZU3EG A484 FPGA (for example, 85K logic cells versus 154K). As the resources inside the FPGA are not the same, the synthesizer can map the computations differently, possibly leading to less time to process each RISC-V instruction. To really figure out where the differences lie, you should compare the schedules: when the fetch is done, then the decoding and, later, the execution (at what cycle does each operation start, at what cycle does it end).
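
For readers unfamiliar with the pragma under discussion, here is a minimal sketch of where it sits. This is a hypothetical toy loop, not the book's source code: `run_toy`, `code_ram` and `max_instructions` are made-up names, and the "decode" and "execute" steps are placeholders. The only point is that one loop iteration stands for the full processing of one instruction, and `II` (the initiation interval) is the number of FPGA cycles requested between the starts of two successive iterations.

```cpp
#include <cstdint>

// Hypothetical toy loop, not the book's source: one iteration stands for
// the full processing (fetch, decode, execute) of one instruction.
// The caller must make sure code_ram holds a 0 word (the toy "stop"
// marker) or at least max_instructions words.
unsigned run_toy(const std::uint32_t *code_ram, unsigned max_instructions) {
    unsigned pc  = 0;     // toy program counter (word index)
    unsigned nbi = 0;     // number of instructions run
    bool running = true;
    while (running && nbi < max_instructions) {
// Request one new iteration every 7 FPGA cycles. A regular C++ compiler
// does not know this pragma and ignores it (possibly with a warning).
#pragma HLS PIPELINE II=7
        std::uint32_t instruction = code_ram[pc];  // fetch
        running = (instruction != 0);              // "decode": 0 stops the toy
        pc  += 1;                                  // "execute": fall through
        nbi += 1;
    }
    return nbi;
}
```

If the requested interval cannot be met, the synthesis report shows the interval actually achieved, which is the behaviour described in the question (II=2 requested, 4 reported).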

Anyway, the non-pipelined processor is not intended to be optimized for time. It is intended to be optimized for area (i.e. as small as possible). The pipelined version and the multicycle pipeline one should be mapped on two cycles on your board, like on mine, maybe with a shorter critical path.

Please let me know how you get on with the later designs in the book.

By the way, I am still working on implementing the experiments of the last chapter with the 2024.1 version of Vitis. I had a few difficulties with chapter 14 (the last in the book, about handling LEDs and buttons on a board), but they are now solved. I'll be posting the result on GitHub soon.

Bernard.

From: "DarthN0b0dy16" @.> To: "goossens-springer" @.> Cc: "Subscribed" @.***> Sent: Friday, 27 September 2024 17:30:03 Subject: [goossens-springer/goossens-book-ip-projects] Question: Why do you use #pragma HLS PIPELINE II=7? (Issue #6)


ultraTester333 commented 2 weeks ago

Thank you very much! :) That explains it. I was confused about why you used 7 FPGA cycles and not 6 FPGA cycles. With the non-pipelined version, I was able to reach 25 MHz on the Ultra96 with HLS PIPELINE II=4, compared to 14 MHz on your board with HLS PIPELINE II=7. This leads me to another question: Is the non-pipelined version a single-cycle design? I ask this because the #pragma PIPELINE makes me wonder.

goossens-springer commented 2 weeks ago

You must not confuse the FPGA cycle with the implemented processor cycle. In your question "Is the non-pipelined version a single-cycle design?", if "cycle" refers to FPGA cycles, the answer is "No", because the processing of an instruction takes 7 FPGA cycles. If "cycle" refers to the processor cycle, the answer is "Yes". But in this case, the answer would always be "Yes", whatever the design. In a pipelined design, the processing of an instruction takes multiple processor cycles (latency), but if the pipeline stays filled with one new instruction every cycle, its throughput is one instruction per processor cycle.

In the book designs, the non-pipelined processor can run with a processor cycle as short as 7 FPGA cycles (shorter than that on your board, and an ASIC implementation would be faster still). The pipelined designs can run with a processor cycle as short as 2 FPGA cycles (again, faster on an ASIC).
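
To make the two kinds of cycle concrete, here is a small sketch that derives the processor clock from the FPGA clock and the initiation interval. The 100 MHz FPGA clock is an assumption (it is not stated in this thread, though it is consistent with the 25 MHz and 14 MHz figures quoted above), and `processor_mhz` is a made-up helper name.

```cpp
#include <cassert>

// Processor clock frequency in MHz when one processor cycle spans `ii`
// FPGA cycles. `fpga_mhz` is the FPGA clock frequency (assumed value,
// supplied by the caller).
double processor_mhz(double fpga_mhz, int ii) {
    assert(ii > 0);  // an initiation interval is at least one FPGA cycle
    return fpga_mhz / ii;
}
```

Under that assumption, `processor_mhz(100.0, 7)` gives about 14.3 MHz and `processor_mhz(100.0, 4)` gives 25 MHz, while a pipelined design at II=2 would correspond to 50 MHz.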

The real run time of a program is computed from the main equation "number of instructions to be run × average number of processor cycles per instruction × duration of a processor cycle". If you compare two designs on the same run, the first term is constant and what determines the speed is the product of the last two terms. With the implementations in the book, you can compare the benefits of architectural improvements like pipelining or multithreading by comparing the respective run times of a fixed set of representative programs (a benchmark).
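
The equation can be sketched as a short function. All names here are illustrative, and the numbers in the usage note are hypothetical: a 10 ns FPGA cycle (i.e. an assumed 100 MHz FPGA clock) and a made-up average of 1.5 cycles per instruction for a pipelined design. They are not measurements from the book.

```cpp
#include <cassert>

// Run time in nanoseconds, following the equation quoted above:
//   instructions x average processor cycles per instruction
//                x duration of one processor cycle,
// where one processor cycle lasts `ii` FPGA cycles of `fpga_cycle_ns` each.
double run_time_ns(unsigned long nbi, double cycles_per_instruction,
                   int ii, double fpga_cycle_ns) {
    assert(ii > 0);
    double processor_cycle_ns = ii * fpga_cycle_ns;
    return nbi * cycles_per_instruction * processor_cycle_ns;
}
```

For 1000 instructions, a non-pipelined design (1 cycle per instruction, II=7) gives `run_time_ns(1000, 1.0, 7, 10.0)`, i.e. 70000 ns, while the hypothetical pipelined design gives `run_time_ns(1000, 1.5, 2, 10.0)`, i.e. 30000 ns. The pipelined design needs more processor cycles per instruction, but its much shorter processor cycle more than compensates.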

From: "ultraTester333" @.> To: "goossens-springer" @.> Cc: "Bernard Goossens" @.>, "Comment" @.> Sent: Friday, 27 September 2024 20:21:32 Subject: Re: [goossens-springer/goossens-book-ip-projects] Question: Why do you use #pragma HLS PIPELINE II=7? (Issue #6)


ultraTester333 commented 2 weeks ago

Thank you very much. The difference between FPGA cycles and processor cycles confused me, which is why I asked the question. Now I understand it :). I measure the execution time using the library available in the Vitis IDE: "xtime_l.h". Of course, I also output values like nbi and nbc, but it's nice when the calculated values match the actual results. :)