sampsyo / cs6120

advanced compilers
https://www.cs.cornell.edu/courses/cs6120/2023fa/
MIT License

Project 4 Proposal: Cycle-Accurate Simulator for High-Level Synthesis #143

Closed: zhouyuan1119 closed this issue 4 years ago

zhouyuan1119 commented 4 years ago

What will you do?

High-level synthesis (HLS) lets hardware designers describe the functionality of their accelerators in software languages such as C/C++. While functional simulation can be performed in C, getting an accurate performance estimate requires falling back on RTL simulation, which is significantly slower. In this project, I propose to implement a software cycle-accurate simulator for an HLS tool. The idea is not new: at least two papers (Lee TODAES'18, Chi FPGA'19) implemented similar tools. However, those tools are either incomplete or not publicly available.

The first goal of the proposed simulator is to dump the values of variables of interest in each cycle of the accelerator's execution. If time permits, the second goal is to extend the simulator to dump the inputs and outputs of every hardware functional unit (e.g., a multiplier or an adder).

How will you do it?

I will start with Vivado HLS, arguably the most popular high-level synthesis tool on the market. Vivado HLS is built on LLVM: all of its HLS compilation and code generation passes are LLVM passes. Fortunately, although Vivado HLS is a commercial tool, we have access to the following:

With this information, we can at least recover the timing behavior of the hardware and dump cycle-accurate traces (the first goal). The reports generated by the tool also include an operator-level diagram of the generated hardware, but I have not yet figured out how to recover the binding between LLVM instructions and the operators in this diagram. If I can connect all of this information, I will attempt the second goal.

The main effort of this project is to develop an LLVM transformation pass that annotates the optimized LLVM code with the schedule and binding information. I will first read the two papers mentioned above to see how others have approached this. The annotations can take the form of print statements or of updates to a global data structure in memory. Compiling and executing the transformed LLVM code will then produce the cycle-accurate trace of the variables of interest.

How will you empirically measure success?

I will evaluate the correctness of my simulator by checking its overall simulated latency against RTL simulation. If I make progress on the second goal, I will use a few simple, human-readable examples to verify that variables are mapped to the correct operators. I will also measure the speedup of the proposed simulator over RTL simulation.
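The two metrics can be stated precisely. Let $L_{\text{sim}}$ and $L_{\text{RTL}}$ be the total latency in cycles reported by the simulator and by RTL simulation, and $T_{\text{sim}}$ and $T_{\text{RTL}}$ the corresponding wall-clock simulation times (notation introduced here for illustration):

```latex
\text{latency error} = \frac{\lvert L_{\text{sim}} - L_{\text{RTL}} \rvert}{L_{\text{RTL}}},
\qquad
\text{speedup} = \frac{T_{\text{RTL}}}{T_{\text{sim}}}
```

A truly cycle-accurate simulator should give a latency error of exactly zero on every design.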

Since the project is highly challenging, I will first evaluate the simulator on simple designs, such as randomly generated datapaths without control flow, and then move on to simple HLS benchmarks such as PolyBench.

Team members: @zhouyuan1119

sampsyo commented 4 years ago

Sounds great! As I said before, this sounds hard, but it sounds very cool if you can make it work. And I hope it will be open source. :smiley: