shail-vaidya / k-furthest-neighbors

Repository for UCSD ECE284 project

Project: Weight- and output-stationary reconfigurable 2D systolic array-based AI accelerator and mapping on Cyclone IV GX

Part1. Train VGG16 with quantization-aware training (10%)
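
As a rough illustration of what the forward pass of quantization-aware training does to weights, the sketch below applies symmetric uniform "fake quantization" (quantize, then map back to float). The function name `fake_quantize`, the 4-bit width, and the max-abs scaling rule are assumptions made for this example; the project description does not specify them.

```python
def fake_quantize(weights, num_bits=4):
    """Quantize-then-dequantize weights on a symmetric uniform grid.

    num_bits=4 is an assumed bit width, not taken from the project spec.
    Returns the dequantized weights and the step size (scale).
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights), 1.0             # all zeros: nothing to scale
    scale = max_abs / qmax                    # step size of the grid
    quantized = []
    for w in weights:
        level = round(w / scale)              # nearest level (ties go to even)
        level = max(-qmax, min(qmax, level))  # clamp to the representable range
        quantized.append(level * scale)       # map the integer level back to float
    return quantized, scale


values, step = fake_quantize([7.0, -3.4, 0.6, 0.0])
print(values, step)
```

During training, the network would see the dequantized values while the underlying full-precision weights keep receiving gradient updates; that training loop is outside the scope of this sketch.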

Part2. Complete RTL core design connecting the following blocks: (5%)

Part3. Testbench generation to run the following stages: (20%)

Measure of success:

Part4. Mapping on FPGA (Cyclone IV GX EP4CGX150DF31I7AD) (10%) (More details will be given in upcoming classes)

Part5. Weight-stationary and output-stationary reconfigurable PE (20%)
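
To make the difference between the two dataflows concrete, here is a software loop-nest model (not RTL, and not the course's PE design). Both functions compute the same matrix product; they differ only in which operand stays fixed in a "PE" while the others stream past. The function names and the list-of-lists layout are assumptions made for this illustration.

```python
def matmul_weight_stationary(x, w):
    """x: N x K inputs, w: K x M weights.

    Each (k, m) 'PE' loads one weight once and reuses it for all N input
    rows; partial sums leave the PE and accumulate in the output buffer.
    """
    n_rows, k_dim, m_dim = len(x), len(w), len(w[0])
    out = [[0] * m_dim for _ in range(n_rows)]
    for k in range(k_dim):
        for m in range(m_dim):
            weight = w[k][m]                    # stationary operand
            for n in range(n_rows):
                out[n][m] += x[n][k] * weight   # partial sum moves on
    return out


def matmul_output_stationary(x, w):
    """Each (n, m) 'PE' keeps its accumulator in place while inputs and
    weights stream past; the result is written out once at the end."""
    n_rows, k_dim, m_dim = len(x), len(w), len(w[0])
    out = [[0] * m_dim for _ in range(n_rows)]
    for n in range(n_rows):
        for m in range(m_dim):
            acc = 0                             # stationary accumulator
            for k in range(k_dim):
                acc += x[n][k] * w[k][m]
            out[n][m] = acc
    return out


x = [[1, 2], [3, 4]]
w = [[5, 6], [7, 8]]
print(matmul_weight_stationary(x, w))
print(matmul_output_stationary(x, w))
```

A reconfigurable PE would need to support both roles: holding a weight and forwarding partial sums in one mode, and holding an accumulator in the other.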

Part6. +alpha (20% + 5% bonus)

Part7. Poster and Report (15%)

NOTE: The report must be submitted; the poster does not need to be submitted. The description on this page may be updated later for clarity. An FAQ is maintained below.

The final deliverables are as follows. (1) One PDF file containing your final report. The report should clearly address each step you took during the design process and explain your innovative techniques for the +alpha part (attach any necessary figures or screenshots of your code). To receive full credit for each step, include the data needed to show that you achieved the required goals. There is no format requirement for the final report, but the page limit is 5. (2) One zip file containing one folder for your Verilog code and one folder for your notebook files. You do not need to upload your VGG model.

The deliverables are due on Dec 14th at 11:59 PM. The last day to submit is Dec 18th, with a late-submission penalty of 20% applied for each day past the due date.

FAQ:

  1. The benefit of some techniques I am implementing is hard to show with Quartus Prime. How can I quantify it?

    • Indeed, some of your techniques cannot be measured with the tools given in this course. In such a case, please quantify the benefit in a reasonable way, e.g., calculate it theoretically or demonstrate it through another experiment. Alternatively, search for a related paper to estimate the benefit in a similar situation.
  2. Do I need to prepare both posters and separate slides for the presentation?

    • No, only a poster is needed. You are expected to present using the figures on your poster.
  3. What is a corelet?

    • The corelet includes all the blocks except the core itself, e.g., the OFIFO, L0, and the PE array. For Part4, only the corelet (not the core) needs to be implemented.
  4. Given the target function and +alpha part, may I edit the ports of the core?

    • Yes, the core and tb are just templates to help students. Feel free to edit them.
  5. Can the memory size be modified?

    • Yes.
  6. How do I use Quartus?

    • The installation guide is on the Pages/Course resources tab.
  7. May I create my own dual-port memory?

    • Sure.
  8. Since we are processing 64 nij indices, should our L0 and OFIFO have a depth of 64?

    • No. A FIFO can receive new contents while it pops out old ones, so the depth does not need to be that high.
  9. What is the final output of the hardware simulation, and where should it be stored?

    • The sum of the 9 vectors should pass through the ReLU. The ReLU output is the final result. In hardware, it should ultimately be stored in psum mem.
  10. Any suggestions on poster size?

    • 48 inch x 36 inch is preferred. I also suggest setting the file itself to that size. In PowerPoint, you can change the size via Design -> Slide Size -> Custom Slide Size. Otherwise, the printing center has had some difficulty printing.
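
The FIFO-depth answer in FAQ item 8 can be checked with a small cycle-level model. The sketch below streams 64 entries through a FIFO that pushes and pops in the same cycle and reports the peak occupancy. The function name and the `pop_start_cycle` parameter (how many cycles the consumer waits before it starts popping) are assumptions made for illustration; the actual latency depends on your design.

```python
from collections import deque


def max_fifo_occupancy(num_items=64, pop_start_cycle=4):
    """Return the peak number of entries held while streaming num_items.

    pop_start_cycle is a hypothetical consumer start-up latency in cycles.
    """
    fifo = deque()
    pushed = popped = 0
    peak = 0
    cycle = 0
    while popped < num_items:
        # Pop and push are both allowed within the same cycle.
        if cycle >= pop_start_cycle and fifo:
            fifo.popleft()
            popped += 1
        if pushed < num_items:
            fifo.append(pushed)
            pushed += 1
        peak = max(peak, len(fifo))
        cycle += 1
    return peak


print(max_fifo_occupancy(64, 4))    # consumer starts after 4 cycles
print(max_fifo_occupancy(64, 100))  # consumer starts only after all pushes
```

In this model the required depth tracks the consumer's start-up delay rather than the number of nij indices; only when nothing is popped until all 64 entries have arrived does the occupancy reach 64.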
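
The final-output step described in FAQ item 9 (sum 9 vectors, then apply ReLU) can be sketched as a software reference for checking simulation results. The function name and the example data are made up for illustration; the source does not say where the 9 vectors come from or how wide they are.

```python
def accumulate_and_relu(psum_vectors):
    """Element-wise sum of the partial-sum vectors, followed by ReLU."""
    length = len(psum_vectors[0])
    if any(len(v) != length for v in psum_vectors):
        raise ValueError("all partial-sum vectors must have the same length")
    # zip(*...) groups the i-th element of every vector together.
    totals = [sum(column) for column in zip(*psum_vectors)]
    return [max(0, t) for t in totals]          # ReLU clamps negatives to 0


# Nine hypothetical partial-sum vectors of length 3.
psums = [[i, -i, 1] for i in range(9)]
print(accumulate_and_relu(psums))
```

A testbench could compare the contents of psum mem against such a reference after the accumulation stage completes.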