kaist-cp / cs420

KAIST CS420: Compiler Design (2023 Spring)

[Final Project] Performance Competition (substituting final exam, due: 8th of July) #168

Closed jeehoonkang closed 4 years ago

jeehoonkang commented 4 years ago

It turned out that we cannot physically gather for the final exam, so I decided not to hold it. Instead, as a substitute, we'll have a performance competition as the final project.

For the competition, you’ll submit your entire compiler. Predefined benchmark programs will be compiled and then executed on the HiFive Unleashed (the first Linux-bootable RISC-V development board), which is sponsored by SemiFive. If any of the results are wrong, you’ll be disqualified. The geometric mean of the number of CPU cycles will be compared among students’ compilers and clang -O1.
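As a rough illustration of the scoring described above, here is a minimal sketch of a geometric mean over per-benchmark cycle counts. The grader's exact formula is not given in this thread, so the function name and the sample numbers below are hypothetical.

```python
import math


def geometric_mean(cycles):
    """Geometric mean of positive cycle counts, computed in log space
    so that large counts do not overflow an intermediate product."""
    if not cycles:
        raise ValueError("need at least one benchmark result")
    if any(c <= 0 for c in cycles):
        raise ValueError("cycle counts must be positive")
    return math.exp(sum(math.log(c) for c in cycles) / len(cycles))


# Hypothetical cycle counts for three benchmarks (not real measurements).
my_compiler = [120, 9_000, 2_000_000]
baseline = [100, 10_000, 1_500_000]

# A ratio below 1.0 would mean fewer cycles than the baseline on (geometric) average.
ratio = geometric_mean(my_compiler) / geometric_mean(baseline)
print(f"ratio vs. baseline: {ratio:.3f}")
```

Unlike an arithmetic mean, a geometric mean is not dominated by the longest-running benchmark: halving the cycles of a tiny benchmark moves the score as much as halving the cycles of a huge one.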

Please do whatever you can to reduce the number of cycles, e.g., by implementing more optimizations or by improving your asmgen with a better register allocation algorithm.

If your compiler is better than clang -O1, you’ll get A#. If your compiler is better than those of most students, you’ll get A+. Depending on the performance of your compiler, you'll get some bonus.

cyron1259 commented 4 years ago

Would there be some kind of a leaderboard so that we can compare against others' performance?

jeehoonkang commented 4 years ago

@cyron1259 Good idea! We will prepare a leaderboard soon.

hestati63 commented 4 years ago

Since the whole compiler will be run in the final competition, I want to fuzz each optimization pass, but the fuzzer does not support that. Can you add options for fuzzing the optimization passes?

Also, can you provide the command line argument for the compiler that will be used in the competition?

jeehoonkang commented 4 years ago

@hestati63

jeehoonkang commented 4 years ago

Clarification: you need to observe the LP64D calling convention: https://github.com/kaist-cp/cs420/issues/209#issuecomment-643937143

jeehoonkang commented 4 years ago

IMPORTANT UPDATE on FINAL PROJECT

hestati63 commented 4 years ago

Can you announce a specific deadline by which you will finalize the benchmark code?

cmpark0126 commented 4 years ago

IMPORTANT: you should use the la pseudo-instruction instead of a HI20/LO12 pair when obtaining the address of a global variable. To check the performance of your compiler in the final project, we build a shared object from the assembly code it generates. However, the HI20 and LO12 relocations cannot be used when building a shared object. With the la instruction, the shared object is generated normally.

So, please use the la pseudo-instruction instead of the HI20/LO12 pair, as below:

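# before: load the value of nonce via a %hi/%lo pair
lui     a5,%hi(nonce)
lw      a5,%lo(nonce)(a5)
# after: la puts the address of nonce in a5; lw then loads the value
la      a5,nonce
lw      a5,0(a5)

jeehoonkang commented 4 years ago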

IMPORTANT:

Medowhill commented 4 years ago

Hi. Could you let us know the scores of some reference compilers (for example, gcc -O0 and gcc -O1)? Currently, it is hard to know whether my implementation performs well or not by only seeing the score. Also, if one tries to challenge gcc / clang -O1, those scores can be good targets.

jeehoonkang commented 4 years ago

@Medowhill make run-gcc will evaluate GCC with the optimization flag -O for the same benchmark. You can easily change Makefile to evaluate gcc -O0 and gcc -O1 as well.

Medowhill commented 4 years ago

Thank you! I didn't notice that.

hestati63 commented 4 years ago

When will the grader be ready? Since QEMU uses binary translation, the cycle count seems to depend only on the number of instructions.

jeehoonkang commented 4 years ago

@hestati63 I'm trying to provide the grader by tomorrow. Sorry for the delay.

jeehoonkang commented 4 years ago

FYI, gcc -O's result is as follows:

[exotic_arguments_struct_small] 52
[exotic_arguments_struct_large] 77
[exotic_arguments_struct_small_ugly] 34
[exotic_arguments_struct_large_ugly] 138
[exotic_arguments_float] 18
[exotic_arguments_double] 19
[fibonacci_recursive] 52089252
[fibonacci_loop] 1640
[two_dimension_array] 72229
[matrix_mul] 373849
[matrix_add] 53248
[graph_dijkstra] 78160627
[graph_floyd_warshall] 151599746
[fibonacci_recursive] 52089692
[fibonacci_loop] 1787
[two_dimension_array] 74329
[matrix_mul] 372201
[matrix_add] 57362
[graph_dijkstra] 79328213
[graph_floyd_warshall] 151565586
[fibonacci_recursive] 52048084
[fibonacci_loop] 1754
[two_dimension_array] 72840
[matrix_mul] 377440
[matrix_add] 55333
[graph_dijkstra] 78468837
[graph_floyd_warshall] 151555926
[fibonacci_recursive] 52089934
[fibonacci_loop] 1759
[two_dimension_array] 72798
[matrix_mul] 372444
[matrix_add] 52443
[graph_dijkstra] 75623586
[graph_floyd_warshall] 151648428
[fibonacci_recursive] 52082904
[fibonacci_loop] 1755
[two_dimension_array] 72791
[matrix_mul] 373361
[matrix_add] 54438
[graph_dijkstra] 76326790
[graph_floyd_warshall] 151566425
[fibonacci_recursive] 52048784
[fibonacci_loop] 1782
[two_dimension_array] 72896
[matrix_mul] 379304
[matrix_add] 52175
[graph_dijkstra] 76327618
[graph_floyd_warshall] 151525424
[fibonacci_recursive] 52046427
[fibonacci_loop] 1758
[two_dimension_array] 72529
[matrix_mul] 370371
[matrix_add] 53737
[graph_dijkstra] 77295262
[graph_floyd_warshall] 151636428
[fibonacci_recursive] 52042896
[fibonacci_loop] 1775
[two_dimension_array] 72726
[matrix_mul] 376777
[matrix_add] 55529
[graph_dijkstra] 75582433
[graph_floyd_warshall] 151547499
[fibonacci_recursive] 52043659
[fibonacci_loop] 1896
[two_dimension_array] 72959
[matrix_mul] 370099
[matrix_add] 53497
[graph_dijkstra] 75628374
[graph_floyd_warshall] 151557803
[fibonacci_recursive] 52043419
[fibonacci_loop] 1780
[two_dimension_array] 72684
[matrix_mul] 373757
[matrix_add] 57870
[graph_dijkstra] 79321067
[graph_floyd_warshall] 151603631
[AVERAGE] 1.06947e+06

lomotos10 commented 4 years ago

@cmpark0126 I am currently having trouble understanding the la instruction. Does la return the address of the label, or the data inside that address?

cmpark0126 commented 4 years ago

The address of the label

jesper-amilon commented 4 years ago

Should we use the la instruction only for the nonce object, or for all global variables?

cmpark0126 commented 4 years ago

@christofides You need to use the la instruction for all global variables.

jesper-amilon commented 4 years ago

So if it loads the address, do we also need to add lw to actually load the value of the variable? I.e.:

la     a5, nonce
lw     a5,  a5

Edit: Another question: can la also be used to get the address of floating-point variables? (I assume this is the case but want to make sure.)

cmpark0126 commented 4 years ago

@christofides

  1. Yes, you need to add a load instruction to get the value of the global variable, as below:
    la     a5, nonce
    lw     a5,  0(a5)
  2. Yes, you can use the la instruction for a global variable whose type is floating-point.