Closed: jeehoonkang closed this issue 4 years ago.
Will there be a leaderboard so that we can compare our performance against others'?
@cyron1259 Good idea! We will prepare a leaderboard soon.
Since the whole compiler will be run in the final competition, I want to fuzz each optimization pass, but the fuzzer does not support that. Can you add options to fuzz individual optimization passes?
Also, can you provide the command-line arguments for the compiler that will be used in the competition?
@hestati63
Clarification: you need to observe the LP64D calling convention: https://github.com/kaist-cp/cs420/issues/209#issuecomment-643937143
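The LP64D rule for scalar arguments can be sketched in Rust as follows. This is a simplified sketch that covers only named scalar arguments. It does not handle structs or variadic calls, and it ignores stack-slot layout. `ArgKind` and `assign_lp64d` are hypothetical names and are not part of KECC. Integer and pointer arguments take a0 to a7. Float and double arguments take fa0 to fa7 while one is free; after that they fall back to the integer registers, and then to the stack.

```rust
/// Scalar argument kinds covered by this sketch (structs and variadic calls are out of scope).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArgKind {
    Int,   // integers and pointers, up to XLEN = 64 bits
    Float, // `float` or `double`; both fit in FLEN = 64 bits under LP64D
}

/// Returns where each argument is passed: "a0".."a7", "fa0".."fa7", or "stack".
fn assign_lp64d(args: &[ArgKind]) -> Vec<String> {
    let (mut next_int, mut next_fp) = (0, 0);
    args.iter()
        .map(|kind| {
            if *kind == ArgKind::Float && next_fp < 8 {
                // A floating-point argument register is still free.
                let reg = format!("fa{next_fp}");
                next_fp += 1;
                reg
            } else if next_int < 8 {
                // Integers, and floating-point values once fa0-fa7 are used up,
                // follow the integer convention.
                let reg = format!("a{next_int}");
                next_int += 1;
                reg
            } else {
                "stack".to_string()
            }
        })
        .collect()
}
```

For example, a function taking `(int, double, int)` receives its arguments in a0, fa0 and a1.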
IMPORTANT UPDATE on FINAL PROJECT
In the bench directory, execute make run. It will build your compiler, build the benchmark codes, run them, and measure the elapsed CPU cycles. The average is your score (lower is better).
gg.kaist.ac.kr submission link so that you can run the benchmark codes on the SiFive HiFive Unleashed RISC-V machine.
Can you announce a specific deadline by which you will finalize the benchmark codes?
IMPORTANT: you should use the la pseudo-instruction instead of the HI20/LO12 pair when obtaining the address of a global variable.
We create a shared object from your assembly code to measure the performance of your compiler in the final project. However, the HI20 and LO12 relocations cannot be used when building a shared object, while code that uses the la instruction builds into a shared object normally.
So, please use the la pseudo-instruction instead of the HI20/LO12 pair, as below:
# before
lui a5,%hi(nonce)
lw a5,%lo(nonce)(a5)
# after
la a5,nonce
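Emitting `la` directly from asmgen is the simplest fix. If your asmgen already produces the `%hi`/`%lo` pair, the rewrite can also be done as a small textual pass, sketched here in Rust. Note that `la` only materializes the address, so a load must still follow it, now with offset 0. The "after" form above shows only the `la`.

The sketch makes these assumptions:
- one instruction per line;
- a single space after the mnemonic;
- no spaces around commas.

`rewrite_hi_lo` and `parse_lui_hi` are hypothetical helpers. To stay safe, the pass only rewrites pairs where the `lui` register ends up holding the full address (`addi R,R,%lo(S)`) or is overwritten by an integer load into `R` itself. Any other pair, such as a store, is left unchanged.

```rust
/// Integer load mnemonics whose destination may be the `lui` register itself.
const LOADS: [&str; 7] = ["lb", "lbu", "lh", "lhu", "lw", "lwu", "ld"];

/// Parses `lui R,%hi(S)` into `(R, S)`.
fn parse_lui_hi(line: &str) -> Option<(&str, &str)> {
    let (reg, arg) = line.strip_prefix("lui ")?.split_once(',')?;
    let sym = arg.strip_prefix("%hi(")?.strip_suffix(')')?;
    Some((reg, sym))
}

/// Rewrites `lui R,%hi(S)` + `addi R,R,%lo(S)` into `la R,S`, and
/// `lui R,%hi(S)` + `OP R,%lo(S)(R)` (OP an integer load) into `la R,S` + `OP R,0(R)`.
/// Every other line is copied through unchanged (after trimming).
fn rewrite_hi_lo(asm: &str) -> String {
    let lines: Vec<&str> = asm.lines().map(str::trim).collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        if let (Some((reg, sym)), Some(next)) = (parse_lui_hi(lines[i]), lines.get(i + 1)) {
            // Address only: both forms leave the full address in R.
            if *next == format!("addi {reg},{reg},%lo({sym})") {
                out.push(format!("la {reg},{sym}"));
                i += 2;
                continue;
            }
            // Load into R itself: the partial address in R is overwritten either way.
            if let Some((op, operands)) = next.split_once(' ') {
                if LOADS.contains(&op) && operands == format!("{reg},%lo({sym})({reg})") {
                    out.push(format!("la {reg},{sym}"));
                    out.push(format!("{op} {reg},0({reg})"));
                    i += 2;
                    continue;
                }
            }
        }
        out.push(lines[i].to_string());
        i += 1;
    }
    out.join("\n")
}
```

Applied to the "before" snippet, this yields `la a5,nonce` followed by `lw a5,0(a5)`.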
IMPORTANT: I just uploaded the final project grader: https://github.com/kaist-cp/kecc-public/commit/542535fbd66c2ce4c0ae12b7b9b7fe135fb79d68
Please do whatever you want to improve the "[AVERAGE]" score of cd bench; make run (lower is better). We recommend that you read driver.cpp.
@hestati63 Sorry for uploading the grader late. It's now finalized.
You'll upload your entire src directory. Please run ./scripts/make-submissions.sh; final.zip is the file you'll upload to gg (TBA).
Hi. Could you let us know the scores of some reference compilers (for example, gcc -O0 and gcc -O1)? Currently, it is hard to tell whether my implementation performs well just by looking at the score. Also, for anyone trying to beat gcc / clang -O1, those scores would be good targets.
@Medowhill make run-gcc will evaluate GCC with the optimization flag -O on the same benchmark. You can easily change the Makefile to evaluate gcc -O0 and gcc -O1 as well.
Thank you! I didn't notice that.
When will the gg grader be ready? Since QEMU uses binary translation, the cycle count seems to depend only on the number of instructions.
@hestati63 I'm trying to provide the grader by tomorrow. Sorry for the delay.
FYI, gcc -O's results are as follows:
[exotic_arguments_struct_small] 52
[exotic_arguments_struct_large] 77
[exotic_arguments_struct_small_ugly] 34
[exotic_arguments_struct_large_ugly] 138
[exotic_arguments_float] 18
[exotic_arguments_double] 19
[fibonacci_recursive] 52089252
[fibonacci_loop] 1640
[two_dimension_array] 72229
[matrix_mul] 373849
[matrix_add] 53248
[graph_dijkstra] 78160627
[graph_floyd_warshall] 151599746
[fibonacci_recursive] 52089692
[fibonacci_loop] 1787
[two_dimension_array] 74329
[matrix_mul] 372201
[matrix_add] 57362
[graph_dijkstra] 79328213
[graph_floyd_warshall] 151565586
[fibonacci_recursive] 52048084
[fibonacci_loop] 1754
[two_dimension_array] 72840
[matrix_mul] 377440
[matrix_add] 55333
[graph_dijkstra] 78468837
[graph_floyd_warshall] 151555926
[fibonacci_recursive] 52089934
[fibonacci_loop] 1759
[two_dimension_array] 72798
[matrix_mul] 372444
[matrix_add] 52443
[graph_dijkstra] 75623586
[graph_floyd_warshall] 151648428
[fibonacci_recursive] 52082904
[fibonacci_loop] 1755
[two_dimension_array] 72791
[matrix_mul] 373361
[matrix_add] 54438
[graph_dijkstra] 76326790
[graph_floyd_warshall] 151566425
[fibonacci_recursive] 52048784
[fibonacci_loop] 1782
[two_dimension_array] 72896
[matrix_mul] 379304
[matrix_add] 52175
[graph_dijkstra] 76327618
[graph_floyd_warshall] 151525424
[fibonacci_recursive] 52046427
[fibonacci_loop] 1758
[two_dimension_array] 72529
[matrix_mul] 370371
[matrix_add] 53737
[graph_dijkstra] 77295262
[graph_floyd_warshall] 151636428
[fibonacci_recursive] 52042896
[fibonacci_loop] 1775
[two_dimension_array] 72726
[matrix_mul] 376777
[matrix_add] 55529
[graph_dijkstra] 75582433
[graph_floyd_warshall] 151547499
[fibonacci_recursive] 52043659
[fibonacci_loop] 1896
[two_dimension_array] 72959
[matrix_mul] 370099
[matrix_add] 53497
[graph_dijkstra] 75628374
[graph_floyd_warshall] 151557803
[fibonacci_recursive] 52043419
[fibonacci_loop] 1780
[two_dimension_array] 72684
[matrix_mul] 373757
[matrix_add] 57870
[graph_dijkstra] 79321067
[graph_floyd_warshall] 151603631
[AVERAGE] 1.06947e+06
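To compare your own runs against these numbers, it helps to average the repeated runs of each benchmark. Below is a minimal Rust sketch, assuming every result line has the `[name] value` shape shown above. `parse_result` and `mean_per_benchmark` are hypothetical helpers. The per-benchmark arithmetic mean is only a convenience for reading the log; it is not necessarily how the grader computes [AVERAGE].

```rust
/// Parses a line such as `[matrix_mul] 373849` into `("matrix_mul", 373849.0)`.
/// Returns `None` for lines of any other shape.
fn parse_result(line: &str) -> Option<(&str, f64)> {
    let rest = line.trim().strip_prefix('[')?;
    let (name, value) = rest.split_once(']')?;
    let cycles = value.trim().parse::<f64>().ok()?;
    Some((name, cycles))
}

/// Averages repeated runs per benchmark, in first-seen order, skipping `[AVERAGE]`.
fn mean_per_benchmark(log: &str) -> Vec<(String, f64)> {
    // (name, sum of cycles, number of runs)
    let mut acc: Vec<(String, f64, u32)> = Vec::new();
    for (name, cycles) in log.lines().filter_map(parse_result) {
        if name == "AVERAGE" {
            continue; // the grader's own summary line, not a benchmark
        }
        if let Some(i) = acc.iter().position(|e| e.0 == name) {
            acc[i].1 += cycles;
            acc[i].2 += 1;
        } else {
            acc.push((name.to_string(), cycles, 1));
        }
    }
    acc.into_iter()
        .map(|(name, sum, runs)| (name, sum / f64::from(runs)))
        .collect()
}
```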
@cmpark0126 I am currently having trouble understanding the la instruction. Does la return the address of the label, or the data inside that address?
The address of the label
Should we use the la instruction only for the nonce object or for all global variables?
@christofides You need to use the la instruction for all global variables.
So if it loads the address, do we also need to add lw to actually load the value of the variable? I.e.:
la a5, nonce
lw a5, a5
Edit: Another question: can la also be used to get the address of floating-point variables? (I assume so, but want to make sure.)
@christofides
la a5, nonce
lw a5, 0(a5)
You can also use the la instruction for a global variable whose type is floating-point.
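The same two-step pattern works for every global. `la` always writes the address into an integer register. The instruction that follows picks the width and register file of the value; `flw` and `fld` load into a floating-point register. Below is a minimal Rust sketch of such an asmgen helper. `GlobalTy` and `load_global` are hypothetical names, and the registers in the tests are chosen only for illustration.

```rust
/// Scalar types a global variable may have in this sketch.
#[derive(Clone, Copy)]
enum GlobalTy {
    I32,
    I64,
    F32,
    F64,
}

/// Hypothetical asmgen helper: loads global `sym` of type `ty` into `dst`,
/// using the integer register `addr` to hold the address produced by `la`.
fn load_global(sym: &str, ty: GlobalTy, addr: &str, dst: &str) -> Vec<String> {
    let load = match ty {
        GlobalTy::I32 => "lw",
        GlobalTy::I64 => "ld",
        GlobalTy::F32 => "flw", // dst must be a floating-point register
        GlobalTy::F64 => "fld", // dst must be a floating-point register
    };
    vec![format!("la {addr},{sym}"), format!("{load} {dst},0({addr})")]
}
```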
It turned out that we cannot physically gather for the final exam, so I decided not to hold the final exam. Instead, as a substitute task, we'll have a performance competition as the final project.
For the competition, you'll submit your entire compiler. Predefined benchmark programs will be compiled and then executed on the HiFive Unleashed (the first Linux-bootable RISC-V development board), sponsored by SemiFive. If any of the results are wrong, you'll be disqualified. The geometric average of the number of CPU cycles will be compared among students' compilers and clang -O1.
Please do whatever you can to reduce the number of cycles, e.g., by implementing more optimizations or by improving your asmgen with a better register allocation algorithm.
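Because the score is a geometric average, each benchmark counts by its relative speedup rather than its absolute cycle count. Halving the cycles of fibonacci_loop (about 1.7k cycles) improves the score as much as halving those of graph_floyd_warshall (about 151M cycles). Below is a minimal Rust sketch of the computation. It works in log space so that the product of large counts cannot overflow. `geometric_mean` is a hypothetical helper; check driver.cpp for how the grader actually aggregates the runs.

```rust
/// Geometric mean of positive cycle counts: exp of the mean of the logarithms.
fn geometric_mean(cycles: &[f64]) -> f64 {
    assert!(
        !cycles.is_empty() && cycles.iter().all(|&c| c > 0.0),
        "need at least one positive cycle count"
    );
    let mean_log = cycles.iter().map(|c| c.ln()).sum::<f64>() / cycles.len() as f64;
    mean_log.exp()
}
```

With n benchmarks, halving any single one lowers the mean by the same factor, 2^(1/n), regardless of its size.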
If your compiler is better than clang -O1, you'll get A#. If your compiler is better than those of most students, you'll get A+. Depending on the performance of your compiler, you'll get a bonus.