kaist-cp / cs420

KAIST CS420: Compiler Design (2023 Spring)
423 stars 28 forks source link

KAIST CS420: Compiler Design

Logistics

Course description

Context

Compilers bridge the gap between human and machine. Human wants to easily express complex idea. On the other hand, machine understands only a few words (instructions) to be efficiently implemented in silicon. Compilers transform programs from a form suitable for human to easily express complex idea, to a form suitable for machine to efficiently execute. Since the gap between human and machine is fundamentally wide, compilers have been constructed and widely used since the beginning of the history of computing. Even, the first practical compiler predates the first practical operating systems (according to Wikipedia)!

In response to industry shifts, new compilers should be written and written again. First, human wants to express more and more complex idea, especially in the era of artificial intelligence and big data. Second, machine changes in response to physics (e.g. the ending of Dennard scaling and Moore's law) and industrial needs (e.g. Internet of Things and distributed systems). New compilers should be constructed to close the new gap between changing human and changing machine. For this reason, industrial needs for (and salary of) compiler engineers have been constantly high.

Goal

In this class, we will learn how to construct a compiler by actually building one. You are going to benefit from the provided skeleton code of a clean slate educational compiler--dubbed KECC: KAIST Educational C Compiler (think: KENS for networking or Pintos, xv6 for operating systems). We are going to discuss parsing only briefly, because the topic is assumed to be dealt with in CS322: Formal Languages and Automata. (You don't need to know parsing to take this course, though.) We will focus on translation from human-friendly form to machine-friendly form, and compiler optimizations. Specifically, we will discuss (1) how to transform a C program to an SSA-based intermediate representation (IR); (2) how to perform register promotion, static single assignment, global value numbering, and register allocation optimizations on the IR; and (3) how to transform an IR program to a RISC-V assembly program. KECC will provide a significant amount of skeleton code so that you can focus on the topic of this course.

We will also briefly discuss the recent trends of compiler construction. I see two crucial recent trends: scripting languages and parallelism. (1) Scripting languages like JavaScript and Python, unlike C, should be compiled (or interpreted) at run-time, and therefore, there is no clear distinction of compile- and run-time. It is a challenge in that compile time should also be optimized, but it is also an opportunity in that compile may gather and benefit from run-time information. (2) It is crucial to exploit the massive parallelism of modern applications like deep learning and high-performance computing (HPC), because they require so huge computation. Due to the complexity of workloads, their parallelism should be automatically discovered and exploited by compilers, which is a big challenge.

We will also briefly study the theory of compiler. We will focus on the correctness of compiler. In general, in what sense a compiler is correct, and how to prove it? Specifically, how to prove the correctness of KECC's transformations and optimizations? As it will turn out, this compiler correctness theory will greatly help you efficiently build your own compiler.

Textbook

Prerequisites

Tools

Make sure you're capable of using the following development tools:

Grading & honor code

Cheating

IMPORTANT: PAY CLOSE ATTENTION. VERY SERIOUS.

Homework (60%)

You will implement translations and optimizations on KECC. All homework submissions will be automatically graded online so that you can immediately see your score. If your compiler is correct and the generated assemblies perform comparably to those generated by gcc -O1, you're going to get A+ (if not A#) even if you miss the final exam.

Since compiler construction requires nontrivial undertaking, you're encouraged to ask questions on the homework in the issue tracker at the early stage of the semester.

Exams (40%)

The exams will evaluate your understanding of compiler theory.

Your physical apperance is required. If online participation is absolutely necessary, we'll use Zoom.

Attendance (?%)

Communication

Registration

Rules