sampsyo / cs6120

advanced compilers
https://www.cs.cornell.edu/courses/cs6120/2023fa/
MIT License
709 stars 157 forks source link

Project Proposal: C++ Infrastructure for Bril #404

Open xalbt opened 9 months ago

xalbt commented 9 months ago

What will you do?

We will add a C++ interface to Bril, which will include a parser and printer for JSON, types for each instruction, and easy program flow mutations. Our primary focus will be on performance, user-friendliness, and the capacity to expand the Bril ecosystem with blazingly fast C++ optimizations.

How will you do it?

We already have a basic C++ interface in place for completing lecture tasks and exercises. However, it lacks memory safety, ease of use, and has performance issues. We will retain certain aspects of the current implementation while mostly rewriting the framework to implement our goals.

First, we will redesign the Bril program types within our framework, with the primary goal of making our framework easy to work with optimizations. We already plan to structure our infrastructure around control flow graphs (CFG), as most optimizations operate at this level. Each function will be divided into basic blocks, and we will equip the program types with hooks to store analysis information for optimization purposes. Proper data encapsulation and hiding will be integrated into our types to ensure memory safety and resistance to implementation changes. This initial phase will be time-consuming, as we intend to approach it as a real-world software engineering effort, emphasizing planning.

Once the foundational framework is in place, we will iteratively test and enhance its usability for implementing optimizations. Our plan includes implementing various analyses and optimizations, as outlined in the next section. This process will help identify design issues and refine the final interface.

Finally, we will focus on enhancing the performance and memory safety of our framework. This involves incorporating string pooling, representing strings with unique integers, and numbering basic blocks and variables with serial IDs. These optimizations will enable us to use more efficient data structures like integer bitsets and arrays instead of the comparatively costly hashsets and hashmaps. Additionally, we will employ profiling to identify bottlenecks and memory leaks to guide further performance optimizations and memory safety. We plan to consult existing compiler frameworks like GCC and LLVM to inform our decisions and design.

How will you empirically measure success?

Like the Rust library, we intend to implement the brili and bril2txt commands as seamless replacements for the current implementations. Success will be measured by ensuring these commands pass checks with valgrind and demonstrate comparable runtime performance to existing implementations. Ideally, we aim to optimize these two programs within our framework to outperform all existing implementations.

To gauge the usability of our framework, we will reimplement several analyses and optimizations from our class. At a minimum, we will implement dominator and liveness analysis, conversion to and from SSA, and various loop optimizations. If time permits, we may tackle more complex optimizations like partial-redundancy elimination. We will evaluate the ease of use of our framework and update it if needed.

Team members: @ryanwmao

sampsyo commented 9 months ago

Awesome; this sounds really great!

This is definitely a bit of a "stretch goal," but one thing I would love to see as an outcome here is specific performance measurements for particular optimizations. That is, if you end up packing stuff into flat arrays, is it possible to implement a version using a hashmap instead, purely as a performance baseline? I think it could be super informative—for posterity—to get a few measurements characterizing the benefits of these optimizations.

Of course, all that is kinda beside the point of the main thrust here, which is why I say it's an optional thing. But could be fun nonetheless!