sampsyo / cs6120

advanced compilers
https://www.cs.cornell.edu/courses/cs6120/2023fa/
MIT License
752 stars 158 forks source link

Project Proposal: Xic Objects and Automatic Dynamic Garbage Collection #306

Closed michaelmaitland closed 2 years ago

michaelmaitland commented 2 years ago

Project Proposal: Xic Objects and Automatic Dynamic Garbage Collection

What will I do?

I will implement objects in my Xic compiler from CS 4210 and implement automatic automatic dynamic garbage collection for the language. If there is enough time, the scope can be extended to implement object based optimizations.

How will I do it?

Implementing Objects

There are a few relevant parts to this side of the project. First, I must modify the syntax of the language to support object-oriented-ness. I will have to decide on a syntax for this new language construct. Next, I will have to implement type checking for objects. This obviously includes defining a set of inference rules to use in type checking.

I will reach out to Professer Myers to see if he has inference rules for syntax and semantics, because I'm pretty sure implementing object orientedness is a project that has been assigned in the past. It would be much easier to use syntax and type rules that are already defined since the goal of this project is how to implement objects based on how they behave, not coming up with a way how they should behave.

Once I have added support for objects in the AST, I will have to translate them to IR. This will be a significant part of my project. How will I represent objects in IR form? This will include stuff like object layout and method dispatch.

Implementing Garbage Collection

I will implement automatic dynamic garbage collection. I propose to implement a simple tracing collector that can serve as a baseline collector. Then I will extend on this collector by implementing fancier collectors. I plan to implement a generation garbage collector. This part will require me to dive deeper into the literature and experiment with different implementations of generational garbage collectors.

Implementing Object Oriented Optimizations

I expect that the scope defined above will keep me busy for the duration of the semester. However, if I do happen to finish early, I will explore one (or a few) object oriented related optimizations.

How will I evaluate success?

The main goal of this project is to evaluate the success of garbage collection. Implementing objects is a prerequisite to this evaluation. I will evaluate the performance of the generational garbage collector against the baseline tracing collector that I will also implement.

In evaluating, I would measure multiple things such as the total time it takes for the program to run, the amount of time it took for just the collector to run, and the impact of the garbage collector under different loads.

Evaluating the total time it takes for the program to run can be measured by measuring the total time it took for the program to run with the respective collector enabled. Evaluating the time it took for just the collector to run could be calculated by comparing the run time with collectors enabled versus the time it takes for no collection at all to happen. To measure the impact of the collector under different loads, I can come up with programs that use large amounts of memory in specific ways.

Group

I do not have a partner at the moment, but if someone wants to join me in what I plan to work on, I would love to form a group!

EDIT: @orkosinha and I are partners for this project

sampsyo commented 2 years ago

Sounds great. I think some empirical results comparing GC algorithms could be very interesting.

One big question, though: where will you get your benchmarks? Do you have a set of nontrivial/realistic Xi-with-objects programs to try compiling & running?

michaelmaitland commented 2 years ago

Update: I grabbed the inference rules from the CS 4120 2018 site.

michaelmaitland commented 2 years ago

I will have to create benchmarks myself.

sampsyo commented 2 years ago

Alright; makes sense! But FWIW I would see this as a chunk of the project in itself: writing a substantial set of realistic, non-toy benchmarks in a new language is not exactly a cinch.

michaelmaitland commented 2 years ago

Another thing I just realized -- Xic gets compiled to x86. Isn't the GC I want to do part of the runtime of the language? Where does a GC fit into all of this?

sampsyo commented 2 years ago

The rough idea would be that you would write your runtime library in some other compiled language (C++ or Rust, for instance) and then have your compiler generate calls into it. Like, you could insert a call gc() before every CFG backedge to make sure that the GC runs every so often. That gc function would be implemented in Rust or whatever and somehow receive access to the root set, from which you would start your tracing.

michaelmaitland commented 2 years ago

Finding a class member who would be interested in working on this with me would be super helpful in addressing scope worries. If I can't find anyone to join me, perhaps I will settle for implementing a standard tracing collector and not a fancy generational collector. This will allow me to dedicate more time to evaluating that the collector is correct and I would not focus so much on performance, or I could consider its performance to the same program that did manual freeing of data.

michaelmaitland commented 2 years ago

Added @orkosinha as team member on this project. Now I think we should have enough manpower to take on the generational garbage collector. In any case, we can discuss further and leave the expected scope as implementing objects, implementing a tracing collector, building a benchmark suite, and see where we are with time.

charles-rs commented 2 years ago

snooping on your project even tho i'm not involved: in the past, Xi had an OO extension, and was linkable with the Qt gui library. There are some nontrivial Xi++ programs here in the released files for pa7, and if you ask Prof. Myers I believe there are some Xi++ test cases around somewhere

michaelmaitland commented 2 years ago

Adding two milestones:

  1. Lexing OO Features
  2. Parsing OO Features

I have thorough writeups inside each PR of the changes made. Also test coverage for the OO features is very good for both of these changes which I was super happy about. I didn't write much about the tests in the writeup, but all the test cases can be checked in the diffs.

@orkosinha will review these changes and we will hopefully get them merged into the oo branch and then move onto IR Conversion which is really conversion to HIR and another pass to lower to LIR. We plan that LIR will remain the same and that we will only change the HIR which means we don't need to worry about converting from LIR to AbstractAssem. We will have a discussion this week to determine what changes to HIR look like.

We've also taken a look at the current runtime that Xic uses. It is currently using the gc library. We plan on creating our own fork of this runtime and rip out gc and replace it with our own garbage collector. As discussed in the original proposal, we will be writing a simple tracing (or perhaps RC) collector in this runtime, and once that all works, we will follow up with our fancy collector.

sampsyo commented 2 years ago

Nice; looking good so far! Hopefully the lexing/parsing are the straightforward part and the semantics will be the "interesting" part…

As far as GC, one thing you could consider is getting everything working with the Boehm collector first. Test the heck out of it at that point. Then, once you're confident it's working, then you can implement your own collector. A big advantage of this route would be that you could do benchmarking to compare the performance and say whether your new implementation is or is not faster than using an off-the-shelf conservative collector.

michaelmaitland commented 2 years ago

A nice juicy update inbound!

The PR that I've been working on since last update is here. I spent a lot of time outlining all changes made and exactly whats going on in the PR in the PR description so if you want to see what changes this introduces please see the PR.

Whats up next? Well there are a few things in that PR that are currently not tested. @orkosinha and I have a plan to meet this weekend to finish testing for this PR. We are going to merge all open PRs into the oo branch. Then we are going to open a oo-test branch which focuses extending the IRSimulator (see below) and on testing the heck out of all the changes we've made. With the merging of oo-test, we hope this will complete all of the OO changes! Those changes took a lot of hard work but I'm super happy with how they came out. I think I did a really good job of generating good IR, primarily with regard to our fancy dispatch vectors that work O(1) field and function lookup!! Plus I'm really happy with how much testing that has already been added.

After we wrap up oo-test it's time to work on GC. I think that it makes most sense to build GC into the IRSimulator instead of the runtime. This saves us from having to work on code generation for IRDataArray which would generate static data in x86. By working on the IRSimulator we get to skip this and also work in Java instead of C. I think this will make it easier both in terms of writing code and also adding tests because we can build off the existing JUnit tests that us the IRSimulator.

sampsyo commented 2 years ago

Wow, super rad!! Seems like doing something very minimal for the GC is in order, just to prove that it's possible. A simple mark/sweep may be the easiest, depending on the architecture of your interpreter…

michaelmaitland commented 2 years ago

I just made final changes to the codebase! I've got dispatch vectors all sorted out and tested. I also spoke with Professor Myers today about them. Turns out that the current IRData node does need some updating if we wan't to use it in the way I wanted to and he agreed that this approach is what he had intended. I've added notes of this to my writeup. I've also sent @orkosinha over this writeup of things that may not be super clear in my PR descriptions. That writeup and the PR descriptions should be very helpful for the experience report.

@orkosinha is working on adding more integration tests which serves as our evaluation, and putting together the experience report which will be comprised of PR summaries, my writeup, and his presentation of the evaluation.