mun-lang / mun

Source code for the Mun language and runtime.
https://mun-lang.org
Other
1.83k stars 72 forks source link

Garbage Collection #206

Open Wodann opened 4 years ago

Wodann commented 4 years ago

Currently, the Mun runtime defaults to a mark & sweep garbage collector (henceforth GC). This a relatively simple implementation of a GC, that serves as a good proof of concept. However, going foreword we need to mature our GC story.

Goals

  1. [ ] Benchmarking - The first step should be to create a benchmark suite that is used to avoid regression and guides high-level optimisations. image [discussion A] What metrics/use cases are important to benchmark?
  2. [ ] Constantly low overhead The Mun GC will have to run at interactive frame rate (30-60 Hz). As such it only has a minimal time budget to spend on GC - every frame.
  3. [ ] Rewrite the stack As Mun hot reloads state, data structures have to be mapped in memory. The current Mun GC is limited to rewriting all GC-allocated memory, but in the future we also want it to rewrite stack memory.
  4. [ ] Concurrency In the future, we want to implement some form of concurrency within the Mun runtime. As such, GC design discussions should sync up with Mun concurrency design discussions.
Wodann commented 4 years ago

Minutes from a Discord discussion on GC:

baszalmstra commented 4 years ago

Note that immix-rust is not a reference-counting GC

atennapel commented 4 years ago

Also see Counting Immutable Beans for an implementation of reference counting in the Lean theorem prover. They do some clever things like reusing data structures if the reference count is 1. Though it's really focused on purely functional languages.

pitaj commented 4 years ago

I'd just like to point out the option of a colored-pointer GC like ZGC. While ZGC only works on 64-bit architectures (because it needs the extra bits for colored pointers), it has very short pause times that do not scale directly with heap size.

yorickpeterse commented 4 years ago

If I remember correctly, the immix-rust repository does not implement everything of Immix. Specifically, it does not support compacting the heap if my memory serves me right. For Inko I ended up implementing Immix, including support for compacting the heap. If it's of any use, you can find the code for it here:

I also wrote some documentation on this that may help understand the code a bit better: https://inko-lang.org/manual/virtual-machine/memory-management/

I'm happy to answer any questions about Immix as well :smiley:

moon-chilled commented 4 years ago

@pitaj you can use coloured-pointer gcs on 32-bit platforms if you use the low-order bits for colouring. For example, zgc currently uses only 4 bits for colouring, so if you align every allocation to 16 bytes, the low 4 bits are irrelevant and can be used by gc.

playXE commented 4 years ago

I'll try to make GC API more generic in: Issue 207 and soon create PR with initial changes to API.

steveblackburn commented 4 years ago

Lots of interesting thoughts above.

We're doing a full re-write of MMTk in Rust. Although our implementation is not yet public (I hope to have it for you soon), you can take a browse at the Java code base on which the Rust re-write is based. They key difference is that the re-write targets multiple VMs and multiple languages.

AlexCouch commented 4 years ago

I haven't seen Conservative GC be mentioned yet which is what Crystal uses for GC so I'm gonna leave a link here to both: Conservative GC Crystal-lang

fleabitdev commented 4 years ago

Hi folks,

I was pointed to this issue by @erlend-sh, after announcing GameLisp yesterday.

You might be interested by GameLisp's garbage collector. It's not an exact fit for Mun (because Mun seems to be a much more general-purpose language than GameLisp), but you might be able to borrow some ideas. There's a guide-level explanation here, some basic performance figures here, a detailed technical overview here, and the source code itself is here.

Technical summary:

One of the most surprising things I learned while implementing the GC is that, for games, the raw throughput of the garbage collector just doesn't matter very much. GameLisp's algorithm has a lot of room for optimization (especially if I were willing to make heavier use of unsafe code), but even now, the ratio of script-execution time to GC time is something like 20:1. Because most games are unlikely to spend more than a few milliseconds each frame executing scripts, the cost of the GC is negligible.

Let me know if you have any questions!

erlend-sh commented 2 years ago

Might Dart’s GC be a useful reference for Mun? The two languages seem to have pretty similar design goals. Dart uses a generational GC, like GameLisp. @jmeggitt seems to have been tinkering in this space recently.

https://medium.com/flutter/flutter-dont-fear-the-garbage-collector-d69b3ff1ca30

Techcable commented 2 years ago

Consider looking into the Memory Pool System.

It implements a generational GC with bump-pointer allocation.

Language integration is much easier than with MMTK (having tried both).

Although the library is not very widely known, it is backed by a professional contracting-firm. It is used in production in multiple language implementations (main one is OpenDylan).

This project might be worth taking a look at. In addition to bump-pointer allocation and generational collection, it also supports concurrent collection! It is likely to be faster than whatever you can write yourself (unless you put like a ton of effort in it).

It is not quite as fast level of MMTK/Java/Go, but can be surprisingly competitive 😄

If you do decide to write a collector yourself, "The Garbage Collection Handbook" is very useful (and exceptionally well written).

Downsides on MPS: Write barriers are done using OS-level page protection. While this makes compilation easier, this means that some of your callback functions might be called from inside signal handlers.

Also collection is (mostly) concurrent, so your trace functions might be called from other threads (and collections can happen any time, not just at allocations).

Because of these two things, Integration is difficult to do unless its in C. The library uses a lot of macro magic. Signal handlers can be tricky to get right in higher-level languages (even Rust) while the macros take care of it in C.

Inline allocation is an exception, and is easy to do even in hand-written/JIT compiled assembly.

erlend-sh commented 2 years ago

Language integration is much easier than with MMTK (having tried both).

Were you trying with the Rust version of MMTk? It looks like the rewrite @steveblackburn was talking about has since gone public, and is being actively developed by @qinsoon & co. https://github.com/mmtk/mmtk-core

erlend-sh commented 2 years ago

Luau was released after this discussion started. Since it’s specifically made for games (official language for Roblox modding) it surely shares a lot of design goals with Mun.

https://luau-lang.org/performance#improved-garbage-collector-pacing

Improved garbage collector pacing

Luau uses an incremental garbage collector which does a little bit of work every so often, and at no point does it stop the world to traverse the entire heap. The runtime will make sure that the collector runs interspersed with the program execution as the program allocates additional memory, which is known as “garbage collection assists”, and can also run in response to explicit garbage collection invocation via lua_gc. In interactive environments such as video game engines it’s possible, and even desirable, to request garbage collection every frame to make sure assists are minimized, since that allows scheduling the garbage collection to run concurrently with other engine processing that doesn’t involve script execution.

Inspired by excellent work by Austin Clements on Go’s garbage collector pacer, we’ve implemented a pacing algorithm that uses a proportional–integral–derivative controller to estimate internal garbage collector tunables to reach a target heap size, defined as a percentage of the live heap data (which is more intuitive and actionable than Lua 5.x “GC pause” setting). Luau runtime also estimates the allocation rate making it easy (given uniform allocation rates) to adjust the per-frame garbage collection requests to do most of the required GC work outside of script execution.

baszalmstra commented 2 years ago

Very interesting read!

steveblackburn commented 2 years ago

@Techcable I'm keen to hear of your MMTk experience. What language were you porting? Any thoughts on what you found difficult?

We have a lively community with a lot of ports underway right now, and some pretty interesting work with new algorithms (outperforming G1 and Shenandoah on OpenJDK now).

Techcable commented 2 years ago

@Techcable I'm keen to hear of your MMTk experience. What language were you porting? Any thoughts on what you found difficult?

I tried integrating with a Python implementation several months ago. Mostly gave up because writing one by hand seemed to be simpler (definitely not faster).

It looks like the implementation and documentation has improved significantly since than. I will have to try again :)

erlend-sh commented 2 years ago

Some folks here might be interested in participating in this: https://www.reddit.com/r/rust/comments/vpq6yn/user_study_interest_in_a_rustlike/

by @typesanitizer (twitter.com/typesanitizer)

typesanitizer commented 2 years ago

@erlend-sh I called off the study because I didn't get much initial interest (only a handful of survey responses) to be able to perform a meaningful study. 😐

erlend-sh commented 2 years ago

Ah, too bad. In any case, I think Mun is highly relevant to what you're thinking about :)

(Admins, feel free to delete these last comments.)

erlend-sh commented 1 year ago

https://github.com/kyren/gc-arena by @kyren might be a good fit.