mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io
Other
379 stars 69 forks source link

Support MMTk instances #100

Open qinsoon opened 4 years ago

qinsoon commented 4 years ago

Though we have done some refactoring to allow multiple instances, we currently have no use case of multiple MMTk instances, and it is unclear that we are missing to properly support multiple instances. Porting to V8 could be a good chance to figure this out.

wks commented 3 years ago

TL;DR: The VM controls the lifetime of MMTK and Mutator, not mmtk-core. For safety, mmtk-core need to do dynamic checking like RefCell does. However, we need to find a container type C<T> that satisfies the properties discussed below.

I have been thinking about what is the proper way to express the lifetimes of MMTK, Plan and Mutator instances. My current conclusion is that their lifetimes are dynamic, and cannot be statically checked by Rust. More precisely, their lifetimes are controlled by the VM, not by mmtk-core itself.

The VM controls the lifetimes

Currently mmtk-core requires the MMTK instance to be static. By doing this, there is no correctness issues now, but when introducing multiple MMTK instances, the lifetime of MMTK, Plan and Mutator become important.

In theory, lifetime(Plan) == lifetime(MMTK), and ∀ Mutator. lifetime(Mutator) < lifetime(Plan) (a < b means lifetime a is contained within lifetime b). Violating this (destroying the MMTK instance while a mutator is still running) results in error. When lifetime(MMTK) == 'static, the above predicates always hold.

But MMTK is, by nature, controlled by the VM via the API. So

the lifetime of MMTK and Plan:

and the lifetime of Mutator:

So if a buggy VM insists on calling the API in the following sequence:

MMTk* mmtk = create_mmtk(&config);
MMTk_Mutator* mutator = bind_mutator(tls);
destroy_mmtk(mmtk); // ERROR: mutator is still alive
destroy_mutator(mutator); // too late

then the third line destroy_mmtk(mmtk) will result in an error, but the Rust mmtk-core has no static way to prevent C code from making such a mistake. An unsafe API can just let the third line have undefined behaviour, while a "safe" API may return an error code, or just panic (at least it fails fast).

This means, Rust's static life-time checker cannot help. We could just use unsafe pointers. However, it is possible to dynamically check the ownership, and panic when a lifetime error are detected.

We need a C<T> that dynamically check for lifetime

RefCell is an example of dynamic borrow checking. It internally maintains a borrow status which includes a borrow counter. The user may use the borrow or the borrow_mut methods to borrow the content immutably or mutably. But if any invocation of borrow of borrow_mut violated the rule that "there may be either many immutable borrows or exactly one mutable borrow, but not both", it panics.

Similarly, we could use a kind of container (let's call it C<T>, like Box<T>, Arc<T> or RefCell<T>) that ensures:

  1. C<T> has a unique owner. The owner determines when the object is freed.
  2. It may be borrowed dynamically by invoking its borrow() method. The borrow() method returns a borrow handle (let's call it CRef<T>, just like RefCell<T>::borrow() returns a Ref<T>) that can be used to access the content.
  3. The borrow handle C<T> can be sent to other threads (impl Send), and the content of C<T> can be accessed by multiple threads (impl Sync).
  4. C<T> maintains a counter that knows how many CRef<T> exists. If C<T> itself is dropped while any CRef<T> still exists, it reports error.

In this project, the VM creates an MMTK instance which is contained in a C<MMTK>. A Mutator can hold a CRef<MMTK> to access MMTK safely. If the client attempts to drop the C<MMTK> when any mutator is still alive, it reports an error.

However, I haven't found a suitable C.

Box<T> satisfies (1) that it is uniquely owned. But it only satisfies (1). It can be borrowed, but only statically in a scoped fashion.

Arc<T> does not satisfy (1) because there is no unique owner. All Arc<T> instances are equal. If any Arc<T> is still alive, the content is alive. But we want to immediately kill the MMTK instance when the VM calls destroy_mmtk, not when the last mutator that holds an Arc<MMTK> exits. Using Weak<T> with Arc<T> overkills.

The standard std::cell::RefCell is not thread-safe. There is a atomic_refcell crate that provides the thread-safe AtomicRefCell. It satisfies (1) and (2), but not (3) or (4). Both RefCell and AtomicRefCell are intended for internal mutability, not lifetime management. It needs to be contained in another container that can be shared between threads (such as Arc<AtomicRefCell<T>>). Like RefCell, the AtomicRef<T> returned by AtomicRefCell<T>::borrow() can only be used in its scope, i.e. we cannot create an AtomicRef<T> and send it to another thread. Like RefCell, AtomicRefCell<T> does not check if any AtomicRef<T> still exists when AtomicRefCell<T> is dropped.

It remains an open question what C we should use.

qinsoon commented 3 years ago

For your example code, Rust can static check it: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b0b44fab68cb343b8978f5eb2508b92f

wks commented 3 years ago

@qinsoon Yes. This is helpful if the VM itself is written in Rust. But what if the main function in your snippet is written in C++? Of course we can use unsafe in the VM binding, and cast the MMTK* pointer to a reference with an assumed lifetime. See this modified code:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=634008024197ac999502c8c360387114

The code above shows a VM calling into MMTK via a C-style binding. This time, if the VM calls the API functions in the wrong order, it still compiles, but will have runtime error. The code above prints a garbage value, and Valgrind can catch the use-after-free bug.

Using unsafe in VM bindings is not wrong. If the VM calls the function in the wrong order, it is the VM's fault after all. But I still think some dynamic checking can be helpful. Instead of invoking undefined behaviour (hopefully crash, but may stay quiet for a long time before it suddenly starts behaving mysteriously) when the c_destroy_mmtk function in the above example is called, it should give an error message like:

thread 'main' panicked at 'MMTK is destoryed when there are still 5 Mutators pointing to it', path/to/source_code.rs:30:5
stack backtrace:
   ...
qinsoon commented 11 months ago

Our current VMBinding methods are all static methods, such as ActivePlan::number_of_mutators(). I think we need to turn them into instance methods to support multiple instances, such as ActivePlan::number_of_mutators(&self), and add a field like vm: VM to MMTK<VM>.

qinsoon commented 10 months ago

Set the priority to low. We should raise the priority if we work on any VM that needs this.