wks commented 11 months ago

TL;DR: Some VMs (CRuby, ART, etc.) support forking, but fork() doesn't duplicate any threads other than the one that calls fork(). Currently, if a VM calls fork(), MMTk GC threads will not exist in the child process. We need to have the necessary mechanisms to support fork().

Requirement

CRuby

Ruby has the method Kernel#fork. It does what the fork() system call does for Ruby, i.e. duplicates the current process, but only the current Ruby thread, not other threads.

Shopify's use case involves forking the VM to handle different requests. The Ruby process performs a compacting GC before forking so that the heap is less fragmented for the children. This is not a problem with CRuby's own GC because it performs GC in the mutator thread itself. In other words, it has no dedicated GC threads.

When using MMTk, the child process will not have any GC threads after forking. If a mutator thread in the child process triggers a GC, it will block forever waiting for the GC to finish, but the GC will never happen because there is no GC thread to perform it.

Android ART

The "Zygote" process runs an ART VM, and forks into different application processes. This is intended for accelerating class loading.

We will face the same problem if the Zygote process forks.

What should happen when forking?

We first need to let GC threads come to a graceful stop. We can only fork() when no GC thread is running.

We also need to make sure all mutators are at a safepoint, and all contexts are flushed. After fork(), only one thread will remain, and that's likely a mutator thread. This means:

Other mutator threads must not be in a critical section w.r.t. GC. For example, they must not be in the middle of allocating and initializing an object, and must not be in the middle of executing a write barrier.
Other mutators will need to flush their thread-local states. Their mod buffers need to be flushed. For the MiMalloc allocator, mutators need to give back blocks cached locally. Bump-pointer allocators can be discarded as long as they are not in the middle of allocation.
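The two requirements above can be sketched as a per-mutator "prepare for fork" step. This is only an illustration under assumed data structures: `GlobalState`, `MutatorLocal`, their fields, and `prepare_for_fork` are hypothetical and do not correspond to mmtk-core types.

```rust
use std::sync::Mutex;

/// Hypothetical global pools that survive in the child process.
#[derive(Default)]
pub struct GlobalState {
    pub mod_buffer: Mutex<Vec<usize>>,
    pub free_blocks: Mutex<Vec<usize>>,
}

/// Hypothetical thread-local state of one mutator.
#[derive(Default)]
pub struct MutatorLocal {
    pub in_critical_section: bool, // mid-allocation or mid-write-barrier
    pub mod_buffer: Vec<usize>,    // addresses recorded by the write barrier
    pub cached_blocks: Vec<usize>, // blocks cached by a MiMalloc-style allocator
    pub bump_cursor: usize,
    pub bump_limit: usize,
}

impl MutatorLocal {
    /// Flush thread-local state so that nothing is lost when this thread
    /// disappears in the child process.
    pub fn prepare_for_fork(&mut self, global: &GlobalState) -> Result<(), &'static str> {
        if self.in_critical_section {
            return Err("mutator is in a GC-critical section; cannot fork now");
        }
        // Hand the modified-object buffer over to the global pool.
        global.mod_buffer.lock().unwrap().append(&mut self.mod_buffer);
        // Give locally cached blocks back.
        global.free_blocks.lock().unwrap().append(&mut self.cached_blocks);
        // A bump-pointer region can simply be discarded.
        self.bump_cursor = 0;
        self.bump_limit = 0;
        Ok(())
    }
}
```

In this sketch a VM would call `prepare_for_fork` on every mutator while they are stopped at a safepoint, and refuse to fork if any call returns an error.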

Right before fork(), all GC threads must stop. After fork(), we should restart GC threads. We can ignore the coordinator thread for now because we plan to remove it (we'll discuss that in https://github.com/mmtk/mmtk-core/issues/1053). The state of a GC worker is encapsulated in the GCWorker struct, so it should be easy to restart GC threads by reusing the GCWorker structs.

What needs to be done?

Everything will be easier if we remove the coordinator first. See https://github.com/mmtk/mmtk-core/issues/1053

We need to add an API to stop all GC threads for forking. It is basically the reverse of initialize_collection.

We need another API to restart GC threads. It should be similar to initialize_collection, but it should reuse the existing GCWorker structs rather than creating new instances.

We also need to make sure that GC worker threads save all of their state in the GCWorker struct before exiting.
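A rough sketch of the stop/restart pair, assuming each worker thread owns its worker struct while running and hands it back when it exits. `Worker`, `WorkerPool`, `spawn_workers` and `stop_for_fork` are hypothetical names for illustration, not the proposed mmtk-core API, and the fork() call itself is left out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the GCWorker struct: all state that must
/// survive a stop/restart cycle lives here.
pub struct Worker {
    pub ordinal: usize,
    pub packets_done: u64,
}

pub struct WorkerPool {
    stop: Arc<AtomicBool>,
    parked: Vec<Worker>,              // workers not owned by any thread
    running: Vec<JoinHandle<Worker>>, // threads that currently own a worker
}

impl WorkerPool {
    pub fn new(count: usize) -> Self {
        let parked = (0..count)
            .map(|ordinal| Worker { ordinal, packets_done: 0 })
            .collect();
        WorkerPool { stop: Arc::new(AtomicBool::new(false)), parked, running: Vec::new() }
    }

    /// Start one thread per parked worker, reusing the existing structs.
    /// Used both for the initial start and for the restart after fork().
    pub fn spawn_workers(&mut self) {
        self.stop.store(false, Ordering::SeqCst);
        for mut worker in self.parked.drain(..) {
            let stop = Arc::clone(&self.stop);
            self.running.push(thread::spawn(move || {
                loop {
                    worker.packets_done += 1; // pretend to do one unit of work
                    if stop.load(Ordering::SeqCst) {
                        break;
                    }
                    thread::sleep(Duration::from_millis(1));
                }
                worker // hand the state back to the pool on exit
            }));
        }
    }

    /// Ask all worker threads to exit and wait until they have done so.
    pub fn stop_for_fork(&mut self) {
        self.stop.store(true, Ordering::SeqCst);
        for handle in self.running.drain(..) {
            self.parked.push(handle.join().expect("worker thread panicked"));
        }
    }

    pub fn running_threads(&self) -> usize {
        self.running.len()
    }

    pub fn parked_workers(&self) -> &[Worker] {
        &self.parked
    }
}
```

The intended order would be `stop_for_fork()`, then fork(), then `spawn_workers()` in the child (and in the parent, if it keeps running). Because the threads return their `Worker` values on exit, nothing is lost between the two calls.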

wks commented 11 months ago

This problem is responsible for some of the CI test timeouts for the Ruby binding. The hanging test case is TestAutoload#test_autoload_fork. It forks, and will usually pass. But if a GC happens to be triggered in a child process, the child will wait forever, and the test will hang until it is killed after 5 hours.

qinsoon commented 11 months ago

An alternative design is to make initialize_collection re-entrant and allow MMTk to check with the binding whether GC threads exist. When initialize_collection is called and MMTk does not find any GC threads, it will spawn new threads. So after fork, the new process would just call initialize_collection again.

k-sareen commented 11 months ago

> An alternative design is to allow re-entrance for initialize_collection and allow MMTk to check with the binding if a GC thread exists. For initialize_collection, if MMTk does not find GC threads, they will spawn new threads. So after fork, the new process would just call initialize_collection again.

Yes, that's exactly what I was going to suggest. Making it re-entrant would be easier. Alternatively, or in addition, we could expose thread creation explicitly in the API so that a runtime can call it after calling fork.
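A sketch of what the re-entrant variant could look like, assuming the binding can report how many GC threads are alive. The trait `GcThreadHost`, the function `initialize_collection_reentrant` and the test double `FakeHost` are hypothetical names; the real initialize_collection has a different signature.

```rust
/// Hypothetical binding-side interface.
pub trait GcThreadHost {
    /// Number of GC threads that exist in the current process.
    fn live_gc_threads(&self) -> usize;
    /// Create the GC thread with the given ordinal.
    fn spawn_gc_thread(&mut self, ordinal: usize);
}

/// Spawn only the GC threads that are missing and return how many were
/// spawned. Calling it again when all threads exist is a no-op.
pub fn initialize_collection_reentrant<H: GcThreadHost>(host: &mut H, wanted: usize) -> usize {
    let alive = host.live_gc_threads();
    for ordinal in alive..wanted {
        host.spawn_gc_thread(ordinal);
    }
    wanted.saturating_sub(alive)
}

/// Test double that records thread ordinals instead of spawning real threads.
#[derive(Default)]
pub struct FakeHost {
    pub threads: Vec<usize>,
}

impl GcThreadHost for FakeHost {
    fn live_gc_threads(&self) -> usize {
        self.threads.len()
    }
    fn spawn_gc_thread(&mut self, ordinal: usize) {
        self.threads.push(ordinal);
    }
}

impl FakeHost {
    /// Model the child's view after fork(): no GC thread was duplicated.
    pub fn simulate_fork_child(&mut self) {
        self.threads.clear();
    }
}
```

In this model the child calls `initialize_collection_reentrant` once after fork() and gets a full set of GC threads again, while a redundant call in the parent spawns nothing.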

mmtk / mmtk-core

Supporting the `fork()` system call #1054
