mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Watch dog #1114

Open wks opened 2 months ago

wks commented 2 months ago

We have recently observed some bugs causing tests to hang while doing GC. For example

Given that a typical GC shouldn't take more than a few seconds, there should be some watch dog mechanism so that the process can panic and print the stack traces of all threads when a GC takes far longer than that.
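To make the idea concrete, here is a minimal sketch of such a watch dog using only the Rust standard library. It is not existing mmtk-core code: `GcWatchdog`, `arm` and `disarm` are hypothetical names, and the timeout action is left to the caller. Note that `std` can only capture the backtrace of the current thread, so dumping the stacks of all threads would need binding-specific or platform-specific support.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical watch dog: armed when a GC starts, disarmed when it ends.
pub struct GcWatchdog {
    done_tx: Sender<()>,
    handle: JoinHandle<bool>,
}

impl GcWatchdog {
    /// Start watching. `on_timeout` runs on the watch dog thread if
    /// `disarm` has not been called within `timeout`.
    pub fn arm<F>(timeout: Duration, on_timeout: F) -> Self
    where
        F: FnOnce() + Send + 'static,
    {
        let (done_tx, done_rx) = mpsc::channel::<()>();
        let handle = thread::spawn(move || match done_rx.recv_timeout(timeout) {
            // The GC finished (or the watch dog was dropped) in time.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => false,
            // No completion signal arrived: run the timeout action.
            Err(RecvTimeoutError::Timeout) => {
                on_timeout();
                true
            }
        });
        GcWatchdog { done_tx, handle }
    }

    /// Signal that the GC finished. Returns `true` if the watch dog had
    /// already fired (or its timeout action panicked).
    pub fn disarm(self) -> bool {
        // Sending fails only if the watch dog thread has already exited.
        let _ = self.done_tx.send(());
        self.handle.join().unwrap_or(true)
    }
}
```

In a real deployment the `on_timeout` closure would presumably log whatever diagnostics are available and then call `std::process::abort()`, so that a hung test fails quickly instead of running until the CI job times out.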

A watch dog is also valuable for real-world applications, especially mobile applications. If an application is unresponsive, the OS will try to restart it, or prompt the user for further action.

k-sareen commented 2 months ago

If a thread is waiting on a lock for more than X seconds/minutes in ART, it panics and dies. Perhaps we need something similar.
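A rough sketch of that kind of bounded lock wait, written against `std::sync::Mutex`. This is not ART's implementation and not an existing mmtk-core API: `lock_or_panic` is a hypothetical helper, and polling `try_lock` with a 1 ms sleep is only a stand-in for a proper timed lock.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: acquire `mutex`, but panic if that takes longer
/// than `timeout`. `name` is only used in the panic message.
pub fn lock_or_panic<'a, T>(
    mutex: &'a Mutex<T>,
    timeout: Duration,
    name: &str,
) -> MutexGuard<'a, T> {
    let deadline = Instant::now() + timeout;
    loop {
        match mutex.try_lock() {
            Ok(guard) => return guard,
            // A previous holder panicked; this sketch still hands out the guard.
            Err(TryLockError::Poisoned(poisoned)) => return poisoned.into_inner(),
            Err(TryLockError::WouldBlock) => {
                if Instant::now() >= deadline {
                    panic!("waited more than {:?} for lock `{}`", timeout, name);
                }
                // Back off briefly before polling again.
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}
```

Polling keeps the sketch dependency-free; a lock type with a native timed acquire would avoid the wake-ups.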

wks commented 1 month ago

https://github.com/mmtk/mmtk-jikesrvm/actions/runs/9124315467/job/25088273084?pr=172

In this test run, JikesRVM hung for 35 minutes without making progress while running lusearch with RFastAdaptiveMarkSweep. There is no indication of whether it hung during GC, but it is very likely.