mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Out-of-memory handling in NoGC #1177

Open wks opened 4 months ago

wks commented 4 months ago

When NoGC runs out of memory, it panics in NoGC::schedule_collection with an unreachable!() macro. A recent PR, https://github.com/mmtk/mmtk-core/pull/1175, attempts to move the panic earlier, into GCTrigger::poll.

However, we do have an out-of-memory handler, Collection::out_of_memory. It allows the VM to handle OOM events in a VM-specific way, such as throwing OutOfMemoryError. Currently, when using NoGC, MMTk panics before reaching any call site of Collection::out_of_memory.
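To illustrate what "VM-specific handling" can look like, here is a minimal sketch. It does not use the real mmtk-core types: the `Collection` trait and `AllocationError` enum below are simplified stand-ins, and `ThrowingVM`, `PENDING_OOM` and `take_pending_oom` are hypothetical names invented for this example. The default handler aborts, while the binding overrides it to record a pending error that the mutator could later surface as something like OutOfMemoryError.

```rust
use std::cell::Cell;

/// Simplified stand-in for the kind of allocation failure being reported.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AllocationError {
    HeapOutOfMemory,
    MmapOutOfMemory,
}

/// Simplified stand-in for the VM-side collection interface.
pub trait Collection {
    /// Default behaviour: abort. A binding may override this.
    fn out_of_memory(err: AllocationError) {
        panic!("Out of memory with {:?}!", err);
    }
}

thread_local! {
    /// Hypothetical per-thread slot holding an OOM that has not been thrown yet.
    static PENDING_OOM: Cell<Option<AllocationError>> = Cell::new(None);
}

/// Hypothetical binding that turns OOM into a pending, VM-level error
/// instead of aborting the process.
pub struct ThrowingVM;

impl Collection for ThrowingVM {
    fn out_of_memory(err: AllocationError) {
        PENDING_OOM.with(|slot| slot.set(Some(err)));
    }
}

/// Called by the (hypothetical) mutator after a failed allocation to
/// fetch and clear the pending error.
pub fn take_pending_oom() -> Option<AllocationError> {
    PENDING_OOM.with(|slot| slot.take())
}
```

With NoGC today, the panic happens before any handler of this kind gets a chance to run, which is the gap this issue describes.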

When running Epsilon GC in OpenJDK 22, it throws OutOfMemoryError.

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xms40M -Xmx40M -jar dacapo-23.11-chopin.jar lusearch
[0.002s][warning][gc,init] Consider enabling -XX:+AlwaysPreTouch to avoid memory commit hiccups
Using scaled threading model. 32 processors detected, 32 threads used to drive the workload, in a possible range of [1,2048]
Terminating due to java.lang.OutOfMemoryError: Java heap space

When running NoGC in MMTk, it panics with "internal error: entered unreachable code: GC triggered in nogc".

$ MMTK_PLAN=NoGC ~/projects/mmtk-github/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+UseThirdPartyHeap -Xms40M -Xmx40M -jar dacapo-23.11-chopin.jar lusearch
Using scaled threading model. 32 processors detected, 32 threads used to drive the workload, in a possible range of [1,2048]
thread '<unnamed>' panicked at /home/wks/projects/mmtk-github/mmtk-core/src/plan/nogc/global.rs:74:9:
internal error: entered unreachable code: GC triggered in nogc
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fatal runtime error: failed to initiate panic, error 5
fish: Job 1, 'MMTK_PLAN=NoGC ~/projects/mmtk-…' terminated by signal SIGABRT (Abort)

When running SemiSpace in MMTk with a small heap size, it throws OutOfMemoryError, too.

$ MMTK_PLAN=SemiSpace ~/projects/mmtk-github/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/java -XX:MetaspaceSize=500M -XX:+UseThirdPartyHeap -Xms10M -Xmx10M -jar dacapo-23.11-chopin.jar lusearch
Using scaled threading model. 32 processors detected, 32 threads used to drive the workload, in a possible range of [1,2048]
Version: lucene 9.7.0 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-chopin lusearch starting =====
java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.dacapo.harness.Lusearch.iterate(Lusearch.java:43)
        at org.dacapo.harness.Benchmark.run(Benchmark.java:253)
        at org.dacapo.harness.TestHarness.runBenchmark(TestHarness.java:225)
        at org.dacapo.harness.TestHarness.main(TestHarness.java:170)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at Harness.main(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.dacapo.harness.LatencyReporter.initialize(LatencyReporter.java:70)
        at org.dacapo.lusearch.Search.main(Search.java:141)
        ... 13 more

But we can't simply call Collection::out_of_memory in GCTrigger. We already have dedicated code paths that call Collection::out_of_memory, and they should be used rather than skipped.

Mock testing

Meanwhile, some of our mock tests, such as allocate_with_re_enable_collection, still depend on block_for_gc to detect whether a GC is triggered. When fixing this problem, we probably need to reserve a proper hook for the MockVM to detect that a GC has been triggered.
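One possible shape for such a hook is a small probe that counts trigger events, so a test can assert on it without going through block_for_gc. This is only a sketch: `GcTriggerProbe`, its methods, and `simulated_alloc` are hypothetical names and are not part of the existing MockVM.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical probe a mock VM could expose to tests.
#[derive(Default)]
pub struct GcTriggerProbe {
    gc_triggered: AtomicUsize,
}

impl GcTriggerProbe {
    /// Hook to be invoked at the point where a GC is triggered.
    pub fn on_gc_triggered(&self) {
        self.gc_triggered.fetch_add(1, Ordering::SeqCst);
    }

    /// Number of times the hook has fired so far.
    pub fn gc_triggered_count(&self) -> usize {
        self.gc_triggered.load(Ordering::SeqCst)
    }
}

/// Stand-in for an allocation slow path: it fires the hook only when the
/// heap is full and collection is currently enabled.
pub fn simulated_alloc(probe: &GcTriggerProbe, heap_full: bool, collection_enabled: bool) {
    if heap_full && collection_enabled {
        probe.on_gc_triggered();
    }
}
```

A test in the style of allocate_with_re_enable_collection could then allocate with collection disabled, check that the count is still zero, re-enable collection, allocate again, and check that the count became one.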

k-sareen commented 4 months ago

This should be easy to solve. We just need to add a check here to see if the plan can even collect. If it can't, then call Collection::out_of_memory, like we do above with will_oom_on_acquire.
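A rough sketch of that idea, under stated assumptions: the types below are simplified stand-ins rather than mmtk-core code, `can_collect`, `acquire_pages`, `AcquireOutcome` and `CountingHandler` are hypothetical names, and the exact place where the check belongs is whatever "here" refers to in the comment above.

```rust
/// Hypothetical outcome of trying to reserve pages.
#[derive(Debug, PartialEq, Eq)]
pub enum AcquireOutcome {
    Acquired,
    GcRequested,
    OutOfMemoryReported,
}

/// Stand-in for the VM's out-of-memory callback.
pub trait OomHandler {
    fn out_of_memory(&mut self);
}

/// Example handler that only counts how often it was called.
#[derive(Default)]
pub struct CountingHandler {
    pub calls: usize,
}

impl OomHandler for CountingHandler {
    fn out_of_memory(&mut self) {
        self.calls += 1;
    }
}

/// Stand-in for plan-level information; `can_collect` is an assumed flag
/// that would be false for a plan like NoGC.
pub struct PlanInfo {
    pub can_collect: bool,
}

/// If the request does not fit and the plan cannot collect, report OOM to
/// the VM instead of scheduling a GC that can never run.
pub fn acquire_pages(
    plan: &PlanInfo,
    reserved_pages: usize,
    requested_pages: usize,
    total_pages: usize,
    handler: &mut impl OomHandler,
) -> AcquireOutcome {
    if reserved_pages + requested_pages <= total_pages {
        return AcquireOutcome::Acquired;
    }
    if !plan.can_collect {
        handler.out_of_memory();
        return AcquireOutcome::OutOfMemoryReported;
    }
    AcquireOutcome::GcRequested
}
```

The point of the design is that a non-collecting plan never reaches the GC scheduling path, so the unreachable!() in NoGC::schedule_collection would stay unreachable while the VM still gets its OOM callback.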