softwareCobbler / luceedebug

line debugger for lucee
GNU Lesser General Public License v2.1

java.lang.OutOfMemoryError: Java heap space after loading numerous pages #34

Closed chapmandu closed 1 year ago

chapmandu commented 1 year ago

After ~sitting idle for an hour or so~ loading numerous pages (debugger not started in vscode), stdout shows the following:

[INFO ] runwar.context: [luceedebug] Loaded luceedebug.coreinject.DebugManager@2e53b094 with ClassLoader 'lucee.core [49]'
[INFO ] runwar.context: [luceedebug] attempting jdwp self connect to jdwp on localhost:9999...
[INFO ] runwar.context: [luceedebug] jdwp self connect OK
[INFO ] runwar.server: ******************************************************************************
[INFO ] runwar.server: *** starting 'stop' listener thread - Host: 0.0.0.0 - Socket: 43951
[INFO ] runwar.server: ******************************************************************************
[INFO ] runwar.server: Server is up - http-port:80 stop-port:43951 PID:1 version 4.8.3

.... making 10-20 requests to various application pages ....

[ERROR] runwar.context: Exception in thread "JDI Target VM Interface" 
[ERROR] runwar.context: java.lang.OutOfMemoryError: Java heap space
[ERROR] runwar.context: java.lang.OutOfMemoryError: Java heap space
[ERROR] runwar.context:     at org.apache.commons.pool2.impl.GenericObjectPool.evict(GenericObjectPool.java:718)
[ERROR] runwar.context:     at org.apache.commons.pool2.impl.BaseGenericObjectPool$Evictor.run(BaseGenericObjectPool.java:1138)
[ERROR] runwar.context:     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[ERROR] runwar.context:     at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
[ERROR] runwar.context:     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
[ERROR] runwar.context:     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] runwar.context:     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] runwar.context:     at java.base/java.lang.Thread.run(Thread.java:829)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "XNIO-1 I/O-5"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-6"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-489"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "java-sdk-http-connection-reaper"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "XNIO-1 Accept"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "commons-pool-evictor-thread"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "XNIO-1 I/O-6"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-395"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "luceedebug-worker"

It looks like the heap is not being garbage collected..

[screenshot: JVM heap usage graph]

I'm running 2 Lucee containers in Docker behind an nginx proxy (ports 10000 and 10001).

Linux Mint
Lucee 5.3.10.97
CommandBox: commandbox:jdk11-alpine-3.7.0

JVM Settings:

"jvm": {
    "minHeapSize": 256,
    "heapSize": 1024,
    "args": [
        "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=localhost:9999",
        "-javaagent:/app/lib/luceedebug.jar=jdwpHost=localhost,jdwpPort=9999,debugHost=0.0.0.0,debugPort=10000,jarPath=/app/lib/luceedebug.jar"
    ]
}
softwareCobbler commented 1 year ago

Does that graph say that over the course of 3 minutes it went from 25% to 100% and subsequently died? If your memory configuration was already carefully tuned for your workload, it's possible that ephemeral objects tracking debug info push you over the edge of your existing limits, and that they would be collected if the allocations could succeed.

I concede that there may be a leak somewhere, though that's tough to debug from here. For my workloads, I see it collect after running a suite of hundreds of heavy and slow integration tests for $DAYJOB's app, which I thought was a pretty good stress test for "cool it doesn't leak".

Can you double your memory limit and see whether that allows the allocations to succeed and whether they are eventually collected?
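
In the JVM settings you posted, that would just mean bumping heapSize, something like:

"jvm": {
    "minHeapSize": 256,
    "heapSize": 2048,
    "args": [
        "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=localhost:9999",
        "-javaagent:/app/lib/luceedebug.jar=jdwpHost=localhost,jdwpPort=9999,debugHost=0.0.0.0,debugPort=10000,jarPath=/app/lib/luceedebug.jar"
    ]
}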

softwareCobbler commented 1 year ago

I guess something has changed since I last checked for leaks, and recently I've been tearing down my servers after test runs to clear caches, application state, etc., so I haven't given it time to accumulate uncollectable garbage. I've had a server running all morning and can confirm it's slowly eating all memory. Thanks for bringing this to my attention, will investigate.

chapmandu commented 1 year ago

Yes, that graph shows about 3 minutes. It's a newly started development environment container and I've simply refreshed my application's main page about 20 times. I'll try running my test suite, though I suspect it won't last very long. Admittedly there is not a lot of memory available, but it has been a reliable environment for a number of years.

softwareCobbler commented 1 year ago

I claim this is fixed; the bug was trying to delete from a map using a key of the wrong type (gosh, Java's type system is weak sometimes: why does Map<K,V>.remove(<some type that isn't K>) even compile?). So that map was growing indefinitely; it gets cleaned up now.
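
A minimal sketch of that class of bug (not the actual luceedebug code): java.util.Map.remove takes Object rather than K, so a call with a key of the wrong type compiles cleanly, silently removes nothing, and the map grows forever.

import java.util.HashMap;
import java.util.Map;

public class WrongKeyRemove {
    public static void main(String[] args) {
        // Map keyed by Long, e.g. some id mapped to tracked debug state.
        Map<Long, String> tracked = new HashMap<>();
        tracked.put(42L, "per-frame debug state");

        // Map.remove(Object) accepts any Object, so passing an int (boxed to
        // Integer) compiles without complaint. But Integer 42 never equals
        // Long 42L, so nothing is removed and entries accumulate.
        tracked.remove(42);
        System.out.println(tracked.size()); // prints 1; the entry is still there

        // Removing with the correct key type actually cleans up.
        tracked.remove(42L);
        System.out.println(tracked.size()); // prints 0
    }
}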

chapmandu commented 1 year ago

I tried the latest code; unfortunately I'm still seeing:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "JDI Target VM Interface"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-6"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-83"
*** java.lang.instrument ASSERTION FAILED ***: "!errorOutstanding" with message can't create name string at src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 827
softwareCobbler commented 1 year ago

Ah darn. I'll keep looking.

If you have the time and willingness, can you run a flight recording with full memory leak detection and see if it returns an obvious culprit? The fix from earlier was extremely obvious with that tool on my end (something like hundreds of thousands of uncollected DebugFrame objects were reachable); maybe you'll see something equally important on your end.

The following java startup args work for me to get JDK mission control (JMC) running flight recordings:

-Dcom.sun.management.jmxremote.port=<some-port-number>
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

And then there's an option in JMC to take a recording with memory leak detection at level "Object types + Allocation Stack Traces + Path to GC Root".
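
If it helps, those args would just slot into the existing jvm args block from your JVM settings earlier, something like this (the port placeholder left as-is; pick any free port):

"jvm": {
    "minHeapSize": 256,
    "heapSize": 1024,
    "args": [
        "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=localhost:9999",
        "-javaagent:/app/lib/luceedebug.jar=jdwpHost=localhost,jdwpPort=9999,debugHost=0.0.0.0,debugPort=10000,jarPath=/app/lib/luceedebug.jar",
        "-Dcom.sun.management.jmxremote.port=<some-port-number>",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false"
    ]
}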

chapmandu commented 1 year ago

Hold that thought... I just realised I may not have loaded the new jar! I'll report back in a couple of hours. Sorry.

chapmandu commented 1 year ago

The memory leak looks to have been resolved. When I reloaded my Docker environment, I neglected to use the build flag that copies the jar into the application. Sorry for the false alarm.