Platform fails with OOM error when loading second application

azzazzel commented 1 year ago

I have the platform running locally and uploaded two applications (welcome and count from the examples). Loading each application alone works fine. But when one is already loaded, trying to load the second one (regardless of the order) makes the platform fail with the following:

❯ xec -L lib/ lib/kernel.xtc
Enter password:

Starting the AccountManager...
Starting the HostManager...
Starting the platform UI controller...
Started the XtcPlatform at http://xtc-platform.xqiz.it:8080
Info: Created a host module 'platformDB_jsondb.xqiz.it' for 'platformDB.xqiz.it'

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "server-timer"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "HTTP-Dispatcher"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "server-timer"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ecstasy:LocalClock"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "HTTP-Dispatcher"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "server-timer"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "server-timer"
Exception in thread "server-timer" java.lang.OutOfMemoryError: Java heap space
Exception in thread "HTTP-Dispatcher" java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space

I understand this may be because I need to give the platform more resources (still trying to figure out how), but IMHO, a lack of resources to load an application should not bring the entire platform down.

lagergren commented 1 year ago

It could be that the xec launcher needs a larger JVM heap setting. It probably has a default -Xmx value somewhere, @gene?

azzazzel commented 1 year ago

I worked around it by changing the setting in xdk/src/main/resources/xdk/bin/xec.cfg and running gradlew dist-local.

I see Gene updated the README with the information on where to make the change without the need to rebuild. But that'll only last till the next update, I guess.

Anyway, the issue is not about increasing the amount of memory but about gracefully handling such situations. It should not be possible for a user to bring down the entire platform by simply uploading and running resource-consuming module(s).
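To make "gracefully handling" concrete, here is a minimal sketch of a guard around a load attempt. It is not how the platform loads applications: `GuardedLoad`, `Result`, and `tryLoad` are hypothetical names, and the load task is a stand-in. It only shows the shape of reporting a failed load to the caller instead of letting the error escape to the uncaught-exception handler.

```java
import java.util.concurrent.Callable;

final class GuardedLoad {
    /** Outcome of one load attempt; error is null on success. */
    record Result(String app, boolean loaded, String error) {}

    /** Runs a (hypothetical) load task and turns a failure into a Result. */
    static Result tryLoad(String app, Callable<Void> loadTask) {
        try {
            loadTask.call();
            return new Result(app, true, null);
        } catch (OutOfMemoryError e) {
            // Best effort only: in a shared heap, the OOM may be thrown in
            // any thread, not necessarily the one running this task.
            return new Result(app, false, "out of memory: " + e.getMessage());
        } catch (Exception e) {
            return new Result(app, false, e.toString());
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLoad("welcome", () -> null));
        // Simulated failure; a real heap exhaustion is not this tidy.
        System.out.println(tryLoad("count", () -> {
            throw new OutOfMemoryError("simulated");
        }));
    }
}
```

Catching `OutOfMemoryError` like this is not real isolation, because all applications in one JVM share the same heap. Robust protection would likely need per-application resource limits or separate processes.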

lagergren commented 1 year ago

I’m not even sure why we have an -Xmx limit at all in our JVM args at the moment. We should probably just remove it, as everything we do is Java 17 or above, and it’s more than competent at max heap size management. I think the unset max heap used to default to 32 GB back in Java 8 land to be able to use compressed references, if the machine memory was <= 32 GB. On my 64 GB aarch64 Mac M1, I get 130.8 MB for some reason, likely because it detects I have more native memory than it’s worth in GC performance. The heuristics aren’t very good in places, i.e. most of them don’t change depending on your current machine load, other running stuff and other JVMs, but they give a hint what the default will be set to on your machine.

More and more often, I run into setups where there are 100 expensive 32-core virtual machines running microservices in separate JVMs, with ancient cargo-cult flags spread widely across the entire configuration. For example, at a previous job, these machines used the Parallel stop-the-world GC on Java 8 and, even worse, forced the number of code gen threads to 2, as well as the number of GC threads. They were running a primitive and ancient GC while they kept adding duplicate Strings on the heap to represent finite value sets that required anything from 1-10 bits (i.e. it’s fine with an immutable identical enum reference, instead of putting the string “MARKET” and the string “US” in 40 GB of heap memory) … Anyway - before I started dealing with that, it definitely bought us some time to just use the #%)#()T= out of the box heuristic settings for the system they were on.

Hence:

java -XX:+PrintFlagsInitial
java -XX:+PrintFlagsInitial | grep -i heap

The defaults seem pretty decent to me on my machine, without any special settings.

You can also check the final flags:

java -XX:+PrintFlagsFinal

which is what happens after the command line has been parsed, and you can compare them.
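The effective heap limit can also be read from inside a running JVM. The following is a minimal sketch, assuming a HotSpot-based JDK (the `com.sun.management` API is HotSpot-specific); the class name `HeapFlagCheck` is made up for illustration. It prints the maximum heap the runtime reports, plus the `MaxHeapSize` flag together with its origin, which indicates where the value came from (for example `ERGONOMIC` or `VM_CREATION`).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

final class HeapFlagCheck {
    /** Max heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Describes a HotSpot flag as "name=value (origin)". */
    static String describeFlag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name);
        return option.getName() + "=" + option.getValue()
                + " (" + option.getOrigin() + ")";
    }

    public static void main(String[] args) {
        System.out.println("Runtime max heap: " + maxHeapMiB() + " MiB");
        System.out.println(describeFlag("MaxHeapSize"));
    }
}
```

Running it once with no heap flags and once with an explicit -Xmx gives the same kind of comparison as the initial and final flag dumps, but for a single flag.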

Typically, large deviations from the standard settings are no good. Unless you absolutely know what you are doing, a custom setting is very likely to be suboptimal for modern Java versions on the machine you are running.

Sadly, -XX:+PrintFlagsInitial exits the VM after printing; it’s not like e.g. -showversion. -XX:+PrintFlagsFinal, however, dumps the flags and then runs the JVM job.

IMHO we should just run with out of the box JVM args and see what happens. It’s not going to bloat up more than there is on the machine. The only threat is if you run several apps on the machine, one of which allocates native memory without giving it back. Typically that is another JVM with the developer mindset “direct byte buffers are free, because they are never GC’d”. D’uh… It means that the initial -Xmx promise, which has been reserved, but not committed, by mmap to create a contiguous address space for the heap, makes the JVM think that it can commit all of it. Suddenly the memory behind that reserved mmap region, needed for that extra object, has been used by the byte buffer bandit, and BOOM - the OOM killer on Linux destroys your process without logs.
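As a small illustration of the direct byte buffer point, the sketch below allocates a direct buffer and reads the JVM's "direct" buffer pool through the standard `BufferPoolMXBean`. The class name `DirectBufferFootprint` is made up for illustration. The memory behind a direct buffer lives outside the Java heap, so it does not count against -Xmx.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectBufferFootprint {
    /** Bytes currently held by the "direct" buffer pool, or -1 if not reported. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 16 MiB of native memory, allocated outside the Java heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        System.out.println("direct buffer capacity: " + buffer.capacity());
        System.out.println("direct pool bytes used: " + directBytesUsed());
        System.out.println("heap max bytes:         " + Runtime.getRuntime().maxMemory());
    }
}
```

On HotSpot, the total for direct buffers can be capped separately with -XX:MaxDirectMemorySize, which is one way to keep a "byte buffer bandit" from competing with the heap for native memory.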

Suggestion: it’s amazingly great that you support configuring JVM arguments, please continue to do so, but I also suggest that we don’t have any heap limits and see what happens.

/M



ggleyzer commented 1 year ago

Following Marcus's suggestion, I removed the "-Xmx" configuration (and xec.cfg along with it). Milen is absolutely correct; we do need to be "graceful" and not allow a single deployment to take down the platform. However, this is an integral part of a much bigger project that will follow the new run-time implementation.
