pterodactyl / panel

Pterodactyl® is a free, open-source game server management panel built with PHP, React, and Go. Designed with security in mind, Pterodactyl runs all game servers in isolated Docker containers while exposing a beautiful and intuitive UI to end users.
https://pterodactyl.io

Possible Memory Leak #4635

Closed petulikan1 closed 6 months ago

petulikan1 commented 1 year ago

Current Behavior

As of writing this, we're currently dealing with a RAM issue. Once we start a Spigot server without any plugins (just a basic server) and we connect to it more than 100 times, the RAM increases over time. However, the RAM shown in the panel doesn't decrease at all; once it's allocated, it stays that way.

We've run several tests with heap dumps to check whether any of the plugins actually has a memory leak, but didn't find any.
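
One way to check whether the growth is in the Java heap or in native allocations, independent of any plugin, is the JVM's built-in Native Memory Tracking. A minimal sketch, assuming jcmd and pidof are available inside the container (NMT adds a small runtime overhead):

# Start the server with Native Memory Tracking enabled:
java -Xms128M -Xmx4096M -XX:NativeMemoryTracking=summary -jar server.jar nogui

# From inside the container, print the NMT summary and compare the
# "Total: committed=" figure (heap + threads + GC + code cache, etc.) with what the panel reports:
jcmd $(pidof java) VM.native_memory summary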

[screenshot: panel memory usage graph]

Here's an image of a server that has been running for more than 20 hours with 4 GB of RAM. Since this server doesn't create new threads it isn't crashing, but a different server where threads are created does crash.

Expected Behavior

The panel should show the RAM of the running container decreasing, to prevent unexpected crashes (OOM killer disabled). Once the RAM increases it never decreases, and it's a pain for a server with over 40 plugins and over 100 players to crash unexpectedly once it reaches the container limit, because some plugins need to create new threads and there's no memory available for the thread itself (native memory).

Steps to Reproduce

Make a server with a Paper jar, allocate 4 GB of RAM, and connect to it until it reaches 4 GB. Leave it for about an hour and you'll see the same RAM usage, just as you left it.

Panel Version

1.11.1

Wings Version

1.11.0

Games and/or Eggs Affected

Minecraft (Paper)

Docker Image

ghcr.io/pterodactyl/yolks:java_17

Error Logs

No response

Is there an existing issue for this?

parkervcp commented 1 year ago

Have you changed the startup command for this server?

petulikan1 commented 1 year ago

For the testing server (where I tested the RAM just by joining it), no, I didn't. I left it just as the panel made it.

java -Xms128M -Xmx4096M -jar server.jar

Mutex21 commented 1 year ago

Add swapaccount=1 cgroup_enable=memory to GRUB_CMDLINE_LINUX_DEFAULT and GRUB_CMDLINE_LINUX in /etc/default/grub, then reboot your dedicated server. After that, make sure the allocated swap is 0 in your game server configuration. Use the default kernel or one you compiled yourself, not a custom one (Liquorix, XanMod); for XanMod, maybe the LTS version can save you.
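
For reference, a rough sketch of that grub change (the existing flags on your install will differ, and note matthewpi's warning further down this thread before editing kernel args at all):

# /etc/default/grub -- append the flags to the values already present:
GRUB_CMDLINE_LINUX_DEFAULT="quiet swapaccount=1 cgroup_enable=memory"
GRUB_CMDLINE_LINUX="swapaccount=1 cgroup_enable=memory"

# Regenerate the grub config and reboot the host (Debian/Ubuntu):
sudo update-grub
sudo reboot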

P.S.: Update Docker to the latest version; I had a lot of problems running Wings >1.5.3 with older Docker versions.

MrBretze commented 1 year ago

I have exactly the same problem, and I have added "swapaccount=1 cgroup_enable=memory" to the grub config...

petulikan1 commented 1 year ago

Did that help you resolve this problem?

Mutex21 commented 1 year ago

grub: swapaccount=1 cgroup_enable=memory. Check docker info and update Docker to the latest version, and Allocated Swap in the Pterodactyl panel (game server) should be 0.

P.S.: you need to restart the entire dedicated server.

petulikan1 commented 1 year ago

Alright thanks, I'll try this and then respond if that worked or not.

MrBretze commented 1 year ago

I have made this change (swapaccount and cgroup_enable were already set in my grub config). I have rebooted my dedicated server and my server still doesn't decrease RAM.

My dedicated server has a Ryzen 5 3600, 16 GB of RAM, and a RAID 1 of 1 TB HDDs.

If you need any other information, just tell me.

OS:

Ubuntu 22.04.1 LTS

Docker info

https://pastebin.com/574zu870

Grub Info

https://pastebin.com/ka4nuanG

Panel

1.11.2

Wings

1.11.0

Docker Image

ghcr.io/pterodactyl/yolks:java_17

petulikan1 commented 1 year ago

So it was just slowly increasing and at no point decreasing right?

petulikan1 commented 1 year ago

I tried this as well and the RAM wasn't decreasing, so I'm assuming it's a real problem.

MrBretze commented 1 year ago

So it was just slowly increasing and at no point decreasing right?

Yes

I tried this as well and RAM wasn't decreasing so I'm assuming it's a real problem.

Yes, or a configuration problem, but I don't know what the problem is...

MrBretze commented 1 year ago

So after some testing, I found a "Java issue" here: Java NEVER clears the G1 old generation. It's supposed to do it automatically, but I don't know why it doesn't when running in Docker.

I tried adding the Java arguments -XX:+UnlockExperimentalVMOptions and -XX:+UseContainerSupport, but they don't help/change the problem.

With the spark plugin, if I execute the command spark heapsummary, it forces a collection of the G1 old generation and the memory used by the server decreases.
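
The same check can be done without spark using the stock JDK tools, assuming they are present in the container image (a sketch, not specific to any egg):

# Watch old-generation occupancy and GC activity, sampling every 5 seconds:
jstat -gcutil $(pidof java) 5000

# Force a full GC to see how much of the old generation is actually reclaimable:
jcmd $(pidof java) GC.run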

I have forced Java to use the parallel collector (instead of G1) and although I have no memory issues, I get lag spikes, so this doesn't seem to be a viable solution.

For now I've added the Java arguments -XX:MinRAMPercentage=25.0 -XX:MaxRAMPercentage=50.0 and it partially works.

I suppose it's a Docker/egg issue, not related to the panel...

I have found those two links that may be helpful: https://developers.redhat.com/blog/2017/03/14/java-inside-docker https://www.merikan.com/2019/04/jvm-in-a-container/

petulikan1 commented 1 year ago

Thanks for letting me know about this one. I'll look into that once I'm back home.

Once again thanks.

schrej commented 1 year ago

Has anyone tested running the exact same version of Paper outside of Pterodactyl? Does your system also report the same amount of memory consumption (e.g. htop)?

MrBretze commented 1 year ago

Does your system also report the same amount of memory consumption (e.g. htop)?

Yes, it reports the same amount of memory.

Has anyone tested running the exact same version of Paper outside of Pterodactyl?

I tested outside of Pterodactyl and I don't see any problem, but I need to retry this properly.

KugelblitzNinja commented 1 year ago

I have also been having this issue,

I've been finding for a while now that the Docker containers have been using a lot more RAM than the servers themselves.

It has been common to see one of our Minecraft servers that is set to use -Xmx16G start using 20 GB+ after a few hours. God forbid you don't set a container RAM limit; I've seen 40 GB+.

I noticed this started happening after updating to Ubuntu 22.04.1 LTS from Ubuntu 18 (Docker was also updated in the process, but I don't know what version we were using).

From all the things I have tried, I get the feeling it's a Docker-related issue, since I was able to recreate it by manually booting a server in Docker and seeing the same excess memory consumption by the container.
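
For anyone wanting to try the same standalone reproduction, a rough sketch (the image tag, paths, and memory limit are just examples, not what the panel uses):

# Run the same Paper jar in a plain container with a hard memory limit:
docker run --rm -it --name paper-test --memory=4g \
  -v "$PWD:/data" -w /data eclipse-temurin:17-jre \
  java -Xms128M -XX:MaxRAMPercentage=90.0 -jar server.jar nogui

# In another terminal, watch what Docker reports for the container over time:
docker stats paper-test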

Loren013 commented 1 year ago

Hey! I have the same issue,

Has anyone found a reasonable solution? I thought the problem was connected with plugins, so I tried running different servers, without plugins, on different versions, and I still have the issue. Allocating more RAM to the container helps, but I wonder if swap memory can also help.

Anyway, any response with some updated information or a solution will be appreciated! Gosh... Well, at least the problem for sure isn't connected with plugins.

MrBretze commented 1 year ago

Has anyone tested running the exact same version of Paper outside of Pterodactyl? Does your system also report the same amount of memory consumption (e.g. htop)?

So after having performed tests outside of Pterodactyl, I don't have any issues regarding memory usage. I suppose the JVM and Docker are the troublemakers here.

MrBretze commented 1 year ago

@KugelblitzNinja and @Loren013 Currently, the only 'fix' I have found is to use this command line: java -Xms128M -XX:+UseContainerSupport -XX:MinRAMPercentage=25 -XX:MaxRAMPercentage=50 -jar {{SERVER_JARFILE}} It's important not to specify the 'Xmx' Java argument, otherwise it won't work.

petulikan1 commented 1 year ago

@KugelblitzNinja and @Loren013 Currently, the only 'fix' I have found is to use this command line:

java -Xms128M -XX:+UseContainerSupport -XX:MinRAMPercentage=25 -XX:MaxRAMPercentage=50 -jar {{SERVER_JARFILE}}

It's important not to specify the 'Xms' Java argument, otherwise it won't work.

Specify Xms or Xmx? Asking because in the startup command you wrote 'Xms' and you're saying not to specify the 'Xms'

MrBretze commented 1 year ago

Specify Xms or Xmx? Asking because in the startup command you wrote 'Xms' and you're saying not to specify the 'Xms'

Oops, indeed I was wrong! Sorry!

KugelblitzNinja commented 1 year ago

-XX:MaxRAMPercentage=50 can work but is a pain, wasting so much RAM.

The most useful thing I have tried so far is playing around with different Docker base images.

For me it's been less of an issue when using Docker containers with a Debian base.

Currently I am using kugelblitzninja/pterodactyl-images:debian-zulu-openjdk-19. If you do try this, please let me know how it works for you, and be aware it has extra software added to create a backup on server shutdown!

With this we can run something between -XX:MaxRAMPercentage=80 and -XX:MaxRAMPercentage=93.

KugelblitzNinja commented 1 year ago

On a side note, the following is to help with another issue this one can cause.

If you have not already, disable the OOM killer.

If you find your servers still being killed, instead of going into a zombie-like state (until assigned more RAM) when running low on free RAM in the container, there is a good chance you will find out, like me, that the panel was unable to disable the OOM killer but did not say anything, and the recommended edits to the grub files from their Discord were of no help.

I was able to verify this by reading the system logs.

If you find this is the case, you're going to have to turn to Google to find other ways to allow the panel to disable it.

This is an issue I have so far only had with the host being Ubuntu 22.

KugelblitzNinja commented 1 year ago

I forgot to mention something important in the above,

If your server is getting killed by the OOM killer, this does not mean your Minecraft server ran out of RAM! Just that the container is now using its allocated amount.

(Bear in mind the OOM killer can fail to be disabled with no notification in the panel.)

To see if your Minecraft server actually ran out of RAM, check your log files to see why it crashed: you will see out-of-memory type exceptions in your server log files and/or in your JVM crash report (look for something like hs_err_pid29.log in the root of your server).

If it just died with no error in the log files and they simply end, it was the OOM killer. (This can cause world corruption.)
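
To confirm it from the host side, the kernel log usually records OOM kills; something like this works on most distributions:

# Kernel log entries left behind when the OOM killer fires:
sudo dmesg -T | grep -i -E "out of memory|oom-kill|killed process"

# Or, on systemd hosts:
sudo journalctl -k | grep -i oom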

matthewpi commented 1 year ago

To everyone reading this issue and editing their kargs, stop. Reverting to cgroups v1 is not a solution or a fix to any of these problems, it's a terrible workaround that causes more problems, not less.


This problem is caused by many different factors, and is an issue specifically regarding the images and startup commands themselves, nothing else.

First off, setting Xmx to the full amount of memory the container has allocated will cause an OOM if all that memory is actually used. If the JVM heap uses all the memory assigned to the container, there is little to no memory left for anything outside the JVM; Java doesn't only use the JVM heap and requires memory outside of it. (Setting Xmx also overrides the MaxRAMPercentage flags and disables the automatic container memory limit detection built into newer versions of Java.)

Secondly, the ghcr.io/pterodactyl/yolks:java_8 and ghcr.io/pterodactyl/yolks:java_11 images both lack container detection support (I am working on a fix for this). They will instead detect 1/4 of the memory available on the host system by default, which will then be affected by the MaxRAMPercentage flag. So if you are running these images and experiencing issues, you will want to set -Xmx to a value below the amount of memory allocated to the container; an overhead of 128 MB or so should be more than enough. And for those wondering, no, the -XX:+UseContainerSupport flag does not help, and is only required for Java 8; Java 10 and above have it enabled by default, assuming the build of Java actually has the feature, which these specific builds seem to lack. The ghcr.io/pterodactyl/yolks:java_8j9 image does have support for containers, but the -XX:+UseContainerSupport flag will need to be added for it to work.
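
For illustration, on a 4096 MB container that would look roughly like this (a hypothetical startup line, not taken from any egg):

# Leave ~128 MB of the container's 4096 MB for non-heap/native memory:
java -Xms128M -Xmx3968M -jar {{SERVER_JARFILE}}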

Finally, for all the Java versions with container detection support, the default MaxRAMPercentage of 95.0 does not provide enough overhead. Because the memory value will be detected as what the container is allocated, the built-in memory overallocation logic in Wings (we assign additional memory to containers rather than the exact amount specified to help prevent issues with OOM) is included in the RAM calculation, meaning the only overhead available is 5%. A MaxRAMPercentage value of 80-90% would allow for much more overhead. The more RAM your server has assigned, the higher this value can be (within reason).


For most users (especially running newer or latest versions of Java), everything should work fine out of the box. However tweaking of the MaxRAMPercentage flag will likely be required for many users.
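
A quick way to verify what a given image and flag combination actually resolves to is to print the final JVM flags from inside the server's container (the 85% value here is only an example):

# Shows the max heap the JVM derived from the container limit and MaxRAMPercentage:
java -XX:MaxRAMPercentage=85.0 -XX:+PrintFlagsFinal -version | grep -iE "MaxHeapSize|MaxRAMPercentage"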

KugelblitzNinja commented 1 year ago

@matthewpi

Even with MaxRAMPercentage set to 50-75%, given a few days on our servers, if the OOM killer is not disabled they still get killed by it, even with containers that have 20-30 GB of RAM.

Would you have any advice on why the containers are trying to use so much extra RAM? Any ideas of possible tools and/or guides that can be used to diagnose the issue? Any other suggestions on what else could be tweaked?

Like this server (see screenshot): 6 GB of overhead is a bit too much.

Edit: (I don't consider the OOM killer the primary issue here, more why the hell 6 GB+ is needed for overhead.)

petulikan1 commented 1 year ago

Hey guys, I've got a small update, maybe related to this issue. Not sure what might be causing this, but there is some kind of thread limit: it reached its maximum and the server is not able to create more threads, even though we have unlimited memory for the server. Hope it helps figure out what could be wrong! [screenshots: thread creation errors]

KugelblitzNinja commented 1 year ago

@petulikan1

For that I think you need to have a look at https://pterodactyl.io/wings/1.0/configuration.html#container-pid-limit; if it's still at container_pid_limit: 512 then you're going to want to increase it.
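
For reference, a sketch of the relevant snippet, assuming the key sits under the docker block of Wings' /etc/pterodactyl/config.yml (check the linked docs for the exact location on your version), then restart Wings:

docker:
  # default is 512; raise it if servers hit the thread/process cap
  container_pid_limit: 1024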

It is also worth confirming your host did not run out of RAM.

petulikan1 commented 1 year ago

Thanks for the link; I was trying to find something related to that but wasn't able to.

Hamziee commented 1 year ago

Any updates? I have the same problem

mckenziecallum commented 1 year ago

Hey all, has anyone managed to find a solution to this problem? I have been running a server for the past 6 months and it's been leaking memory from the start. The server has had no more than 2 players on it concurrently and it's eating up 14 GB of RAM; I have had to resort to restarting every 8 hours so that the server doesn't die.

rex2630 commented 1 year ago

Yeah, same problem, it is insane. I thought it was a problem with plugins or something, but that's nonsense. The server doesn't have more than 10 concurrent players and the RAM still goes up and never stops. After all the checks on the MC server side, RAM never goes above 4 GB, but the Ptero panel says it does. RAM is even being cleaned up on the server side...

Loren013 commented 1 year ago

Guys, the problem isn't connected with your plugins or Pterodactyl itself. I had the problem for so long, like 2-3 months, and I got rid of it in an easy but pricey way (if you don't use a hosting provider and host the server yourself). So...

I sent a message to my hosting provider, where I have a dedicated server, and said that I have some trouble presumably connected with memory, so please check the RAM sticks. They ran some tests and found out that was indeed the problem. They replaced the RAM within a couple of hours, and since then (it's been about 2 months) everything has been OK.

The problem was with the hardware itself. Don't check all the plugins; if you have the issue, check the hardware. There are programs, and the BIOS itself, that can check RAM health; you can find instructions on the internet.

KugelblitzNinja commented 1 year ago

I did not think it could be hardware; I doubt it's just a matter of bad RAM, as this issue keeps happening on 5 different host machines for me.

My current top suspect is that something involving Docker, and maybe cgroups v2, is causing this issue.

I have also had this issue when running the servers via Docker without using the panel, which tells me it's not an issue with Pterodactyl itself. This excessive RAM usage doesn't happen if the servers are not run in Docker. When I say excessive, I mean a server with Xmx and Xms set to 12 GB that is using 30 GB after 24 hours of running.

Sharktheone commented 1 year ago

I have also noticed this issue; I had a server consuming 40 GB of RAM or something. I have "fixed" it by using another startup command. If the issue is with the hardware itself, this can only help up to a certain point.

java -Xms128M -XX:MaxRAMPercentage=95.0 -Dfml.queryResult=confirm -Dterminal.jline=false -Dterminal.ansi=true -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -XX:G1NewSizePercent=40 -XX:G1MaxNewSizePercent=50 -XX:G1HeapRegionSize=16M -XX:G1ReservePercent=15 -XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4 -XX:InitiatingHeapOccupancyPercent=20 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:SurvivorRatio=32 -XX:+PerfDisableSharedMem -XX:MaxTenuringThreshold=1 -Dusing.aikars.flags=https://mcflags.emc.gs/ -Daikars.new.flags=true -jar server.jar nogui

I think for older versions of minecraft you need to remove the -Dusing and -Daikars flags.

And one last thing: you shouldn't set your Xms the same as your Xmx. The Xms flag only sets the initial heap size of the server. Something like 128M is better here, so it doesn't consume 12 GB instantly.

Hamziee commented 1 year ago

I found a fix, but it does not work for Forge servers: use a different Docker image, Java OpenJ9. There are multiple versions:
Java 17: ghcr.io/pterodactyl/yolks:java_17j9
Java 16: ghcr.io/pterodactyl/yolks:java_16j9
Java 11: ghcr.io/pterodactyl/yolks:java_11j9
Java 8: ghcr.io/pterodactyl/yolks:java_8j9

This mostly fixed the memory leak issue for me. If you build a different Java Docker image, you might get Forge to work. I've been planning to do that but haven't had time to try.

petulikan1 commented 1 year ago

I'll try it and then respond with the outcome of the testing. Thanks for the hopefully helpful comment!

KugelblitzNinja commented 1 year ago

@Hamziee, what Forge version do you have issues with? The modded server (MC 1.7.10) I have does work with the above, and I may be able to get other versions to work.

I'm also going to give this a go on my main servers (MC 1.19.4).

I will also post in a few days to say whether it has helped.

Hamziee commented 1 year ago

@KugelblitzNinja If you mean the OpenJ9 Docker images: they do not work on Forge 1.17 and later, as of my testing. It errors out when starting and asks you to use a different Java variant. I have not looked into building a Docker image for that Java version yet.

Hamziee commented 1 year ago

I'll try it and then respond with the outcome of the testing. Thanks for the hopefully helpful comment!

@petulikan1 I got the issue around the same time you made this issue. I tried everything in this thread and it did not work; I even had the same issues using clean panel and clean daemon installs. I almost gave up on it until I learned about OpenJ9's Java and saw that Pterodactyl made Docker images for it. So I used it and it seemed to fix the issues for me. If this fixed it for you, please let me know, or if you have any problems I am happy to help :)

KugelblitzNinja commented 1 year ago

@Hamziee Yeah, it looks like all the versions I tested after 1.7.10 just don't work with OpenJ9; by the looks of it, it causes a lot of odd issues with mods.

petulikan1 commented 1 year ago

So after 2 days of using Java 17 (ghcr.io/pterodactyl/yolks:java_17j9), RAM usage on one of our servers went from 20 GB (at 20 hours of uptime) to ~10 GB (at 23 hours of uptime).

That's a really huge difference and we'll continue using them.

Thanks @Hamziee !

KugelblitzNinja commented 1 year ago

Unfortunately, I was unable to use OpenJ9 on my servers, as the Java VM keeps crashing soon after the servers start. I tried a few different OpenJ9 versions but all of them were too unstable.

Hamziee commented 1 year ago

As I said, it will not work on the latest Forge servers.

The error on Forge servers:

You are attempting to run with an unsupported Java Virtual Machine : Eclipse OpenJ9 VM
Please visit https://adoptopenjdk.net and install the HotSpot variant.
OpenJ9 is incompatible with several of the transformation behaviours that we rely on to work.

As it states, OpenJ9 does not work with the transformation behaviours Forge relies on, and by "HotSpot variant" it means the default Java. So sadly, Forge 1.17 and later is incompatible; using specifically tuned startup arguments is the best you can do, as far as I know.

KugelblitzNinja commented 1 year ago

@Hamziee I'm not trying to run Forge servers and am also not getting that error. I'm getting some strange JVM crash and thread dump running Paper 1.19.3, maybe 10 to 120 minutes after starting.

Hamziee commented 1 year ago

@KugelblitzNinja Weird, I have been running a Paper 1.8.9 and a Paper 1.19.3 server with around 50 plugins without an issue. Maybe if you inspect the log you could find out why it crashes. Probably some plugin.

gerolndnr commented 1 year ago

@Hamziee I'm not trying to run Forge servers and am also not getting that error. I'm getting some strange JVM crash and thread dump running Paper 1.19.3, maybe 10 to 120 minutes after starting.

I had the same issue. The server started and crashed after a few seconds of running, showing some JVM dump error. Some plugins seem to cause weird behaviour; in my case it was the spark profiler. After removing it, it worked fine. However, the problem is not solved for me: while it uses significantly less RAM, the usage is still constantly increasing.

Hamziee commented 1 year ago

I found a fix, but it does not work for Forge servers: using the OpenJ9 Docker images listed in my earlier comment above.

@gold-ly Did you try this?

gerolndnr commented 1 year ago

Yes, I am using the OpenJ9 Java 17 image. It significantly decreased my RAM usage, but it didn't fix the memory leak.

Hamziee commented 1 year ago

Yes, I am using the OpenJ9 Java 17 image. It significantly decreased my RAM usage but didn't fix the memory leak.

@gold-ly Maybe try using the default Java again, upload spark, and see if a plugin is leaking some memory over time. If that's not the case, then there is a plugin that is incompatible with OpenJ9's Java.