pop4959 / Chunky

Pre-generates chunks, quickly, efficiently, and safely.
GNU General Public License v3.0
556 stars 63 forks source link

Fix vanilla block entity tickers memory leak #343

Open Juloos opened 3 months ago

Juloos commented 3 months ago

Block entity tickers in World are not being removed correctly when unloading a WorldChunk, which leads to a memory leak as well as decreasing tick performance relatively to block entities. If we don't remove the block entity tickers, they will get accumulated in the world while chunks get generated.

You can verify that statement if you create a heapdump of a server (using Spark for example) after a long time generating chunks (in my tests, ~10000 chunks radius), under the retained objects.

I only created a pull request for the Fabric side of the project since I don't really know how to mod using any other mod loader (and I don't know if that will be even possible with plugin loaders).

pop4959 commented 3 months ago

Hey, thanks for looking into this, although this seems to be a vanilla issue. Did you open a Mojira issue for this already?

Generally speaking, I have avoided adding non-accessor mixins to chunky to increase compatibility with other mods, and also it is not really a goal for us to provide these sorts of fixes. This seems like it would fit better into a plugin such as Lithium, although I understand why you have made a PR here (since it does closely relate to chunk loading/unloading).

I am not fully against it, but for this to make sense to me this needs to be a pretty significant leak. There are other leaks in the game and most of them won't cause trouble unless you are generating for weeks without giving it a break. Would you mind sharing additional details as to how severe this is? It would also be handy if you can share the heap dump you generated.

I also want to be sure that this won't accidentally break anything. As long as these aren't normally saved to any tick list on the chunk it is probably fine though (or if this gets called afterward).

I am also skeptical of the claim that this affects performance, besides potentially as a result of the leak itself. After briefly taking a look at the code, I can see that these tickers are removed when the chunk is ticked, but of course, when loading chunks with chunky the chunk is never ticked and so those tickers will never be removed. But if block entities aren't ticking there should also not be any performance loss there. Let me know if I missed anything there.

Let me know what your thoughts are on the above, before continuing. If it's easier, we can also discuss this more on Discord.

Thanks again for the PR!

Juloos commented 3 months ago

Hi, firstly no I didn't bother opening an issue on Mojang's bug tracker (kinda lazy about that), but yeah I can understand why you are reluctant on the PR.

From the tests I made the leak was quit significant enough to always make my server crash (12Go assigned to Minecraft, 15 to the JVM, accounting for Aikar's flags & optimization mods ect), running out of memory when chunk loading for a radius of 10000 (it was crashing way before it finished). To make sure it wasn't coming from any other mod I also tested out the same setup without any other mods than Chunky and Spark btw. The RAM usage of Minecraft itself was continuously increasing during the chunk generation until running out of memory, but with my fix it stayed constantly at the same (relatively) low usage from end to finish.

I also claimed this thing about performance after some research on the code (not too much though), but after reading what you said I feel you have a more in-depth analysis of what's happening there, so I may be wrong.

You can check a heapdump here if you want (it's a small one, only ~4Go), there's not much more to what I said really. I first created a little mod to fix this memory leak

pop4959 commented 3 months ago

It would be great to get this on Mojira if we get a chance, especially if this is merged in, as maintaining fixes like this in perpetuity would be inconvenient. It seems like the issue here is fairly straightforward, and the fix is fairly straightforward, so potentially something that could even be fixed by 1.21 if reported now (pre-releases have started going out).

Thanks for the additional details about the leak. I trust what you're saying is accurate, but I will confirm some of the details and test myself to understand better. I have requested access to the heapdump file when you get a chance so I can take a look and compare results to my own.

No worries about the performance claim, I was just checking to see if I missed anything important, because again, the default behavior of chunky is to generate chunks without ticking them, which causes the following things to occur:

If you have time, it might be worth testing to see if the chunks do get ticked, whether this issue still occurs. It should be easy to test by using the ticking load duration flag documented here. This will force the chunks to tick instead of the default no-tick behavior, for a specific duration. Something like -Dchunky.maxWorkingCount=50 -Dchunky.tickingLoadDuration=1 -Dchunky.awaitTicketRemoval=true should work for testing this. I recommend that particular set of flags since it will only tick chunks for a second, and wait for the tickets to expire before continuing on (preventing runaway memory in case chunks load much faster than they unload). You can bump the working count if the default causes things to run a lot more slowly than your previous tests.

If you don't have time, I should be able to check this out more on the weekend and I will do the same.

Juloos commented 3 months ago

I will be able to test those things later on today and tomorrow, thank you for the details! I also shared the file to you (forgot about permission shenanigans).

My understanding from the code was that the ticking is occurring at the World level, not per chunk. Which is why iterating through all block entities (even the ones from unloaded chunks) was not great in terms of performance to me.

I'm not sure they would fix this because I don't really see any situation in vanilla where a chunk don't get ticked at least once after being loaded. I will still take some time to create a bug report there

pop4959 commented 3 months ago

I think you might be right about that. Probably worth double checking performance then. For some reason I was thinking this was on the chunk, but the part that is relevant is on the world so if it's the world tick that's more problematic.

I can't think of a situation where vanilla doesn't tick a chunk at least once before unloading, but that is why I was curious about testing that just in case. I suspect it may still occur, but perhaps it will be less severe.

My thoughts there are typically these block tickers are being removed after being ticked. If the chunk is being ticked for more time, there is a higher chance that it gets cleaned up, however, what remains would exhibit the same issue.

If that's the case, it's still a memory leak regardless and Mojang might care a little more. It's actually not the first time a memory leak in vanilla was discovered in chunky and Mojang fixed it regardless. The last time a huge memory leak occurred was when chunk blending was added, and every chunk held references to neighbor's chunks' blending... which needless to say was a huge disaster for pregen.

Juloos commented 3 months ago

I confirm that even using your flags doesn't fix the problem, not quiet sure of what's happening that won't put the tickers in "removed" mode.

Juloos commented 3 months ago

Okay my bad, to be more precise it doesn't get garbage collected right away but after some time (probably the old G1 triggering), so yeah your flags do fix the problem

Juloos commented 3 months ago

I performed more tests and it looks like there is no performance issue after-all, I didn't check the code much more and I don't think I will, I'm just gonna create a bug report on Mojira about the memory leak.

pop4959 commented 3 months ago

Feel free to link the Mojira issue here when you've made it so we can track it here.

I have thought about this a bit more and I think it makes sense to merge your fix, although I am very busy at the moment and I will likely not have time to review and test more thoroughly until at the very least after the weekend.

Nothing more needed from your end, so don't worry about testing or checking anything else (unless you want to).

Juloos commented 3 months ago

We will see where the bug report goes, feel free to do whatever you want with my pull request in the meanwhile! The Mojira issue is there for whenever you want to check it.

pop4959 commented 3 months ago

The issue was given important status which is very promising! Also linked to the old POI issue (which is also a well known memory leak).

Minecraft 1.21 is also planned to release next week. I will try to find time to look at this this weekend since I was gone last weekend, but at the very least, before releasing builds of chunky for 1.21.

Juloos commented 3 months ago

I also took care about this bug with POIs, since I though it was the main problem at the beginning. But my tests didn't really show up much memory retained, even after a lot of chunk generation. Maybe it was fixed by some optimization mod I used, didn't bother to check after my findings.

Hope they will take care of things from now on yes! Maybe next prerelease