libretro / libretro-prboom

Port of prboom to libretro - plays Doom, Doom II, Final Doom and other Doom IWAD mods.
GNU General Public License v2.0
69 stars 49 forks source link

[RAM] High memory usage. #162

Closed DrUm78 closed 1 year ago

DrUm78 commented 3 years ago

Hardware: FunKey S (SoC V3s ARM Cortex A7-A clocked @ 1.2GHz). Frontend: SDLRetro from https://github.com/FunKey-Project/sdlretro. Commit: b6f66b9ee8222bc0888ff5e58d1f89c9e3c59e8e.

That's not necessarily a memory leak as PrBoom does not increase its memory usage significantly overtime, but as an example: demo of Doom 2 uses more than 80% of RAM for a 64MB device, whereas TyrQuake uses 66-67% (after the memory leak fix). Is that normal? Maybe it's because PrBoom isn't exactly the original port so I'm wondering if there isn't any obvious RAM overuse. I know that a PR has been merged recently about memory leak though.

Steps to reproduce:

  1. Run Doom 2 demo with PrBoom core through my standalone OPK on the FunKey S: https://github.com/DrUm78/libretro-prboom/releases (I think it behaves the same on other platforms but cannot guarantee it).
  2. Notice the RAM usage (quickly above 80%) whereas it's less than 67% for TyrQuake (after the memory leak fix).
jdgleaver commented 3 years ago

This may help somewhat: #163

I'm not entirely certain what platform build target is used for the FunKey S, but the core will need to be built with HAVE_LOW_MEMORY = 1

DrUm78 commented 3 years ago

Amazing, will try this once it's merged, thanks for your work. 👍

DrUm78 commented 3 years ago

Now, demo of Doom 2 does not go above 43% of RAM usage (was previously about 80%) and WAD loading is faster too, good job again! 👍

DrUm78 commented 3 years ago

There is a downside though... Very large levels need a very high speed storage. Now very large levels are jerky, even with my very fast sd card. 😬 But that's quite the design of your fix, not sure we can do something more here.

jdgleaver commented 3 years ago

Ah, that's a pity...

Are you able to compile your own builds? If so, you could try playing with the memory_size value here: https://github.com/libretro/libretro-prboom/blob/70b0d070f58982854d05a725b330ffc94cfb8298/src/z_zone.c#L100

Increasing the purge limit won't help with the initial load, but it may smooth things out as chunks of the WAD file get cached. If you can get acceptable performance, we can think about making the purge limit size a core option.

DrUm78 commented 3 years ago

Yes I compile my own .so. I tried with 16MB instead of 12MB and that's much better, that does not exceed 51% of RAM usage (out of 64MB) in the largest level I found (demo 2 of Eternal Doom III PWAD) and that's still silk smooth. Will test some other values to find the best one for my device. Anyway, now I have the choice to enable/disable the feature and tweak the purge limit at will when compiling so that's great. :)

jdgleaver commented 3 years ago

Thanks, that's good to know!

I noticed that some other source ports set a limit of 16 MB, so that probably is a sensible choice in general - although it's a bit much for devices with only 32 MB of RAM total. So an option to tweak this value may indeed be appropriate. I'll be interested to see which value works best on your device - that will be useful for setting defaults :)

DrUm78 commented 3 years ago

Hacx PWAD now crashes with Arithmetic exception in my logs as soon as it tries to launch the demo or if I launch a game. Was not happening with previous code. 😬 I guess I can reopen the ticket then. A good news is always followed by a bad one ah ah. I will test all my IWADs and PWADs.

DrUm78 commented 3 years ago

I can link it here as it's free PWAD: HACX.zip

jdgleaver commented 3 years ago

Hmm... That HACX.WAD is running fine for me on Linux, with HAVE_LOW_MEMORY = 1. It does however produce a lot of bad texture log messages, which are concerning (since this causes some invalid offsets to be calculated, which do who knows what...)

Arithmetic exception means either a division by zero, or the the overflow of a signed integer. I guess the difference between the funkey and my desktop is that the former is 32bit and the latter is 64bit, and on a 64 bit OS the used data types might be wide enough to mask any erroneous overflows...

jdgleaver commented 3 years ago

Oh, wait - it was referencing the doom 1 wad. I moved it to a folder just with the doom 2 wad, and now it's crashing - I shall investigate...

DrUm78 commented 3 years ago

Yeah that's for Doom 2. I tested 16 of the IWADs and PWADs I regularly use and Hacx is the only one that obviously crashes.

Also, I made a 16MB and a 32MB purge limit .so files and they seem to use the same amount on RAM and run as smoothly, even with very large levels. 16MB seems to be the right value for my 64MB device then.

jdgleaver commented 3 years ago

Okay, I found the issue :)

diff --git a/src/w_wad.c b/src/w_wad.c
index 8fd6b66..ef16e50 100644
--- a/src/w_wad.c
+++ b/src/w_wad.c
@@ -549,16 +549,19 @@ void W_ReadLump(int lump, void *dest)
 {
   lumpinfo_t *l = lumpinfo + lump;

-    {
-      if (l->wadfile)
-      {
+   if (l->wadfile)
+   {
 #ifdef MEMORY_LOW
+      if (l->size > 0)
+      {
          rfseek(l->wadfile->handle, l->position, SEEK_SET);
          if (rfread(dest, l->size, 1, l->wadfile->handle) <= 0)
             I_Error("W_ReadLump: read failed");
+      }
+      else
+         I_Error("W_ReadLump: attempt to read lump of zero size");
 #else
-         memcpy(dest, &l->wadfile->data[l->position], l->size);
+      memcpy(dest, &l->wadfile->data[l->position], l->size);
 #endif
-      }
-    }
+   }
 }

(That WAD is somewhat badly behaved, but the error seems harmless and should be handled)

Thanks for the purge limit info. We'll make that an option, then, with an appropriate default!

DrUm78 commented 3 years ago

Awesome, the fix works. :)

Gonna continue my tests, thanks. 👍

jdgleaver commented 3 years ago

Here we go: https://github.com/libretro/libretro-prboom/pull/164

This should solve the issues identified so far :)

DrUm78 commented 3 years ago

Seems to work well now with the PR merged. One thing I noticed though (not a problem): even when increasing the Cache Size at any value, I cannot reach the amount of RAM usage I had previously with the old code, is that intended?

As an example:

I mean that's nice, but is it intended to behave this way?

jdgleaver commented 3 years ago

Yes, absolutely intended! :)

Without the MEMORY_LOW flag, it caches the entire WAD file in RAM before starting to do anything. With that flag set, it only reads it on demand, and keeps no redundant copies - so you're automatically saving ~12-20MB of RAM depending on the WAD.

DrUm78 commented 3 years ago

Amazing, thanks for the explanation! I guess we can close the bug again, thanks for your work. :)

jdgleaver commented 3 years ago

Thanks for your help with the Hacx bug report and cache size testing! Glad we got this sorted :)

DrUm78 commented 2 years ago

Very sorry to reopen it but there's still some memory leak. 😬 It's better than before but leaving the game playing a few hours still makes the core exit because all the RAM and swap are filled (64MB RAM here). That's especially noticeable when loading high RAM consuming PWADs (like Doom 64). I left my device running last night because I was suspicious about the used ram that was growing and growing, then I found my device back to the desktop (GMenu2X). That definitely does not happen anymore with TyrQuake for instance. BTW, that happens either with "MEMORY_LOW" set or not. With it set, the device becomes very sluggish after a while and I have to set it to 256MB to get it fine then it will exit after a while.

jdgleaver commented 2 years ago

Oof... That's unfortunate, but thanks for letting me know! I shall endeavour to investigate...

DrUm78 commented 2 years ago

That PWAD is a good example but I guess you will need a few hours with a low RAM device to reproduce it. But anyway you can see very quickly that used RAM grows and grows. doom64.zip

jdgleaver commented 2 years ago

This is rather difficult....

The core itself isn't actually 'leaking' memory, per se; all the memory is accounted for and it gets free()'d where appropriate. The problem is that something like the doom64 wad really does require a huge amount of memory. I don't know what it's doing internally, but while playing the game it's allocating large amounts of memory constantly. If you set a MEMORY_LOW threshold then it tries to keep within the limit - but when it prunes the allocated memory, it only considers the transient 'cache' region. There is also a persistent 'static' region which doesn't get touched - and here as well, memory gets allocated constantly while the wad is running. I guess with vanilla Doom and Doom II, the rate at which the 'static' memory gets allocated is low enough that most devices with more than a trivial amount of ram can handle it, but doom64 hammers it hard.

Even trying to trace back where in the code this 'static' memory gets allocated is difficult - it's all concealed by wrappers and defines - I really can't see what it's doing...

DrUm78 commented 2 years ago

Yeah that does not seem to be an easy one. Was just spotting this but if that needs too much work, I think that's not worth the effort...