stashapp / stash

An organizer for your porn, written in Go. Documentation: https://docs.stashapp.cc
https://stashapp.cc/
GNU Affero General Public License v3.0

[Bug Report] Excessive RAM Usage in Docker, no "RAM free-up" at all #2195

Open toby666123 opened 2 years ago

toby666123 commented 2 years ago

Describe the bug
I ran the Generate task today for the first time on 2 TB of scenes. After completion (18 hours), Stash was consuming 7 GB of RAM; the hardware is a Synology NAS. Although the task was complete, Stash never freed that memory; the only way to fix it was to stop and start the container. I kept a close eye on this during further tasks today and the behaviour is the same each time: Stash consumes RAM but does not free it after task completion.

To Reproduce
Steps to reproduce the behavior:
1) Install Stash as a Docker container
2) Run work-heavy tasks that consume a lot of RAM
3) After task completion, the RAM is not freed by Stash

Expected behavior
Stash frees any RAM it no longer needs. For reference: after the stop/start container procedure, Stash regularly consumes less than 50 MB of RAM.

Stash Version (from Settings -> About): v0.12.0

kermieisinthehouse commented 2 years ago

Can you post the scan and generate settings that you're using so that I can attempt to reproduce? I can't seem to break 200MB no matter what I do.

toby666123 commented 2 years ago

Here you go: (screenshots of the scan and generate settings attached)

Just as a live example: I had my Stash scrape all performer data yesterday evening. After scraping, the Docker container consumed ~300 MB of RAM. Now, 18 hours later, it consumes ~800 MB. (screenshot attached)

toby666123 commented 2 years ago

Further info that may help the investigation: RAM usage does seem to go down sometimes; I just noticed this by coincidence. RAM is now "down" to 794 MB. (screenshot attached)

bnkai commented 2 years ago

@toby666123 can you try the below?

- restart the stash container
- go to the tasks page, start a scan/generate task, and let it run for a while
- if you notice the memory usage is big enough, cancel the task and wait a few minutes to see whether (most of) the memory is freed
- IMPORTANT: do not browse any other stash page while doing the above

From some quick testing and monitoring using something like top -b -d 15 -p $(pidof stash-linux) | tee log_mem, it seems that while the scan and generate tasks do free part of the memory used, simply browsing the scenes/galleries/images blows up the RAM usage without releasing it.
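For reference, a Go process often holds on to memory that the garbage collector has already reclaimed, so RSS in top can stay high even when the live heap is small; depending on the Go version, the runtime may also release pages back to the OS lazily. A minimal standalone sketch (not stash code) that logs the runtime's own view next to what top reports:

```go
// Standalone monitoring sketch, not part of stash: print the Go runtime's heap
// statistics every 15 seconds so they can be compared against RSS from top.
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	for {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		// HeapAlloc is live heap; HeapIdle/HeapReleased is memory the runtime
		// holds but is not using (or has already returned to the OS).
		log.Printf("heapAlloc=%dMiB heapIdle=%dMiB heapReleased=%dMiB sys=%dMiB",
			m.HeapAlloc>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.Sys>>20)
		time.Sleep(15 * time.Second)
	}
}
```

If heapAlloc stays small while top keeps reporting gigabytes, the memory is being held by the runtime or the OS rather than by live objects.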

toby666123 commented 2 years ago

Did exactly as you described:

(screenshot of memory usage attached)

toby666123 commented 2 years ago

After restarting the Docker container: (screenshot attached)

bnkai commented 2 years ago

Not sure if the errors are related; I can't replicate this using only the generate task. Can you get us the error log from the generate task as well as the options you ticked? (Redact any personal info.) 1.7 GB of RAM in one minute is too much, too fast.

kermieisinthehouse commented 2 years ago

I have to admit this is an odd setup; most people tick the generate settings inside of Scan and import batches of new content in one step. This may be a problem in the generate task.

kermieisinthehouse commented 2 years ago

I can reliably reproduce this by just running the Generate task with the above configuration, when I set the concurrency to one parallel task.

toby666123 commented 2 years ago

Will re-do the test with the generate task tomorrow and post the log here. It will be big though, last time the log file was 29MB....

Regarding Scan: I am using Generate because Scan does not offer the phash option (which I want) or the option to overwrite generated data (which I do not want). What are the defaults for these options in Scan? I can re-do the same test with Scan instead of Generate to check whether the behavior is the same.

Btw: If Scan and Generate do not run the same code, why not just add the 2 options to Scan and get rid of the Generate task? Any reason to keep it?

kermieisinthehouse commented 2 years ago

You can generate phashes in scan. Look a little closer :). Scan generation does not overwrite anything.

The difference is that Scan's generation only applies to new content that is discovered during a scan, while the generate task tries to find anything missing from your entire library. If your workflow is to copy content in, you only have to run Scan to get the generated items for new stuff.

I need to run a few more tests before I narrow down the minimal generate task that's needed to leak memory

toby666123 commented 2 years ago

Alright, understood. Makes sense now that I got it. :) And good to know w/ phashes in Scan, you are right of course. :)

Do you still need the log file now that you can re-produce the bug?

kermieisinthehouse commented 2 years ago

No, it probably doesn't contain anything useful anyway. I'll just throw it in a profiler when I get to a computer. Best case, you can expect a fix in the development branch sometime this release cycle.
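For anyone else who wants to take a look, the usual way to inspect a Go binary's heap is the net/http/pprof handler; whether a given stash build exposes something equivalent is a separate question, so treat this as a generic sketch:

```go
// Generic sketch of exposing pprof in a Go service; stash's actual profiling
// setup (if any) may differ.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// With this listener running, a heap snapshot can be pulled with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```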

bnkai commented 2 years ago

@toby666123 if you still have the log file (and, if possible, the settings you used for the generate and the number of parallel tasks, in case they are not shown in the file) and can share it here or in the Discord channel, I could take a look. My main problem is reproducing the issue, especially the 1.7 GB RAM consumption ~1 minute after the start of the task.

toby666123 commented 2 years ago

@bnkai The settings were exactly as shown above in the screenshot, 1 parallel task only. The task ran over 7 corrupt video files.

Unfortunately I am on vacation as of today and won't be back at a computer until next Sunday. I can only re-run the task and create a log file after that, but will do so ASAP.

bnkai commented 2 years ago

No problem, we are not in a hurry :-) If the issue isn't resolved/tracked by then, please do that.

WithoutPants commented 1 year ago

@toby666123 can we get a follow-up on this?

eddiebeazer commented 7 months ago

Is there a good way to get a list of all of these potentially corrupt video files? I've been setting a limit of 8 GB for my container and it usually crashes during a generate task. I removed that limit for now, and after my generate task the container is sitting at 45 GB of usage without having freed any RAM at all.
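One rough way to get such a list is to walk the library with ffprobe and flag anything it cannot read cleanly; below is a hypothetical standalone helper, not something stash provides, with the root path and extension list purely illustrative (it only checks what ffprobe can parse, not a full decode):

```go
// Hypothetical helper, not part of stash: flag video files that ffprobe
// cannot read cleanly as candidates for corruption.
package main

import (
	"fmt"
	"io/fs"
	"os/exec"
	"path/filepath"
	"strings"
)

func main() {
	root := "/data/videos" // illustrative library path
	exts := map[string]bool{".mp4": true, ".mkv": true, ".avi": true, ".wmv": true}

	filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || !exts[strings.ToLower(filepath.Ext(path))] {
			return nil
		}
		// ffprobe -v error prints only errors; a clean file produces no output
		// and a zero exit status.
		out, err := exec.Command("ffprobe", "-v", "error", path).CombinedOutput()
		if err != nil || len(out) > 0 {
			fmt.Printf("possibly corrupt: %s\n%s", path, out)
		}
		return nil
	})
}
```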

Konrni commented 1 month ago

I stumbled across this problem too; in my case the issue was with the different reporting tools. PVE and Docker report total memory, but Linux may be using some of it for the file system cache. https://www.linuxatemyram.com/
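One way to check inside the container whether the reported usage is mostly page cache is to look at the cgroup's memory.stat; a small sketch assuming cgroup v2 (the path and field names differ under cgroup v1):

```go
// Sketch for checking how much of a container's memory usage is page cache,
// assuming cgroup v2; under cgroup v1 the file lives at
// /sys/fs/cgroup/memory/memory.stat and uses "rss"/"cache" instead.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/sys/fs/cgroup/memory.stat")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		// "anon" is memory owned by processes; "file" is page cache the kernel
		// can drop under memory pressure.
		if strings.HasPrefix(line, "anon ") || strings.HasPrefix(line, "file ") {
			fmt.Println(line)
		}
	}
}
```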

What I don't quite understand is why it only happens when there is an "error generating sprite/preview". My best guess is that the other files don't get loaded by ffmpeg, so Linux won't cache them (because they don't have missing sprites/previews). If I run more than one generate task while RAM is "maxed", RAM won't increase further, which seems to confirm my guess.

At least this is my impression after testing it with other tools (top/htop/netdata).