mnutt / davros

Personal file storage server
Apache License 2.0
298 stars 35 forks

Large grain crashes Davros #136

Open griff opened 3 years ago

griff commented 3 years ago

I have a grain that I used to sync all my photos to, so it is about 11 GB, and it can't start anymore.

As best I can tell from the Sandstorm logs, the problem is that the preview file cache stores its data in the tmpdir, which is memory-backed in the grain, so it fills the memory and crashes.

mnutt commented 3 years ago

Hmm, that's something I had not considered! I could have the thumbnailer expire images via LRU or something, probably as a sort of background job. Is this something that happens over a long period of time, or do you have one single large directory where merely viewing it fills up the memory and crashes?
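For what it's worth, a background LRU sweep over a thumbnail cache directory could look roughly like this (a minimal Python sketch, not Davros code — Davros is Node.js — and the directory layout and size budget are made-up assumptions):

```python
import os

def prune_thumbnails(cache_dir, max_bytes):
    """Delete least-recently-used thumbnail files until the cache
    directory's total size drops below max_bytes."""
    entries = []
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            entries.append((st.st_atime, st.st_size, path))
            total += st.st_size
    # Oldest access time first = least recently used.
    entries.sort()
    for _atime, size, path in entries:
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
    return total
```

A cron-style job could call `prune_thumbnails` periodically with whatever cache budget makes sense for the grain; note that LRU-by-atime only works if the filesystem actually updates access times.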

mnutt commented 3 years ago

Hmm, as I look at it a bit more, I don't think previews/thumbnails are stored on a memory-backed filesystem. If you start Davros outside of Sandstorm on a Unix system it'll likely put thumbnails in /tmp, but within Sandstorm these end up in /var/davros/tmp, which should be file storage:

https://github.com/mnutt/davros/blob/master/.sandstorm/sandstorm-pkgdef.capnp#L160

Maybe it's some sort of leak in the thumbnailing itself, or davros trying to generate too many thumbnails at the same time?
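One way to check whether a path like /var/davros/tmp actually lands on disk or on a memory-backed filesystem (tmpfs) is to look up its mount's filesystem type in /proc/mounts. This is just a diagnostic sketch (Linux-only, illustrative Python, not part of Davros):

```python
import os

def filesystem_type(path):
    """Return the filesystem type of the longest-prefix mount point
    containing path, by scanning /proc/mounts (Linux only)."""
    path = os.path.realpath(path)
    best_mount, best_type = "", ""
    with open("/proc/mounts") as f:
        for line in f:
            fields = line.split()
            mount_point, fstype = fields[1], fields[2]
            # Match the mount point itself or any path beneath it.
            if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
                if len(mount_point) > len(best_mount):
                    best_mount, best_type = mount_point, fstype
    return best_type
```

If `filesystem_type("/var/davros/tmp")` reports `tmpfs` inside the grain, the cache really is memory-backed; otherwise the crash must come from somewhere else.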

ocdtrekkie commented 3 years ago

This probably should constitute a blocking issue for approval. I'm not sure how many people have exceptionally large Davros grains, but I am concerned that if we don't suss out the issue here, we will find out how many people have very large Davros grains. ;)

I know there was some further discussion on IRC, did we get anywhere in identifying exactly what the issue was? @griff, you mention some logs, can you share them here, by chance, sanitized if necessary?

griff commented 3 years ago

I have just put my new grain through its paces and I can't reproduce my own problem, so I am just closing this issue.

I first uploaded all the pictures stored on my computer to the grain (11 GB), but in multiple folders.

And I have just now finished uploading all the pictures from my phone (10 GB) using the same method that was used to populate the failing grain. It creates a single folder with all 2600 images and videos in it, and while loading the Davros view of just that folder is a bit slow, I haven't noticed any breakage.

griff commented 3 years ago

Sorry for the inconvenience!

Michael-S commented 2 years ago

I am able to reproduce this issue with a Davros grain that has over 1000 images. A backup of the grain is available here if anyone else wants to try. It contains a lot of NSFW language - it's a collection of memes I share with family and friends. There is no nudity. https://2oibt9mht7i0o2w4is69.ducky.sandcats.io/Davros_funnies_in_line.zip

ocdtrekkie commented 2 years ago

@griff or @mnutt , can we reopen this?

griff commented 2 years ago

I have found the underlying problem that was causing my issue. It is this: https://github.com/sandstorm-io/sandstorm/issues/3512

ocdtrekkie commented 2 years ago

Okay, that would arguably make this no longer a Davros issue, unless @mnutt intends to find some way to create fewer files when making thumbnails... which seems unrealistic?

Are you able to raise your fs.inotify.max_user_watches value on the box in question? It sounds like in kernel 5.11 and up, Linux will more intelligently set this default value based on the memory of your machine.

Michael-S commented 2 years ago

Thank you to all who looked into this! I changed my fs.inotify.max_user_watches to 32768 and restarted, no dice. I do not see the error in sandstorm-io/sandstorm#3512 in my logs. I do see this in sandstorm.log now:

sandstorm/gateway.c++:1072: error: exception = kj/compat/http.c++:1851: failed: expected !inBody; previous HTTP message body incomplete; can't write more messages
stack: 4c8412 4ff5e4 4a99bf 4f60da 4f70d1 544fa1 4fe500

I would swear that error in the log is new. I haven't touched C++ in 16 years, but I'll take a look at that file and see if anything useful pops out at me.

ocdtrekkie commented 2 years ago

@griff Did you see the error in your Sandstorm log by chance?

@Michael-S Do you know if that setting is machine- or user-specific where you changed it? I want to say it might be the latter, and Sandstorm runs as its own user account. (I don't even know how to set that setting; I'm just making ballpark guesses based on what I read.)

Michael-S commented 2 years ago

I changed it in /etc/sysctl.conf and rebooted the VM, so I don't think that's it. Edit: to inspect the value, run `cat /proc/sys/fs/inotify/max_user_watches`. You can change the value dynamically, but the easy way is to add a line to /etc/sysctl.conf: `fs.inotify.max_user_watches=32768`, and then restart.

zenhack commented 2 years ago

@ocdtrekkie, it is not user specific.

ocdtrekkie commented 2 years ago

🤔 So do @griff and @Michael-S have different issues then? I am really curious if @griff found the sandstorm/supervisor.c++:232: overloaded: inotify_add_watch: No space left on device errors in his system log then, since @Michael-S did not.

griff commented 2 years ago

@ocdtrekkie I got inotify_add_watch in the log, and increasing fs.inotify.max_user_watches fixed my issue. So these look to be different issues.

ocdtrekkie commented 2 years ago

As a sanity check (on myself): I asked if @Michael-S saw the inotify_add_watch error in the Sandstorm/system log, but the other issue specifies that it appears in the grain log. @Michael-S Nothing in the grain log for the grain that won't start for you, right?

Michael-S commented 2 years ago

Right, nothing in the grain log and nothing related to inotify in the Sandstorm log.

ocdtrekkie commented 2 years ago

Okay, thanks. I figured, but just wanted to confirm.

mnutt commented 2 years ago

Hmm, at some point maybe I can explore storing thumbnails in a SQLite database or something.
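A minimal sketch of that idea, assuming a single table keyed by source path (illustrative Python, not Davros code — one SQLite file instead of thousands of thumbnail files would also sidestep per-file limits like inotify watches):

```python
import sqlite3

def open_thumb_store(db_path):
    """Open (or create) a SQLite database holding thumbnails as BLOBs."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS thumbnails (
            source_path TEXT PRIMARY KEY,
            width       INTEGER,
            height      INTEGER,
            data        BLOB
        )
    """)
    return conn

def put_thumb(conn, source_path, width, height, data):
    # INSERT OR REPLACE keeps one row per source file.
    conn.execute(
        "INSERT OR REPLACE INTO thumbnails VALUES (?, ?, ?, ?)",
        (source_path, width, height, data))
    conn.commit()

def get_thumb(conn, source_path):
    row = conn.execute(
        "SELECT data FROM thumbnails WHERE source_path = ?",
        (source_path,)).fetchone()
    return row[0] if row else None
```

One nice property of this layout is that LRU expiry becomes a single `DELETE` query over a last-accessed column instead of a filesystem walk.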