superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[bug] Cache pruning not working properly? #1079

Closed gw1urf closed 1 year ago

gw1urf commented 1 year ago

Describe the bug with a clear and concise description of what the bug is.

My media cache, on a single user instance, filled my VPS's system disk. I have media-remote-cache-days set to 3, but there were > 30,000 media items with ages much older than that.

What's your GoToSocial Version?

0.5.2 git-c31f219

GoToSocial Arch

x86_64 binary install

Browser version

No response

What happened?

It's possible I'm misunderstanding the caching settings, but my VPS ran out of disk yesterday. I had media-remote-cache-days set to 3, but my storage usage was > 5GB. Looking at the database, I could see > 30,000 items in media_attachments where remote_url was NULL and all *_at dates were more than 3 days old.

To get things back up and running, I wrote a perl script to remove files where remote_url was NULL and file_updated_at was more than 3 days ago and to set cached=0 for those items. That removed > 3GB of cached data and, so far as I can tell, GoToSocial is happy to re-acquire them if needed.
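
For anyone wanting to check the scale on their own instance, a query along these lines should do it (a sketch only; column names as used in the script and queries later in this thread):

select count(*), sum(file_file_size + thumbnail_file_size) from media_attachments where remote_url is not null and cached=1 and file_updated_at < date('now', '-3 days');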

The logs do show that the prune function is running and removing files but, for some reason, it's not removing anything like as much as I'd expect.

For now, I've set media-remote-cache-days to zero and am sticking with the perl script (which GitHub is refusing to let me attach at the moment).

What you expected to happen?

Media shouldn't have remained cached.

How to reproduce it?

No response

Anything else we need to know?

No response

tsmethurst commented 1 year ago

Hmmm thanks. Did you happen to see what kind of media were generally being preserved (was it attachments? emojis? profile/avatar pics?) that should have been cleaned up?

NyaaaWhatsUpDoc commented 1 year ago

Do we cleanup old emoji at the moment? I didn't think we did :thinking:

gw1urf commented 1 year ago

There were definitely attachments and profile pics. I wasn't sure that emoji should be pruned, so I didn't look for those.

I can turn expiry back on for a few days if there's anything I can do to help diagnose it.

tsmethurst commented 1 year ago

Do we cleanup old emoji at the moment? I didn't think we did

No we don't... I'm also not 100% convinced our avi/header cleanup is working properly tbh

I could see > 30,000 items in media_attachments where remote_url was NULL

maybe we're not setting remote_url properly on some things :thinking:

gw1urf commented 1 year ago

Sorry, NOT NULL, my mistake.

tsmethurst commented 1 year ago

Ah okay thanks for the clarification :D Alright, seems we got some stuff to look into

gw1urf commented 1 year ago

Back at a proper keyboard now. GitHub really doesn't want me to attach a perl script, so here's what I'm doing to clean up. I'd imagine something very like this is what GoToSocial is trying to do internally.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $home = "/home/social/gotosocial";

# Open the GoToSocial SQLite database; do all the cleanup in one transaction.
my $db = DBI->connect("DBI:SQLite:dbname=$home/sqlite.db", "", "",
        { RaiseError => 1 });
$db->begin_work;

# Select cached remote media (remote_url set, cached flag on) older than 3 days.
my $q = $db->prepare(q{
        SELECT id, file_path, thumbnail_path, file_updated_at
        FROM media_attachments
        WHERE file_updated_at < date('now', '-3 days')
        AND remote_url IS NOT NULL
        AND cached=1
});
$q->execute();

# Mark an attachment as uncached so GoToSocial will re-fetch it on demand.
my $uncache = $db->prepare(q{
        UPDATE media_attachments
        SET cached=0
        WHERE id = ?
});

# Remove each file and its thumbnail from storage, then flip the cached flag.
while (defined(my $row = $q->fetchrow_hashref))
{
        if (-e "$home/storage/$row->{file_path}")
        {
                unlink("$home/storage/$row->{file_path}");
        }
        if (-e "$home/storage/$row->{thumbnail_path}")
        {
                unlink("$home/storage/$row->{thumbnail_path}");
        }
        $uncache->execute($row->{id});
}
$db->commit;
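
If the run worked, a follow-up check like this (same schema assumptions as the script, so treat it as a sketch) should come back zero:

select count(*) from media_attachments where remote_url is not null and cached=1 and file_updated_at < date('now', '-3 days');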

gw1urf commented 1 year ago

Sorry for spamming on this one. I realised I could examine the database from a recent backup (specifically 2022-11-15 04:39:47 - yes, the media filled my backup drive too!).

Looking at the pruning code, I can see you effectively do where cached=1 and avatar=0 and header=0 and created_at < ... and remote_url is not null. So I did:

select sum(file_file_size + thumbnail_file_size) from media_attachments where remote_url is not null and cached=1 and avatar=0 and header=0 and created_at < date('2022-11-15 04:39:47.390327+00:00', '-3 days');

That gave just 2MBytes. So pruning is working correctly as specified by the function. The problem appears to be that I have a vast amount of stuff that's exempted from the pruning.

select sum(file_file_size + thumbnail_file_size) from media_attachments where remote_url is not null and cached=1 and created_at < date('2022-11-15 04:39:47.390327+00:00', '-3 days');

That gives 2.3GBytes. By the time my VPS filled, this had ballooned to > 5GBytes.

Breaking the 2.3GBytes down, it appears that I had 1.5GBytes of cached headers (7308 items) and 0.8GBytes of cached avatars (5131 items).

Perhaps what's really going on here is that the recent influx of new folks is causing many more headers & avatars to be seen than previously, and the cache pruning policy needs to change as a result.

From my manual pruning, it looks like avatars and headers can be pruned and will be retrieved again if needed. So maybe it's enough to remove the checks on avatar/header from GetRemoteOlderThan. Or, perhaps a more sophisticated approach - record "last accessed" too and prune on that basis - would allow headers and avatars to stay for people you interact with frequently while allowing stuff from one-off boosts to be pruned.
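
For illustration (a sketch only, not the actual code), the relaxed selection would effectively be the current prune query with the avatar/header exemptions dropped:

select id from media_attachments where remote_url is not null and cached=1 and created_at < date('now', '-3 days');

The "last accessed" variant would need a new column (call it last_accessed_at, hypothetically) updated whenever the media is served, with the prune then keying on that instead of created_at.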

zladuric commented 1 year ago

I don't have much to add here, but I do have a question: is there an option to not cache remote media at all?

If not, I think one would be awesome, or some sort of lazily-fetched remote media mechanism.

I've noticed the same too, but I thought it was just some stale records from before. I also made a tiny script like your Perl one, but hosting other people's media, even though they host their own, is really both unexpected and undesirable. I moved from mastodon.technology to my GtS instance just before the birdsite exodus recently, and until I cleaned up my timeline I would have around 500 status updates a day, 50 of them from the people I actually follow. I disabled boosts in the frontend, but my instance is still caching everything, even though I possibly won't see much of it.

tsmethurst commented 1 year ago

I moved from mastodon.technology to my gts instance just before the birdsite exodus recently, and until I cleaned up my timeline, I would have like 500 status updates a day, 50 of them from the people I actually follow.

This is a different issue you're talking about here, I think; aren't you talking about uncaching statuses (so, status db entries) rather than media?

But yes, the ballooning storage size is something we're going to address as soon as we've got time. So, soon :)

zladuric commented 1 year ago

No, I'm talking about media. Statuses are in the database, and I don't mind a hundred or so MB per month (for now). But media is problematic: as the issue shows, it goes into gigabytes, and the cleanup action doesn't seem to do the job.

tsmethurst commented 1 year ago

closed by https://github.com/superseriousbusiness/gotosocial/pull/1234 (just tested and it brought my media storage from 14gb down to 3gb), but we have some other work to do before we can include that PR in a release, so hang tight

gw1urf commented 1 year ago

That's great. I'll pull the update and see what happens.

gw1urf commented 1 year ago

Possible issue (I've not checked). I had "media-remote-cache-days: 0" in my config.yaml while the bug was ongoing. Still, on startup, the usage of my storage directory dropped from 1103M to 205M.

Of course, it's possible my manual prodding has caused some inconsistency that triggered this cleanup, but I thought I'd mention it, just in case "0" is actually acting as "delete everything over 0 days old" rather than "don't clean up the cache".

Sorry if it's a false alarm, and thanks for looking at this.
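
For reference, here's the setting as it sits in my config.yaml (the comment reflects my understanding of the intended meaning, which is exactly what I'm now unsure about):

# my understanding: 0 should mean "never clean the remote media cache",
# not "prune everything older than 0 days"
media-remote-cache-days: 0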

mirabilos commented 1 year ago

I’m currently running (via /admin/actions) a daily manual prune with days=0, because it didn’t let me do 0.5 and using 1 doesn’t delete nearly enough… I’m about to run out of disk space as well (thankfully I have a separate vhdd for the media, but I’d like to avoid having to extend it…)

NyaaaWhatsUpDoc commented 1 year ago

If you fancy running main you should be able to clear up much of that data :)