wereii / lemmy-thumbnail-cleaner

MIT License
11 stars, 1 fork

Cleanup all unused images #3

Open Nutomic opened 6 months ago

Nutomic commented 6 months ago

It's easy to end up with unused images on Lemmy, e.g. when uploading an image and then not actually using it in the new post. Likewise, when a comment gets deleted, its embedded images become unused. It's very tricky to clean this up from within Lemmy, as images can be referenced from many different tables, and in many cases they are embedded in markdown.

However it should be very easy to clean them up as follows:

Neriderc commented 6 months ago

This would be amazing!

I'd say you'd need a little buffer. It's possible someone has uploaded an image to a comment or post but hasn't submitted it yet. You'd probably want to only delete images that were created, say, 24 hours or more before the lemmy/pictrs database dumps.
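A minimal sketch of that buffer, assuming each candidate image comes with a creation timestamp and that the time of the dump is known. The names `old_enough` and `deletable` are hypothetical, and how the timestamps are obtained from pict-rs is left out:

```python
from datetime import datetime, timedelta, timezone

def old_enough(created_at, dump_time, buffer=timedelta(hours=24)):
    """Return True if the image was created at least `buffer` before the dump."""
    return created_at <= dump_time - buffer

def deletable(unused_images, dump_time, buffer=timedelta(hours=24)):
    """Keep only unused images older than the buffer.

    `unused_images` is an iterable of (alias, created_at) pairs (assumed shape).
    """
    return [
        alias
        for alias, created_at in unused_images
        if old_enough(created_at, dump_time, buffer)
    ]

if __name__ == "__main__":
    dump_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    candidates = [
        ("a.png", datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)),  # 25h before dump
        ("b.png", datetime(2024, 1, 2, 11, 0, tzinfo=timezone.utc)),  # 1h before dump
    ]
    print(deletable(candidates, dump_time))
```

Comparing against the dump time rather than the current time keeps an image safe if it was uploaded shortly before the dump and only referenced afterwards.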

wereii commented 5 months ago

Considering the db dump requirement, this is pretty involved.

Also, having to scan through gigabytes of dump data (my little instance currently dumps about 5G of uncompressed data) is probably not going to be efficient unless specifically optimized. Maybe it would help to scan the dump once, extract anything that looks like the URL of a media file (maybe even matching the instance host), insert it into an unlogged table in postgres, and only after that go through each pict-rs link and check for existence?
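The single-pass scan could look roughly like the sketch below. It is an illustration under assumptions, not a tested tool: it assumes local images are served under `https://<host>/pictrs/image/<alias>`, it collects the matches in an in-memory set instead of the unlogged postgres table suggested above, and the function names are hypothetical.

```python
import re

def build_pattern(host):
    # Assumed URL shape: http(s)://<host>/pictrs/image/<alias>.<ext>
    return re.compile(
        r"https?://" + re.escape(host) + r"/pictrs/image/([A-Za-z0-9_-]+\.[A-Za-z0-9]+)"
    )

def referenced_aliases(lines, host):
    """Scan the dump once, line by line, and collect every image alias seen."""
    pattern = build_pattern(host)
    found = set()
    for line in lines:
        # findall returns only the captured alias because the pattern has one group
        found.update(pattern.findall(line))
    return found

def unused_aliases(all_aliases, lines, host):
    """Aliases known to pict-rs that never appear in the dump."""
    return set(all_aliases) - referenced_aliases(lines, host)
```

Because `lines` can be any iterable of strings, an open file object works and the dump is never loaded into memory as a whole, e.g. `with open("dump.sql", encoding="utf-8", errors="replace") as f: referenced_aliases(f, "example.org")`, where the file name and host are placeholders.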

From my POV this will have to be a separate tool, if only because of the dump requirement. I also don't like the idea of a tool calling pg_dumpall on my db; I would rather do that part manually and run the tool on top of it.

Nutomic commented 5 months ago

The other alternative is to write an SQL query which checks all the fields where images can be linked: all markdown fields (post, comment, sidebars, user bio, private messages etc.) as well as avatars, banners etc. This is definitely the cleaner solution but takes more effort to implement, whereas the db dump is quick and dirty.
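A rough sketch of how such a query might be generated. The table and column names in `IMAGE_COLUMNS` are assumptions about the Lemmy schema and the list is almost certainly incomplete, so it would need checking against the real database. The `%(pattern)s` placeholder is the named-parameter style used by psycopg.

```python
# Assumed (table, column) pairs where an image URL may appear; verify against
# the actual Lemmy schema before relying on this list.
IMAGE_COLUMNS = [
    ("post", "body"), ("post", "url"), ("post", "thumbnail_url"),
    ("comment", "content"),
    ("private_message", "content"),
    ("person", "bio"), ("person", "avatar"), ("person", "banner"),
    ("community", "description"), ("community", "icon"), ("community", "banner"),
]

def build_reference_query(columns=IMAGE_COLUMNS):
    """Build one SELECT that is true if any listed column matches the pattern.

    Table and column names are interpolated directly, so `columns` must come
    from a trusted constant, never from user input.
    """
    checks = [
        f"EXISTS (SELECT 1 FROM {table} WHERE {column} LIKE %(pattern)s)"
        for table, column in columns
    ]
    return "SELECT " + "\n    OR ".join(checks) + " AS is_referenced"

if __name__ == "__main__":
    print(build_reference_query())
```

The resulting string would be executed once per image with a parameter such as `{"pattern": "%/pictrs/image/<alias>%"}`. Running one query per image may be slow on a large instance, which is part of the extra effort mentioned above.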