wereii / lemmy-thumbnail-cleaner

MIT License

[!CAUTION] 🛑 DO NOT USE LTC (Lemmy Thumbnail Cleaner): there is an unresolved security issue, see https://github.com/wereii/lemmy-thumbnail-cleaner/issues/10

Lemmy Thumbnail Cleaner

This is a simple program to remove old thumbnails from pict-rs and lemmy.

It periodically checks the lemmy database for posts older than a given number of months and instructs pict-rs to drop the thumbnail for each of those posts.
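The periodic cycle can be sketched roughly as follows. This is not the project's code: the `ThumbnailSource` and `ImageStore` traits and the `run_once`/`run_forever` functions are hypothetical names. The choice to mark a thumbnail as handled in the database only after pict-rs accepts the purge, so that failures are retried on the next check, is also an assumption made for the sketch.

```rust
use std::time::Duration;

/// Hypothetical view of the lemmy database: posts whose thumbnails are old enough.
trait ThumbnailSource {
    /// Return up to `limit` thumbnail aliases of posts older than `min_age_months`.
    fn old_thumbnails(&mut self, min_age_months: u32, limit: usize) -> Vec<String>;
    /// Record that the thumbnail was removed so later checks skip it.
    fn forget(&mut self, alias: &str);
}

/// Hypothetical view of pict-rs: something that can delete an image by alias.
trait ImageStore {
    fn purge(&mut self, alias: &str) -> Result<(), String>;
}

/// One check: fetch a batch, purge each thumbnail, and forget it in the
/// database only after pict-rs accepted the purge (failed ones come back on a
/// later check). Returns the number of thumbnails removed.
fn run_once(
    db: &mut impl ThumbnailSource,
    store: &mut impl ImageStore,
    min_age_months: u32,
    limit: usize,
) -> usize {
    let mut removed = 0;
    for alias in db.old_thumbnails(min_age_months, limit) {
        match store.purge(&alias) {
            Ok(()) => {
                db.forget(&alias);
                removed += 1;
            }
            Err(err) => eprintln!("failed to purge {alias}: {err}"),
        }
    }
    removed
}

/// The periodic loop: one check, then sleep for the check interval, forever.
/// Everything runs on a single thread.
fn run_forever(
    db: &mut impl ThumbnailSource,
    store: &mut impl ImageStore,
    min_age_months: u32,
    limit: usize,
    interval: Duration,
) -> ! {
    loop {
        let removed = run_once(db, store, min_age_months, limit);
        println!("removed {removed} thumbnails");
        std::thread::sleep(interval);
    }
}
```

Keeping the two services behind traits makes a single check testable with in-memory fakes, without a running postgres or pict-rs.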

Usage

This program requires a connection to the lemmy postgres database and to the pict-rs HTTP service. It is meant to be deployed as a container/service alongside the pict-rs and lemmy postgres services.

Edit the lemmy docker-compose.yml to include this service:

```yaml
services:

  # ....

  cleaner:
    image: ghcr.io/wereii/lemmy-thumbnail-cleaner:v0.1.3
    #restart: unless-stopped
    environment:
      - RUST_LOG=info
      - INSTANCE_HOST=https://your_instance_host.here/
      - POSTGRES_DSN=postgresql://user:password@postgres/lemmy
      - PICTRS_HOST=pict-rs:8080
      - PICTRS_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      #- THUMBNAIL_MIN_AGE_MONTHS=3
      #- CHECK_INTERVAL=300
      #- QUERY_LIMIT=100
```
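As an illustration of how these variables fit together, here is a minimal sketch of reading them, assuming the commented-out values are the defaults and that CHECK_INTERVAL is in seconds. `Config` and `from_lookup` are hypothetical names; the real program may parse its settings differently.

```rust
use std::time::Duration;

/// Hypothetical settings struct mirroring the compose file's variables.
#[derive(Debug)]
struct Config {
    instance_host: String,
    postgres_dsn: String,
    pictrs_host: String,
    pictrs_api_key: String,
    thumbnail_min_age_months: u64,
    check_interval: Duration,
    query_limit: u64,
}

impl Config {
    /// Build the config from any key lookup, e.g. `|k| std::env::var(k).ok()`.
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        // Required variables: fail with a clear message if one is unset.
        let required =
            |key: &str| get(key).ok_or_else(|| format!("missing required variable {key}"));
        // Optional numeric variables: fall back to the (assumed) default.
        let optional = |key: &str, default: u64| -> Result<u64, String> {
            match get(key) {
                None => Ok(default),
                Some(v) => v
                    .trim()
                    .parse::<u64>()
                    .map_err(|_| format!("{key} is not a number: {v}")),
            }
        };
        Ok(Config {
            instance_host: required("INSTANCE_HOST")?,
            postgres_dsn: required("POSTGRES_DSN")?,
            pictrs_host: required("PICTRS_HOST")?,
            pictrs_api_key: required("PICTRS_API_KEY")?,
            thumbnail_min_age_months: optional("THUMBNAIL_MIN_AGE_MONTHS", 3)?,
            check_interval: Duration::from_secs(optional("CHECK_INTERVAL", 300)?),
            query_limit: optional("QUERY_LIMIT", 100)?,
        })
    }
}
```

In a real binary this would be called as `Config::from_lookup(|k| std::env::var(k).ok())`; taking a closure instead of reading the environment directly keeps the parser testable.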

⚠️ Only pict-rs 0.5+ can be used; older versions do not implement the required API endpoints! ⚠️

Pict-rs must also be configured with an API key (PICTRS__SERVER__API_KEY), and the cleaner's PICTRS_API_KEY must match it; otherwise the endpoint this cleaner relies on is not accessible!
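For illustration, here is roughly what an authenticated request to pict-rs could look like. This assumes the `POST /internal/purge?alias=...` endpoint and the `X-Api-Token` header described in the pict-rs documentation; which endpoint the cleaner actually calls is not stated here, and `purge_request`/`percent_encode` are hypothetical helpers that only build the request text without sending it.

```rust
/// Hypothetical helper: render the raw HTTP/1.1 request for purging one alias.
/// Assumption: pict-rs exposes `POST /internal/purge?alias=...` and checks the
/// key configured as PICTRS__SERVER__API_KEY against the `X-Api-Token` header.
fn purge_request(pictrs_host: &str, api_key: &str, alias: &str) -> String {
    format!(
        "POST /internal/purge?alias={} HTTP/1.1\r\nHost: {}\r\nX-Api-Token: {}\r\nContent-Length: 0\r\n\r\n",
        percent_encode(alias),
        pictrs_host,
        api_key
    )
}

/// Percent-encode every byte except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}
```

Without the header, or with a key that doesn't match the one pict-rs was started with, such a request would be rejected, which is why both sides need the same key.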

Configuration

Required Environment Variables

Optional Environment Variables

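CHECK_INTERVAL (in seconds) and QUERY_LIMIT control how demanding the cleaner is on the database and pict-rs: QUERY_LIMIT limits how many posts each check fetches, and CHECK_INTERVAL sets how long to wait between checks. Tune them to fit the performance of your infrastructure.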

When there is a large backlog (10k+ thumbnails) to clean up, reduce CHECK_INTERVAL (to 5-15 s) and increase QUERY_LIMIT (to ~500) to speed up the process. Keep in mind that the program is intentionally single-threaded, so setting QUERY_LIMIT too high keeps it hitting both pict-rs and postgres continuously for longer in each batch.
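To get a feel for these numbers, here is a back-of-the-envelope estimate, assuming each check processes at most QUERY_LIMIT thumbnails and checks are CHECK_INTERVAL seconds apart. It ignores the time spent inside a batch, so it is only a lower bound; `min_drain_secs` is a hypothetical helper, not part of the cleaner.

```rust
/// Rough lower bound on the time (in seconds) to clear `backlog` thumbnails,
/// assuming one batch of at most `query_limit` per check and one
/// `check_interval_secs` wait per batch. Processing time is ignored.
fn min_drain_secs(backlog: u64, query_limit: u64, check_interval_secs: u64) -> u64 {
    assert!(query_limit > 0, "QUERY_LIMIT must be positive");
    // Number of batches needed, rounding up for a partial last batch.
    let batches = backlog.div_ceil(query_limit);
    batches * check_interval_secs
}
```

With 10,000 old thumbnails, the assumed defaults (100 per 300 s) need 100 checks, so at least 30,000 s (about 8.3 hours). 500 per 10 s needs 20 checks, about 200 s plus the processing time of each batch, and with a large QUERY_LIMIT that processing time is what dominates.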

Once the backlog is down to hundreds, you can increase CHECK_INTERVAL to hours or days, since not many new thumbnails will become old enough in that time (though this depends on your traffic).
At that point I would personally expect it to run once or twice a day, with a QUERY_LIMIT of around 300.
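A quick capacity check helps when picking these values: since each check removes at most QUERY_LIMIT thumbnails, the daily ceiling is the number of checks per day times QUERY_LIMIT. If more posts than that cross the age threshold each day, the backlog slowly grows again. `max_per_day` is a hypothetical helper for this arithmetic.

```rust
/// Upper bound on thumbnails cleaned per day: at most `query_limit` per check,
/// with one check every `check_interval_secs` seconds.
fn max_per_day(query_limit: u64, check_interval_secs: u64) -> u64 {
    const SECS_PER_DAY: u64 = 24 * 60 * 60;
    assert!(check_interval_secs > 0, "CHECK_INTERVAL must be positive");
    // Whole checks per day (rounded down), times the batch size.
    (SECS_PER_DAY / check_interval_secs) * query_limit
}
```

Twice a day (CHECK_INTERVAL=43200) with QUERY_LIMIT=300 allows up to 600 thumbnails per day; once a day (86400) allows 300.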

Notes

Backblaze B2

When the bucket lifecycle is set to "Keep only the last version", old file versions are not deleted immediately; they are hidden first and deleted after 24 hours.
So don't be surprised if the bucket size doesn't drop right away.


Disclaimer

My Rust is rusty, so there may be some issues with the code.
I have tested this on my own instance and it works as expected, but

USE AT YOUR OWN RISK