wereii / lemmy-thumbnail-cleaner

MIT License

[Question] thumbnail-cleaner deleting files too early #8

Closed: b2cc closed this issue 3 months ago

b2cc commented 3 months ago

Hello!

We recently (last week) set up a new Lemmy instance that is already seeing some activity, so we installed thumbnail-cleaner and configured it to clean up files older than 2 months. However, it is already constantly cleaning up files that can't be more than a couple of days old.

We expected the cleaner to kick in several weeks from now at the earliest. Am I misunderstanding how this tool works, or did we configure it wrong? Thanks for your support!

Version: v0.1.3

PS: we're on the latest release, so it's not the v0.1.1 bug.

ENV config:

        - name: RUST_LOG
          value: info
        - name: INSTANCE_HOST
          value: https://lemmy.example.com
        - name: POSTGRES_DSN
          value: postgres://<DSN>
        - name: PICTRS_HOST
          value: lemmy-pictrs:8080
        - name: PICTRS_API_KEY
          value: <APIKEY>
        - name: THUMBNAIL_MIN_AGE_MONTHS
          value: "2"
        - name: CHECK_INTERVAL
          value: "3600"
        - name: QUERY_LIMIT
          value: "5000"
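Kubernetes `env` values must be strings, which is why the numeric settings above are quoted. Judging by the log line `Sleeping for 3600s`, `CHECK_INTERVAL` is in seconds. The sketch below shows one hypothetical way these settings could be read and typed; `CleanerConfig` and `load_config` are illustrative names, not the tool's actual (Rust) code, and the secrets (`POSTGRES_DSN`, `PICTRS_API_KEY`) are left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CleanerConfig:
    # Hypothetical typed view of the env vars above (not the tool's real struct).
    instance_host: str
    pictrs_host: str
    thumbnail_min_age_months: int
    check_interval_secs: int  # seconds, consistent with "Sleeping for 3600s"
    query_limit: int          # assumed: max thumbnails fetched per iteration


def load_config(env: Optional[Mapping[str, str]] = None) -> CleanerConfig:
    """Read the settings from the environment; a missing key raises KeyError."""
    env = os.environ if env is None else env
    return CleanerConfig(
        instance_host=env["INSTANCE_HOST"],
        pictrs_host=env["PICTRS_HOST"],
        # Env values are always strings, so numeric settings need explicit parsing.
        thumbnail_min_age_months=int(env["THUMBNAIL_MIN_AGE_MONTHS"]),
        check_interval_secs=int(env["CHECK_INTERVAL"]),
        query_limit=int(env["QUERY_LIMIT"]),
    )
```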

Logs:

[2024-06-25T23:21:54Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 543 thumbnails
[2024-06-25T23:21:54Z INFO  lemmy_thumbnail_cleaner] Sleeping for 3600s
[2024-06-26T00:21:54Z INFO  lemmy_thumbnail_cleaner] Checking for thumbnails to clean
[2024-06-26T00:21:54Z INFO  lemmy_thumbnail_cleaner] Database contains 29 of thumbnails that can be cleaned up
[2024-06-26T00:21:55Z INFO  lemmy_thumbnail_cleaner] Processed 10 thumbnails
[2024-06-26T00:21:56Z INFO  lemmy_thumbnail_cleaner] Processed 20 thumbnails
[2024-06-26T00:21:57Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 29 thumbnails
[2024-06-26T00:21:57Z INFO  lemmy_thumbnail_cleaner] Sleeping for 3600s
[2024-06-26T01:21:57Z INFO  lemmy_thumbnail_cleaner] Checking for thumbnails to clean
[2024-06-26T01:21:57Z INFO  lemmy_thumbnail_cleaner] Database contains 20 of thumbnails that can be cleaned up
[2024-06-26T01:21:58Z INFO  lemmy_thumbnail_cleaner] Processed 10 thumbnails
[2024-06-26T01:21:58Z INFO  lemmy_thumbnail_cleaner] Processed 20 thumbnails
[2024-06-26T01:21:58Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 20 thumbnails
[2024-06-26T01:21:58Z INFO  lemmy_thumbnail_cleaner] Sleeping for 3600s
[2024-06-26T02:21:58Z INFO  lemmy_thumbnail_cleaner] Checking for thumbnails to clean
[2024-06-26T02:21:58Z INFO  lemmy_thumbnail_cleaner] Database contains 66 of thumbnails that can be cleaned up
[2024-06-26T02:21:59Z INFO  lemmy_thumbnail_cleaner] Processed 10 thumbnails
[2024-06-26T02:22:00Z INFO  lemmy_thumbnail_cleaner] Processed 20 thumbnails
...
[2024-06-26T02:22:03Z INFO  lemmy_thumbnail_cleaner] Processed 60 thumbnails
[2024-06-26T02:22:04Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 66 thumbnails
[2024-06-26T02:22:04Z INFO  lemmy_thumbnail_cleaner] Sleeping for 3600s
[2024-06-26T03:22:04Z INFO  lemmy_thumbnail_cleaner] Checking for thumbnails to clean
[2024-06-26T03:22:04Z INFO  lemmy_thumbnail_cleaner] Database contains 101 of thumbnails that can be cleaned up
[2024-06-26T03:22:05Z INFO  lemmy_thumbnail_cleaner] Processed 10 thumbnails
...
[2024-06-26T03:22:14Z INFO  lemmy_thumbnail_cleaner] Processed 100 thumbnails
[2024-06-26T03:22:14Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 101 thumbnails
[2024-06-26T03:22:14Z INFO  lemmy_thumbnail_cleaner] Sleeping for 3600s
[2024-06-26T04:22:14Z INFO  lemmy_thumbnail_cleaner] Checking for thumbnails to clean
[2024-06-26T04:22:14Z INFO  lemmy_thumbnail_cleaner] Database contains 22 of thumbnails that can be cleaned up
[2024-06-26T04:22:14Z INFO  lemmy_thumbnail_cleaner] Processed 10 thumbnails
[2024-06-26T04:22:15Z INFO  lemmy_thumbnail_cleaner] Processed 20 thumbnails
[2024-06-26T04:22:15Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 22 thumbnails
[2024-06-26T04:22:15Z INFO  lemmy_thumbnail_cleaner] Sleeping for 3600s
[2024-06-26T05:22:15Z INFO  lemmy_thumbnail_cleaner] Checking for thumbnails to clean
[2024-06-26T05:22:15Z INFO  lemmy_thumbnail_cleaner] Database contains 97 of thumbnails that can be cleaned up
[2024-06-26T05:22:17Z INFO  lemmy_thumbnail_cleaner] Processed 10 thumbnails
...
[2024-06-26T05:22:25Z INFO  lemmy_thumbnail_cleaner] Processed 90 thumbnails
[2024-06-26T05:22:25Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 97 thumbnails
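
Every hourly iteration above finds a new, nonzero batch (29, 20, 66, 101, 22 and 97 thumbnails). A small helper like the following can pull those per-iteration totals out of a longer log. It is a hypothetical sketch that only relies on the `Finished iteration, processed N thumbnails` line format visible above.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g. "[2024-06-26T00:21:57Z INFO  lemmy_thumbnail_cleaner] Finished iteration, processed 29 thumbnails"
FINISHED = re.compile(
    r"^\[(?P<ts>\S+)\s+INFO\s+lemmy_thumbnail_cleaner\]\s+"
    r"Finished iteration, processed (?P<count>\d+) thumbnails$"
)


def processed_per_iteration(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (timestamp, count) for every 'Finished iteration' log line."""
    totals = []
    for line in lines:
        match = FINISHED.match(line.strip())
        if match:
            totals.append((match["ts"], int(match["count"])))
    return totals
```
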
wereii commented 3 months ago

Interesting. Just to get the whole picture, could you give me your pict-rs, Lemmy and Postgres versions?

wereii commented 3 months ago

Either way, to debug this I will also need some results from Postgres:

1) show timezone;

2) Two separate results from the query below, one as soon as you can and another about an hour later. Replace <INSTANCE_HOST_HERE> with your https://lemmy.example.com.

    SELECT published FROM post WHERE thumbnail_url IS NOT NULL AND published < now() - interval '2 months' AND thumbnail_url LIKE '<INSTANCE_HOST_HERE>%' ORDER BY published ASC LIMIT 100;

Edit: Also, disable LTC (lemmy-thumbnail-cleaner) for now if you haven't done so yet.
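
For reasoning about which rows qualify, the query's filter can be mirrored in a few lines. This is a sketch, not the cleaner's actual code: `is_cleanable` and `subtract_months` are hypothetical helpers, the month arithmetic approximates PostgreSQL's `interval '2 months'` (calendar months, with the day clamped to the end of a shorter month), and `startswith` stands in for `LIKE '<host>%'` (which would additionally treat `_` and `%` in the host as wildcards).

```python
import calendar
from datetime import datetime
from typing import Optional


def subtract_months(ts: datetime, months: int) -> datetime:
    """Step back whole calendar months, clamping the day (e.g. Apr 30 - 2 months = Feb 29 in 2024)."""
    total = ts.year * 12 + (ts.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def is_cleanable(
    published: datetime,
    thumbnail_url: Optional[str],
    instance_host: str,
    now: datetime,
    min_age_months: int = 2,
) -> bool:
    """Hypothetical mirror of the WHERE clause of the query above."""
    if thumbnail_url is None:  # thumbnail_url IS NOT NULL
        return False
    if not thumbnail_url.startswith(instance_host):  # thumbnail_url LIKE '<host>%'
        return False
    return published < subtract_months(now, min_age_months)  # published < now() - interval
```

Note that the age test looks only at `published`; nothing in this filter refers to when the image file itself was stored.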

b2cc commented 3 months ago

Hello @wereii !

Thanks for looking into this. I have shut down LTC for now, as advised.

Here is what you asked for:

The setup runs on Kubernetes, and I have two pict-rs containers running for performance/redundancy. Hopefully this doesn't cause any issues?

Regarding the timezone: I have set up TZ=Europe/Vienna in all pods.

Database:

=# show timezone;
   TimeZone    
───────────────
 Europe/Vienna

The query returns weird results: the list contains dates from about a year ago, even though the instance has only been running since the end of last week.

 =# SELECT published FROM post WHERE thumbnail_url IS NOT NULL AND published < now() - interval '2 month' AND thumbnail_url LIKE 'https://<our-lemmy.tld>/%' ORDER BY published ASC LIMIT 100;
           published           
───────────────────────────────
 2023-06-14 10:02:37.522987+02
 2023-06-23 13:02:18.592631+02
 2023-06-24 20:50:34.757857+02
 2023-06-28 02:53:20.395242+02
 2023-06-28 20:15:18.867093+02
 2023-06-30 02:18:36.870403+02
 2023-06-30 03:01:17.343873+02
 2023-07-01 04:59:14.649974+02
 2023-07-01 05:22:11.507463+02
 2023-07-01 05:24:20.866023+02
 2023-07-01 05:29:02.775965+02
 2023-07-01 05:47:25.351497+02
 2023-07-01 10:54:15.661204+02
 2023-07-02 18:09:24.024546+02
 2023-07-05 08:39:44.653938+02
 2023-07-05 08:41:08.289215+02
 2023-07-05 08:43:20.15052+02
 2023-07-05 08:44:27.980647+02
 2023-07-05 08:45:25.54621+02
 2023-07-08 21:35:40.807414+02
 2023-07-17 04:38:31.590049+02
 2023-07-20 14:17:08.471337+02
 2023-08-03 00:04:23.788153+02
 2023-08-26 03:00:50.143148+02
 2023-08-26 19:53:45.128265+02
 2023-09-07 04:44:20.961655+02
 2023-09-22 00:10:41.155795+02
 2023-09-26 21:57:58.034775+02
 2023-09-27 08:38:41.075973+02
 2023-09-28 01:02:17.125824+02
 2023-10-14 03:13:23.02847+02
 2023-10-16 22:24:50.401961+02
 2023-10-17 07:39:27.219351+02
 2023-10-20 22:16:28.599477+02
 2023-10-28 16:51:03.697838+02
 2023-10-30 07:37:18.613206+01
 2023-10-30 21:25:48.906404+01
 2023-11-07 15:34:09.254076+01
 2023-11-12 00:25:05.104398+01
 2023-11-14 03:23:59.094029+01
 2023-11-14 07:26:43.376206+01
 2023-11-16 06:39:10.656277+01
 2023-11-16 21:12:39.398149+01
 2023-11-23 22:19:34.619158+01
 2023-11-24 16:09:50.688052+01
 2023-11-26 04:02:32.257513+01
 2023-11-30 12:24:13.536012+01
 2023-12-01 12:31:43.974123+01
 2023-12-01 21:58:40.470562+01
 2023-12-02 16:41:14.485255+01
 2023-12-03 10:41:01.584297+01
 2023-12-04 14:25:13.159192+01
 2023-12-05 08:20:51.887777+01
 2023-12-05 08:39:35.611595+01
 2023-12-05 16:49:41.364756+01
 2023-12-29 19:05:13.030427+01
 2024-01-04 18:51:56.448098+01
 2024-01-05 19:02:41.506983+01
 2024-01-09 21:37:59.743341+01
 2024-01-09 22:37:36.704221+01
 2024-01-11 18:06:09.086082+01
 2024-01-16 12:05:47.908487+01
 2024-01-18 15:48:42.746499+01
 2024-01-21 12:11:08.780344+01
 2024-01-22 17:35:21.024329+01
 2024-01-23 05:46:39.936204+01
 2024-01-31 19:25:18.019636+01
 2024-02-08 00:30:50.306612+01
 2024-02-10 02:56:32.186032+01
 2024-02-10 10:17:04.500949+01
 2024-02-16 16:34:54.25185+01
 2024-02-26 08:58:16.899321+01
 2024-02-27 04:22:19.063257+01
 2024-03-03 18:36:07+01
 2024-03-05 15:20:47.268325+01
 2024-03-05 18:04:43.577068+01
 2024-03-06 04:13:01.431452+01
 2024-03-12 20:26:49.107982+01
 2024-03-14 21:52:19.688288+01
 2024-03-18 02:46:34.836232+01
 2024-03-19 03:07:55.196269+01
 2024-03-20 07:36:57.804687+01
 2024-03-20 07:50:58.52845+01
 2024-03-21 03:08:28.008224+01
 2024-03-22 03:46:08.205389+01
 2024-03-24 15:21:35.241756+01
 2024-03-25 03:05:49.937189+01
 2024-03-26 03:08:39.286127+01
 2024-03-26 08:33:49.020654+01
 2024-03-26 14:03:04.147933+01
 2024-03-28 03:41:55.205593+01
 2024-03-30 02:15:50.068709+01
 2024-03-31 12:20:20.347872+02
 2024-04-01 02:22:49.969626+02
 2024-04-01 03:10:21.610331+02
 2024-04-01 03:26:44.855671+02
 2024-04-02 04:08:02.950132+02
 2024-04-03 03:53:33.397033+02
 2024-04-03 15:25:44.10959+02
 2024-04-04 15:35:56.257809+02
(100 rows)
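
On the timezone question: `published` is a `timestamptz`, which denotes an absolute instant. The session `TimeZone` (Europe/Vienna here) only changes how values are displayed, so it cannot explain rows from 2023 showing up. The switch from `+02` to `+01` between 2023-10-28 and 2023-10-30, and back between 2024-03-30 and 2024-03-31, is Vienna's daylight-saving change. The hypothetical `parse_psql_timestamptz` helper below parses values as psql prints them (optional fractional seconds, hour-only offset) into timezone-aware datetimes, which compare by instant regardless of their offset.

```python
import re
from datetime import datetime, timedelta, timezone

# psql prints timestamptz like "2023-07-05 08:43:20.15052+02" or "2024-03-03 18:36:07+01".
_TIMESTAMPTZ = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,6}))?"               # optional fractional seconds, 1-6 digits
    r"([+-])(\d{2})(?::?(\d{2}))?$"   # UTC offset: +HH, +HHMM or +HH:MM
)


def parse_psql_timestamptz(text: str) -> datetime:
    match = _TIMESTAMPTZ.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamptz: {text!r}")
    year, month, day, hour, minute, second, frac, sign, off_h, off_m = match.groups()
    micros = int((frac or "0").ljust(6, "0"))  # ".15052" -> 150520 microseconds
    offset = timedelta(hours=int(off_h), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    return datetime(
        int(year), int(month), int(day), int(hour), int(minute), int(second),
        micros, tzinfo=timezone(offset),
    )
```
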
Nothing4You commented 3 months ago

These query results do make sense: this tool doesn't use the time an image was uploaded to pict-rs, but the time the post was created (its published timestamp).

Even on a fresh instance, various conditions can still pull in old posts. Some examples include
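
To make the distinction concrete: what counts is the post's `published` timestamp from its origin instance, not when your instance fetched the post or stored its thumbnail. In this hypothetical model, `fetched_at` is an illustrative field (the thread does not say where, or whether, that time is recorded), and the fixed `cutoff` stands in for `now() - interval '2 months'`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FederatedPost:
    published: datetime   # post.published: creation time on the origin instance
    fetched_at: datetime  # hypothetical: when this instance first received the post


def eligible_by_published(post: FederatedPost, cutoff: datetime) -> bool:
    # The behaviour described in this thread: only post.published is compared.
    return post.published < cutoff


def eligible_by_fetch_time(post: FederatedPost, cutoff: datetime) -> bool:
    # What the issue expected instead (not how the tool works): age counted from arrival.
    return post.fetched_at < cutoff
```

A post created in June 2023 but pulled in on 2024-06-25 is eligible straight away under the first rule, which is consistent with the 2023 dates in the query output.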

b2cc commented 3 months ago

@wereii: OK, thanks for looking into this; I wasn't fully aware of how the tool works. Thanks for explaining! I'll close this issue then, since everything seems to work as expected.

wereii commented 3 months ago

Thanks for the explanation, @Nothing4You; the query output does indeed make sense.

One more thing, though: @b2cc, are you running some kind of subscribing bot? With QUERY_LIMIT this high and only an hour between runs, I would expect the first run to delete most of the old posts' thumbnails and the following runs to report 0 cleanable thumbnails.
It could be a bot, or someone subscribing to communities with old posts (or force-fetching old posts through the search bar trick).
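
The expectation above can be illustrated with a toy model. It assumes (this is not confirmed in the thread) that each run handles at most `QUERY_LIMIT` eligible thumbnails, and that between runs some number of posts with an old `published` date arrive, for example through new subscriptions. `simulate_runs` and its inputs are hypothetical.

```python
from typing import List, Sequence


def simulate_runs(initial_backlog: int, arrivals: Sequence[int], query_limit: int) -> List[int]:
    """Thumbnails processed per run.

    Run 0 sees only the initial backlog; before each later run, arrivals[i]
    newly fetched posts with an old `published` date become eligible.
    """
    backlog = initial_backlog
    processed = []
    for arriving in [0, *arrivals]:
        backlog += arriving
        done = min(backlog, query_limit)  # a run handles at most query_limit rows
        processed.append(done)
        backlog -= done
    return processed
```

With no new arrivals, the model drains everything in the first run and then reports zeros, which is what was expected here. A steady trickle of old posts keeps every run nonzero, as in the logs, while bursty arrivals (e.g. `simulate_runs(543, [0, 120, 0, 0, 45], 5000)` giving `[543, 0, 120, 0, 0, 45]`) produce a spiky pattern of positive counts followed by zeros.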

b2cc commented 3 months ago

@wereii: Probably not a bot (though I can't say for sure), but quite a few users are moving over from another instance, and they might be importing their settings, subscriptions, etc. Maybe that's the cause?

wereii commented 3 months ago

Yes, that could be the reason. It should eventually become spikier: a run that deletes some thumbnails, followed by runs that delete zero, and so on.