timgit / pg-boss

Queueing jobs in Postgres from Node.js like a boss
MIT License

Minimizing disk usage? #220

Closed gc closed 3 years ago

gc commented 3 years ago

I noticed that the pgboss.job table accounts for about 50% of my DB's disk usage (the pgboss.archive table is less than 5%), when I expect it to be hugely less. From my understanding of pg-boss, this is because it retains completed jobs for 30 days, then archives them for 7 days, then deletes them.

-       this.boss = new PgBoss({ ...providerConfig?.postgres });
+       this.boss = new PgBoss({ ...providerConfig?.postgres, deleteAfterHours: 1 });
-       await this.client.boss.schedule('analytics', `*/20 * * * *`);
+       await this.client.boss.schedule('analytics', `*/20 * * * *`, undefined, {
+           retentionMinutes: 1
+       });
    return client.boss.publishAfter(
        'minionActivity',
        newSubtask,
-       {},
+       { retentionHours: 1 },
        new Date(finishDate)
    );

my questions:

  1. Here are the changes (diff above) that I'm making now to reduce this. Is that the right way to reduce the amount of data stored in the pgboss.job table? If not, how can I reduce its size?
  2. With my existing data in the database, does this query look safe to purge all the unneeded data? DELETE FROM pgboss.job WHERE data->>'state' = 'completed';

Thank you.

timgit commented 3 years ago

The default archive setting will move completed jobs to the archive after 12 hours.

The retention* settings apply to jobs in created or retry state. The reason you're seeing so many retained jobs is most likely the completion jobs (prefixed with __state__completed), which are created automatically and sit in created state until they are archived. The deleteAfter* settings only apply to the archive table.

Keep in mind that the smaller you set the retention, the higher the risk of losing jobs if your queue grows faster than your subscribers can process it. Having said that, these settings are there to be tuned to your use case. :)
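To make the distinction concrete, here is a minimal sketch of how those two families of settings could be combined, using the option names from the diff above; the connection string and the one-hour values are placeholders, not recommendations, and it assumes the retention* options are also accepted in the constructor as a queue-wide default (check the options docs for your version):

    const PgBoss = require('pg-boss');

    async function main() {
        // Sketch only: retention* controls how long jobs in created/retry state
        // are kept in pgboss.job, while deleteAfter* controls how long archived
        // rows stay in pgboss.archive before being deleted.
        const boss = new PgBoss({
            connectionString: process.env.DATABASE_URL, // placeholder connection
            retentionHours: 1,   // default for published jobs (created/retry state)
            deleteAfterHours: 1  // applies only to the archive table
        });

        await boss.start();

        // A per-job option can still override the constructor default:
        await boss.publish('minionActivity', { some: 'payload' }, { retentionHours: 1 });
    }

    main().catch(console.error);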

gc commented 3 years ago

I'm left quite confused; I don't believe you answered either of my questions, so I'll try to phrase it differently. Out of my top 12 biggest relations (in terms of disk usage), 9 of them are from pgboss. However, I'm simply using pgboss to run things on an interval and at certain timestamps, so I'd expect pgboss to be using at least 95% less space than it is now.

I just want to know how to make pgboss use less space. I have not deployed (am not using) the changes above, because I wanted to ask you first whether they are right; the config options and their purposes are highly confusing, at least to me.

For reference, here's a table of my disk usage:

             relation             |  size
----------------------------------+---------
 pgboss.job                       | 981 MB
 pg_toast.pg_toast_60027005       | 980 MB
 public.users                     | 450 MB
 pgboss.archive                   | 197 MB
 pgboss.job_singletonon           | 105 MB
 pgboss.job_singletonkeyon        | 97 MB
 pgboss.job_singletonkey          | 94 MB
 pgboss.job_name                  | 90 MB
 pg_toast.pg_toast_60027005_index | 76 MB
 pgboss.job_pkey                  | 56 MB
 pgboss.archive_id_idx            | 23 MB
 pgboss.archive_archivedon_idx    | 12 MB

gc commented 3 years ago

Here's some more info in case it's helpful.

Query 1:

SELECT name, COUNT(name) AS qty
FROM pgboss.job
GROUP BY name

Outputs

                   name                    |   qty
-------------------------------------------+---------
 analytics                                 |      36
 minionActivity                            |   22220
 __pgboss__cron                            |     722
 __pgboss__maintenance                     |     359
 __pgboss__send-it                         |      36
 __state__completed__analytics             |    4303
 __state__completed__minionActivity        | 1219943
 __state__completed____pgboss__cron        |       3
 __state__completed____pgboss__maintenance |       8
 __state__completed____pgboss__send-it     |    4303

Query 2:

SELECT name, keepuntil, createdon, state FROM pgboss.job ORDER by keepuntil DESC LIMIT 10;
                name                |         keepuntil          |           createdon           |  state
------------------------------------+----------------------------+-------------------------------+---------
 __state__completed__minionActivity | 2021-03-29 16:38:36.037+02 | 2021-01-28 16:38:37.635226+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:34.872+02 | 2021-01-28 16:38:35.468883+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:33.626+02 | 2021-01-28 16:38:35.590336+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:31.781+02 | 2021-01-28 16:38:33.394589+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:29.695+02 | 2021-01-28 16:38:31.250583+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:28.803+02 | 2021-01-28 16:38:29.181241+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:27.192+02 | 2021-01-28 16:38:29.194338+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:24.258+02 | 2021-01-28 16:38:25.107403+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:23.673+02 | 2021-01-28 16:38:25.103182+01 | created
 __state__completed__minionActivity | 2021-03-29 16:38:20.371+02 | 2021-01-28 16:38:21.009189+01 | created

I'm confused because the default should be 12 hours before archiving, but these were created on 01-28 and have a keepuntil two months after that date? (It's also confusing that they appear to be in two states at once: completed and created?)

This is all using all the defaults. I haven't supplied anything but the bare minimum:

    const boss = new PgBoss({ ...providerConfig?.postgres });
    boss.publishAfter(
        'minionActivity',
        newSubtask,
        {},
        new Date(finishDate)
    );

I'm trying hard to understand; apologies for the trouble.

timgit commented 3 years ago

Thanks for providing your sample queues and sizes.

If your goal is to have the completion jobs archived sooner, then yes, you should set a shorter retention on the original job, which is inherited by the completion job when it's automatically created.

The completion jobs are only used if you call fetchCompleted() or onComplete(). Currently there is no option to bypass creating them, but it has been requested before, so I'm thinking about adding one.
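As a rough sketch of that inheritance, reusing the call from the original post and assuming a started boss instance like the one above (the one-hour value and the payload are arbitrary examples):

    // Sketch: the retention set on the original job is inherited by the
    // __state__completed__* job that pg-boss creates when the job finishes.
    // newSubtask and finishDate stand in for the app's own values.
    const newSubtask = { some: 'payload' };
    const finishDate = Date.now() + 60 * 1000;

    await boss.publishAfter(
        'minionActivity',
        newSubtask,
        { retentionHours: 1 },  // inherited by __state__completed__minionActivity
        new Date(finishDate)
    );

    // Completion jobs only matter if something actually consumes them, e.g.:
    await boss.onComplete('minionActivity', job => {
        console.log('job finished with', job.data);
    });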

One more thing about table sizes in Postgres. If you were to add a really large amount of JSON to a job, and that job were retried multiple times, MVCC would internally keep a copy of the original record for each update, so your table size would be significantly larger than the visible rows until autovacuum or another process cleans up the dead rows. Just something to consider if you need to pay close attention to db size.

gc commented 3 years ago

DELETE FROM pgboss.job WHERE state = 'completed' OR data->>'state' = 'failed';
DELETE FROM pgboss.archive WHERE state = 'completed' OR state = 'failed' OR data->>'state' = 'failed' OR name LIKE '__state__completed%';

These are the queries I ended up running to remove useless/old data. I also went ahead with the changes mentioned in my original post, which I hope will keep the tables smaller and store less data. I don't know how long it will take before I can tell whether it's been effective, so I'll just close the issue now. Thanks

bhsiaoNFLX commented 3 years ago

@gc just curious (because I'm running into a similar issue): have you considered using deleteQueue(name) and/or deleteAllQueues() to remove those internal rows? If you have, did you run into any difficulties? (Asking because I'm about to dive in next week, so a heads up on gotchas would help.) Using these won't immediately reclaim disk space, but it keeps the job table's row count small.
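For reference, a minimal sketch of what that might look like with a started boss instance (queue name taken from this thread; as noted above, deleting rows does not shrink the table on disk until vacuum reclaims the dead space):

    // Sketch: deleteQueue removes all jobs for one queue from pgboss.job,
    // deleteAllQueues clears every queue. Row counts drop immediately, but the
    // on-disk size only shrinks after (auto)vacuum reclaims the space.
    await boss.deleteQueue('minionActivity');

    // or, more drastically:
    await boss.deleteAllQueues();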

gc commented 3 years ago

@bhsiaoNFLX I'm not sure whether those two functions are meant for removing completed jobs? I didn't use them. All I did was manually nuke the tables with the queries above, run VACUUM FULL on the pgboss schema, and add the code changes from the OP, which should hopefully prevent pgboss from keeping old data around.