throttle progress - Githubissues

worden341 commented 7 years ago

I generally can't use your otherwise nifty program because usually the tables I need to repack are large, and the program causes a surge of disk I/O that unacceptably impacts system responsiveness. If you had an option to somehow throttle the rate of progress then I could likely use it.

jwg2s commented 5 years ago

@worden341 did you have any success finding a way to determine progress of the repack process? Sometimes you're just sitting and waiting for hours, and you have no idea how long the entire process might take (hours vs days for example)

mikkorantalainen commented 5 years ago

Could you just ionice -c 3 the repack process? Of course, that obviously requires a good kernel and IO scheduler that actually supports IO priorities.

worden341 commented 4 years ago

No that does not work because the actual IO and cpu and everything else is actually done by PostgreSQL, not pg_repack.

gabegorelick commented 4 years ago

It would be nice (no pun intended) if pg_repack could throttle itself when copying tuples, e.g. by splitting up the INSERT into multiple chunks.

You can try throttling index creation with work_mem/maintenance_work_mem, but that may actually be worse for I/O as more has to be spilled to disk.

https://wiki.postgresql.org/wiki/Priorities has more info.

rshadow commented 2 years ago

I have the same problem. So the tool is not ready for production if you have large tables and a high load. It rests on the IO limit and affects the operation of the database.

MichaelDBA commented 2 years ago

I don't see what the fuss is about? There is significant load generated with regular VACUUM FULLs. What's the big diff?

fuyar commented 1 year ago

With gh-ost for MySQL you can throttle to be more gentle on I/Os and WAL generation (+replication).

It would be nice if pg_repack added this is that every one is saying I think.

DavidGzzMilan commented 1 year ago

The performance benefits of removing the bloat from tables and indexes are evident; however, there are some projects where people prefer living with the bloat instead of trying to repack because of the IO demand. Having some options to tweak it and control the progress definitely will be a great add-on to this wonderful tool.

kustodian commented 1 month ago

Being able to set some kind of throttling would be awesome, especially in cases of big tables that take hours to repack because they can generate a lot of IO that would make the whole DB extremely slow.

andreasscherbaum commented 1 month ago

There is significant load generated with regular VACUUM FULLs.

This can be slowed down, using a variety of parameters. By default these settings are off, and VACUUM runs at full speed.

zakeryooo commented 3 weeks ago

Does anyone know any sort of workaround for this (even a hacky one)? I run pg_repack on a table once every 8 weeks to recluster by a specific index, it keeps everything running perfect except for the 60 minutes while the indexes are being rebuilt during which time the entire DB it about 10-25x slower due to the create index saturating all I/O.

In this instance it doesn't matter how long the pg_repack takes, if I could make the create index only consume 10% of I/O and it take 10x longer that would be infinitely preferable.

@jwg2s @rshadow did you ever find a suitable workaround for this? Thanks!

Edit: As a temporary workaround for this while the pg_repack is running I have a background process periodically querying pg_stat_activity for the postgresql pid(s) related to the pg_repack queries and putting them in a IO limited cgroup. Keeps all my other queries running near full speed while the repack runs slowly in the background. Not sure if this is a bad idea or not but it seems to be working for my use case 😄

worden341 commented 2 weeks ago

Due to the way pg_repack is designed, this feature will probably never be implemented. I'm not knocking it: pg_repack is good in its own way. Since my original comment in 2017, I implemented my own repack solution and used it many times, and you can too, but you need some smarts about Postgres administration and sql in order to pull it off without badly breaking your data (test, test, test). This process is essentially the same as what pg_repack does except for the batching and sleeping parts. The very short version is (I may be missing steps because I've been out of this area for a while):

Create a new empty table with a schema identical to the bloated table. At this point, you may omit indexes unneeded by the primary key.
Create triggers on the bloated table to mirror all inserts/updates/deletes on the new table.
Create and run a script that walks through the bloated table and batch copies rows into the new table. Commit after each batch, and sleep between batches.
Build any missing indexes on the new table.
In a final transaction, swap the names of the old and new tables. You may wish to also temporarily create a trigger on the new table to mirror data changes back to the old table until you are sure that everything went correctly.
Drop the old table.

reorg / pg_repack

throttle progress #154