worden341 opened 7 years ago
@worden341 did you have any success finding a way to determine the progress of the repack process? Sometimes you're just sitting and waiting for hours, with no idea how long the entire process might take (hours vs. days, for example).
Could you just ionice -c 3 the repack process? Of course, that requires a kernel and I/O scheduler that actually support I/O priorities.
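For reference, that would look something like the sketch below (mytable/mydb are placeholder names; note the caveat in the next reply that this only throttles the client process):

```sh
# Run the pg_repack client in the "idle" I/O scheduling class (class 3).
# Caveat (see the next reply): most of the heavy I/O happens inside the
# PostgreSQL backend, so this only throttles the client side.
ionice -c 3 pg_repack --table=mytable mydb
```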
No, that does not work, because the actual I/O and CPU and everything else is done by PostgreSQL itself, not by pg_repack.
It would be nice (no pun intended) if pg_repack could throttle itself when copying tuples, e.g. by splitting up the INSERT into multiple chunks.
You can try throttling index creation with work_mem/maintenance_work_mem (sketch below), but that may actually be worse for I/O as more has to be spilled to disk.
https://wiki.postgresql.org/wiki/Priorities has more info.
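One way to pass a lower maintenance_work_mem to the backend pg_repack connects with is libpq's standard PGOPTIONS environment variable, which pg_repack should honor like any libpq client. A sketch, with placeholder names and a placeholder value:

```sh
# Lower maintenance_work_mem for the backend that pg_repack connects with.
# Note the caveat above: a smaller setting can mean more spilling to disk,
# i.e. more I/O, not less. Table/db names and 64MB are placeholders.
PGOPTIONS='-c maintenance_work_mem=64MB' pg_repack --table=mytable mydb
```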
I have the same problem. So the tool is not ready for production if you have large tables and a high load: it runs up against the I/O limit and affects the operation of the database.
I don't see what the fuss is about. There is significant load generated with regular VACUUM FULLs too. What's the big diff?
With gh-ost for MySQL you can throttle to be more gentle on I/Os and WAL generation (+replication).
It would be nice if pg_repack added this; that is what everyone is saying, I think.
The performance benefits of removing the bloat from tables and indexes are evident; however, on some projects people prefer living with the bloat rather than repacking, because of the I/O demand. Having some options to tweak it and control its pace would definitely be a great addition to this wonderful tool.
Being able to set some kind of throttling would be awesome, especially for big tables that take hours to repack, since they can generate enough I/O to make the whole DB extremely slow.
There is significant load generated with regular VACUUM FULLs.
This can be slowed down using a variety of parameters (the cost-based vacuum delay settings). By default these settings are off, and VACUUM runs at full speed.
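For illustration, a sketch of throttling a manual VACUUM this way (the values are illustrative, and these settings throttle only VACUUM itself, not pg_repack):

```sh
# Throttle a manual VACUUM with cost-based delay.
psql -d mydb <<'SQL'
SET vacuum_cost_delay = 20;   -- sleep 20ms whenever the cost limit is reached
SET vacuum_cost_limit = 200;  -- accumulated I/O "cost" that triggers a sleep
VACUUM VERBOSE mytable;
SQL
```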
Does anyone know any sort of workaround for this (even a hacky one)? I run pg_repack on a table once every 8 weeks to recluster by a specific index. It keeps everything running perfectly, except for the 60 minutes while the indexes are being rebuilt, during which the entire DB is about 10-25x slower due to the create index saturating all I/O.
In this instance it doesn't matter how long the pg_repack takes; if I could make the create index consume only 10% of the I/O and take 10x longer, that would be infinitely preferable.
@jwg2s @rshadow did you ever find a suitable workaround for this? Thanks!
Edit: As a temporary workaround, while the pg_repack is running I have a background process periodically querying pg_stat_activity for the PostgreSQL pid(s) related to the pg_repack queries and putting them in an I/O-limited cgroup (roughly the sketch below). This keeps all my other queries running near full speed while the repack runs slowly in the background. Not sure if this is a bad idea or not, but it seems to be working for my use case 😄
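For anyone wanting to try the same thing, a rough reconstruction of that approach (not the commenter's actual script) might look like this, assuming cgroup v2 and root; the device numbers, limits, and poll interval are all placeholders:

```sh
#!/bin/sh
# Create an I/O-limited cgroup and keep moving the PostgreSQL backends
# that run pg_repack's queries into it. Get MAJ:MIN for your data disk
# from `lsblk`; all values here are illustrative.
CG=/sys/fs/cgroup/repack-throttle
mkdir -p "$CG"
echo "8:0 rbps=10485760 wbps=10485760" > "$CG/io.max"  # ~10 MB/s each way

while true; do
  psql -d mydb -At -c "SELECT pid FROM pg_stat_activity
                       WHERE query ILIKE '%repack%'
                         AND pid <> pg_backend_pid();" |
  while read -r pid; do
    echo "$pid" > "$CG/cgroup.procs" 2>/dev/null  # move backend into cgroup
  done
  sleep 5
done
```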
Due to the way pg_repack is designed, this feature will probably never be implemented. I'm not knocking it: pg_repack is good in its own way. Since my original comment in 2017, I implemented my own repack solution and used it many times, and you can too, but you need some smarts about Postgres administration and SQL in order to pull it off without badly breaking your data (test, test, test). This process is essentially the same as what pg_repack does except for the batching and sleeping parts. The very short version (I may be missing steps because I've been out of this area for a while): build a new copy of the table, copy the rows over in batches with sleeps in between, capture concurrent changes along the way, build the indexes, and swap the tables in.
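As an illustration of the batch-and-sleep part only (not the trigger/swap machinery), a keyset-paginated copy could look like the sketch below. Table names and batch size are hypothetical; it assumes a bigint id key and PostgreSQL 11+ for COMMIT inside a DO block:

```sh
psql -d mydb <<'SQL'
-- Copy rows from big_table into a pre-created big_table_new in batches,
-- committing and sleeping between batches to throttle I/O.
DO $$
DECLARE
  last_id bigint := 0;
  max_id  bigint;
BEGIN
  SELECT max(id) INTO max_id FROM big_table;
  WHILE last_id < coalesce(max_id, 0) LOOP
    INSERT INTO big_table_new
      SELECT * FROM big_table
      WHERE id > last_id AND id <= last_id + 10000;
    last_id := last_id + 10000;
    COMMIT;               -- flush each batch; requires PostgreSQL 11+
    PERFORM pg_sleep(1);  -- the throttle: pause between batches
  END LOOP;
END $$;
SQL
```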
I generally can't use your otherwise nifty program because usually the tables I need to repack are large, and the program causes a surge of disk I/O that unacceptably impacts system responsiveness. If you had an option to somehow throttle the rate of progress then I could likely use it.