reorg / pg_repack

Reorganize tables in PostgreSQL databases with minimal locks
BSD 3-Clause "New" or "Revised" License

Strategy for busy tables #8

Open armanddp opened 11 years ago

armanddp commented 11 years ago

Hey guys,

I've let pg_repack run on one of our busy tables. It's finished the rewriting of the table however the flushing from what I can see is falling behind way too far to catch up. If I understand how it works correctly:

SELECT max(id) FROM repack.log_* is going up by roughly 1000 per second, while min(id) only climbs every 2-15 seconds, whenever a repack.repack_apply operation finishes.

Seem to be heading for problems here.
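
For reference, this is the query I'm using to watch the backlog (repack.log_18369 being the log table pg_repack created for this table):

    -- unapplied backlog in pg_repack's log table
    select now(),
           min(id),
           max(id),
           max(id) - min(id) as backlog
    from repack.log_18369;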

dvarrazzo commented 11 years ago

Is this a regression w.r.t. pg_reorg?

armanddp commented 11 years ago

I've never used pg_reorg; this is pg_repack 1.1.8 from the repo. (Sorry, I know this is probably not really an issue per se, but I don't have access to the mailing list.)

armanddp commented 11 years ago

Just secondary to this as it's happening to me at the moment. Can I safely DROP EXTENSION pg_repack CASCADE while it's running to shut it down?

schmiddy commented 11 years ago

That is strange. Generally we expect the repack temp table to be able to keep up with even heavy traffic on the main table, if only because we have a compact heap and indexes, and any modifications to the main table have to write to the log_ table as well via our trigger.

Could we see the output with --echo --elevel=DEBUG turned on? If you have the 'ts' utility as well to timestamp the logs, that would be helpful too, e.g. pg_repack ... --echo --elevel=DEBUG dbname 2>&1 | ts

schmiddy commented 11 years ago

If you want to kill the current job, I recommend killing the client first -- either ctrl-c, or if that doesn't work a pg_cancel_backend() or pg_terminate_backend() of the pg_repack connection should cause it to bail out. Then you can safely DROP EXTENSION pg_repack CASCADE;
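
For example, something along these lines should do it (a sketch; on 9.1 and older the pg_stat_activity columns are procpid and current_query rather than pid and query, and filtering on the query text is just one way to spot the pg_repack connection):

    -- find the backend running pg_repack's apply loop and cancel it;
    -- escalate to pg_terminate_backend() if the cancel is ignored
    SELECT pg_cancel_backend(pid)
    FROM pg_stat_activity
    WHERE query ILIKE '%repack_apply%'
      AND pid <> pg_backend_pid();

    -- once the client has exited, the leftover repack objects can go
    DROP EXTENSION pg_repack CASCADE;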

armanddp commented 11 years ago

I'll restart it shortly with logging enabled. Going to kill the client now.

armanddp commented 11 years ago

Must say, I love the library so far. It feels really solid (except now for the hiccup with catch-up). The cancel and drop was seamless; I usually get cold shivers when I need to do something like that.

I just want to see what our traffic looks like then I'll run it with the log level up.

armanddp commented 11 years ago

I've tried the latest version again and see similar behavior. It's doing the catch-up as intended, but it's just too slow to catch up with the live data. Unfortunately I don't have ts installed, but each repack_apply loop takes around 10 seconds.

    # select now(), min(id), max(id), max(id)-min(id) as left from repack.log_18369;
                  now              |  min  |   max    |   left
    -------------------------------+-------+----------+----------
     2013-05-29 14:06:25.218692+00 | 15031 | 10457455 | 10442424
    (1 row)

    # select now(), min(id), max(id), max(id)-min(id) as left from repack.log_18369;
                  now              |  min  |   max    |   left
    -------------------------------+-------+----------+----------
     2013-05-29 14:10:33.162414+00 | 25031 | 10651793 | 10626762

Below is the log:

    LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
    LOG: (param:0) = SELECT * FROM repack.log_18369 ORDER BY id LIMIT $1
    LOG: (param:1) = INSERT INTO repack.table_18369 VALUES ($1.*)
    LOG: (param:2) = DELETE FROM repack.table_18369 WHERE (id) = ($1.id)
    LOG: (param:3) = UPDATE repack.table_18369 SET (id, type, recipient, sender, reference, data, message_id, created_at, updated_at, state, client_ref, meta_data, broadcast_id, app_version) = ($2.id, $2.type, $2.recipient, $2.sender, $2.reference, $2.data, $2.message_id, $2.created_at, $2.updated_at, $2.state, $2.client_ref, $2.meta_data, $2.broadcast_id, $2.app_version) WHERE (id) = ($1.id)
    LOG: (param:4) = DELETE FROM repack.log_18369 WHERE id <= $1
    LOG: (param:5) = 1000
    LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
    LOG: (param:0) = SELECT * FROM repack.log_18369 ORDER BY id LIMIT $1
    LOG: (param:1) = INSERT INTO repack.table_18369 VALUES ($1.*)
    LOG: (param:2) = DELETE FROM repack.table_18369 WHERE (id) = ($1.id)
    LOG: (param:3) = UPDATE repack.table_18369 SET (id, type, recipient, sender, reference, data, message_id, created_at, updated_at, state, client_ref, meta_data, broadcast_id, app_version) = ($2.id, $2.type, $2.recipient, $2.sender, $2.reference, $2.data, $2.message_id, $2.created_at, $2.updated_at, $2.state, $2.client_ref, $2.meta_data, $2.broadcast_id, $2.app_version) WHERE (id) = ($1.id)
    LOG: (param:4) = DELETE FROM repack.log_18369 WHERE id <= $1
    LOG: (param:5) = 1000
    LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
    LOG: (param:0) = SELECT * FROM repack.log_18369 ORDER BY id LIMIT $1
    LOG: (param:1) = INSERT INTO repack.table_18369 VALUES ($1.*)
    LOG: (param:2) = DELETE FROM repack.table_18369 WHERE (id) = ($1.id)
    LOG: (param:3) = UPDATE repack.table_18369 SET (id, type, recipient, sender, reference, data, message_id, created_at, updated_at, state, client_ref, meta_data, broadcast_id, app_version) = ($2.id, $2.type, $2.recipient, $2.sender, $2.reference, $2.data, $2.message_id, $2.created_at, $2.updated_at, $2.state, $2.client_ref, $2.meta_data, $2.broadcast_id, $2.app_version) WHERE (id) = ($1.id)
    LOG: (param:4) = DELETE FROM repack.log_18369 WHERE id <= $1
    LOG: (param:5) = 1000

schmiddy commented 11 years ago

Hi armanddp,

Thanks for checking with the latest version and providing logs. Sorry I haven't had a chance to investigate this problem more yet. If you could provide us a few more details, that might help us narrow down what's going on.

First, some information which could hopefully be gathered non-intrusively:

The above details would be a great starting point.

If you are able to spend a bit more time trying pg_repack on the problem table again, a timestamped pg_repack output log would be pretty helpful. You can get the 'ts' utility either by installing the 'moreutils' package, or by grabbing just 'ts' from https://raw.github.com/madx/moreutils/master/ts. Then pipe the verbose output from pg_repack through that script, so we can see exactly how long each step is taking. (The beginning of the log will be interesting too, to see how long the initial steps take.) If some of the repack_apply() steps are taking more than a few seconds, you could check pg_stat_activity to see if anything unusual is going on in the database. (If you are trying pg_repack from git, please grab the latest git master, as I recently fixed a bug which could cause pg_repack to bail out during the swap step.)

Sorry for asking for so many details, but we would like to understand what's going on, and hopefully some of the above details would give us a starting point. I think that repack_apply() could be optimized to better handle large write traffic to the main table, but we should know more about the problem before fooling with optimizations.

Thanks Josh

armanddp commented 11 years ago

Thanks Josh, I'm busy running it to get a timestamped output log. I'll share the results and the rest of the requested information when it's done (for better or worse).

Thanks!

Armand

stefank2 commented 10 years ago

Hmm.
Mine cycled 15000 times through repack_apply on a table with 67000 live tuples, which took about 4 hours. It normally has no trouble with that table and finishes in a couple of minutes. I don't think the table is really being updated very quickly; perhaps a few hundred rows were added, deleted, or updated in that time. There is another table that more commonly takes hours to finish; it is not being updated very quickly either. Strange.

    ...
    Oct 04 12:57:57 LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
    Oct 04 12:57:57 LOG: (param:0) = SELECT * FROM repack.log_16798 ORDER BY id LIMIT $1
    Oct 04 12:57:57 LOG: (param:1) = INSERT INTO repack.table_16798 VALUES ($1.*)
    Oct 04 12:57:57 LOG: (param:2) = DELETE FROM repack.table_16798 WHERE (id) = ($1.id)
    Oct 04 12:57:57 LOG: (param:3) = UPDATE repack.table_16798 SET (id, targcartid, targcartoffset, refcartid, refcartoffset, targbusid, refbusid, checksum, created, examined, used, fullness, targcartoffend, covdensity) = ($2.id, $2.targcartid, $2.targcartoffset, $2.refcartid, $2.refcartoffset, $2.targbusid, $2.refbusid, $2.checksum, $2.created, $2.examined, $2.used, $2.fullness, $2.targcartoffend, $2.covdensity) WHERE (id) = ($1.id)
    Oct 04 12:57:57 LOG: (param:4) = DELETE FROM repack.log_16798 WHERE id <= $1
    Oct 04 12:57:57 LOG: (param:5) = 1000
    Oct 04 12:57:57 LOG: (query) SELECT pid FROM pg_locks WHERE locktype = 'virtualxid' AND pid <> pg_backend_pid() AND virtualtransaction = ANY($1)
    Oct 04 12:57:57 LOG: (param:0) = {196/567622,147/3147735,52/19165090,116/1035212,38/17448680,14/17306840,2/15447834}
    Oct 04 12:57:58 LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
    Oct 04 12:57:58 LOG: (param:0) = SELECT * FROM repack.log_16798 ORDER BY id LIMIT $1
    Oct 04 12:57:58 LOG: (param:1) = INSERT INTO repack.table_16798 VALUES ($1.*)
    Oct 04 12:57:58 LOG: (param:2) = DELETE FROM repack.table_16798 WHERE (id) = ($1.id)
    Oct 04 12:57:58 LOG: (param:3) = UPDATE repack.table_16798 SET (id, targcartid, targcartoffset, refcartid, refcartoffset, targbusid, refbusid, checksum, created, examined, used, fullness, targcartoffend, covdensity) = ($2.id, $2.targcartid, $2.targcartoffset, $2.refcartid, $2.refcartoffset, $2.targbusid, $2.refbusid, $2.checksum, $2.created, $2.examined, $2.used, $2.fullness, $2.targcartoffend, $2.covdensity) WHERE (id) = ($1.id)
    Oct 04 12:57:58 LOG: (param:4) = DELETE FROM repack.log_16798 WHERE id <= $1
    Oct 04 12:57:58 LOG: (param:5) = 1000
    Oct 04 12:57:58 LOG: (query) SELECT pid FROM pg_locks WHERE locktype = 'virtualxid' AND pid <> pg_backend_pid() AND virtualtransaction = ANY($1)
    Oct 04 12:57:58 LOG: (param:0) = {196/567622,147/3147735,52/19165090,116/1035212,38/17448680,14/17306840,2/15447834}
    Oct 04 12:57:59 LOG: (query) SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
    ...

schmiddy commented 10 years ago

If you are seeing the SQL_XID_ALIVE query repeated every second, pg_repack must be waiting for an old transaction to finish before it trusts that it is safe to perform its final table swap. To find the PIDs of those old transactions hanging out, you can run that same query:

SELECT pid FROM pg_locks WHERE locktype = 'virtualxid' AND pid <> pg_backend_pid() AND virtualtransaction = ANY($1)

substituting in the arguments listed by pg_repack as "param:0" in the subsequent LOG message: in your case, that would have been '{196/567622,147/3147735,52/19165090,116/1035212,38/17448680,14/17306840,2/15447834}'.
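
That is, something like this (a sketch, using the array from the log excerpt above):

    -- which backends hold the virtual xids that pg_repack is waiting on?
    SELECT pid
    FROM pg_locks
    WHERE locktype = 'virtualxid'
      AND pid <> pg_backend_pid()
      AND virtualtransaction = ANY(
        '{196/567622,147/3147735,52/19165090,116/1035212,38/17448680,14/17306840,2/15447834}'::text[]
      );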

There is a NOTICE printed by pg_repack when it is waiting:

Waiting for %d transactions to finish. First PID: %s

but that message only gets printed when the number of backends being waited on changes (i.e. the first time we notice we're waiting on some backends, and any time that count changes). IIRC the rationale for not printing the NOTICE in every loop was to avoid log spam, though maybe we should revisit that decision, since as things stand it's far from apparent that pg_repack is stuck waiting.

vincentdephily commented 10 years ago

I'm having the same issue right now. I was seeing the same 'SELECT pid FROM pg_locks' query earlier, but not anymore, so I can't run it manually to see what's blocking.

Running pg_repack 1.2.1 on PG 9.1.13.

The table in question is used as a queue: a trigger inserts rows, then the queue-processing connection sets 'processing=true' on some rows, and then deletes them. There are around 10-15K inserts per minute.

There was an issue with the queue processing which let the table balloon to 4G (7G with indexes), but it has caught up now and the number of live rows stays under 10K. The server's I/O is very loaded, but keeping up. Even though queue processing has caught up, the table and its indexes are still growing abnormally, currently up to 12G.
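
A quick way to track that growth (a sketch, with foo standing in for the queue table):

    -- total on-disk size including indexes, versus the heap alone
    SELECT pg_size_pretty(pg_total_relation_size('foo')) AS total_with_indexes,
           pg_size_pretty(pg_relation_size('foo'))       AS heap_only;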

     Column      |            Type             |       Modifiers
    -------------+-----------------------------+------------------------
     id          | bigint                      | not null
     i1          | integer                     |
     i2          | integer                     |
     i3          | integer                     |
     i4          | integer                     |
     d1          | timestamp without time zone | not null
     d2          | timestamp without time zone | not null
     t1          | bytea                       |
     b1          | boolean                     | not null default false
    Indexes:
        "foo_pkey" PRIMARY KEY, btree (id)
        "index_96155" UNIQUE, btree (id)
        "index_1480860" btree (b1)
        "foo_b1_idx" btree (b1)
    Triggers:
        foo_insert_trigger AFTER INSERT ON foo FOR EACH STATEMENT EXECUTE PROCEDURE foo_insert_notify()
    Triggers firing always:
        z_repack_trigger BEFORE INSERT OR DELETE OR UPDATE ON tracking_push FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('INSERT INTO repack.log_96075(pk, row) VALUES( CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.id)::repack.pk_96075) END, $2)')

schmiddy commented 10 years ago

If pg_repack is not waiting on old transactions to die off, another possibility is that a steady stream of new rows recorded by the trigger are causing repack_apply to always have just a few rows left to process, so this logic:

    /*                                                                      
     * 4. Apply log to temp table until no tuples are left in the log       
     * and all of the old transactions are finished.                        
     */

never actually finishes -- I've seen this on a large busy table before. IMO we should fix pg_repack so that if apply_log() indicates that it has processed just a few remaining rows in the log table, we take the exclusive lock and finish up, instead of waiting for apply_log() to get all the way down to 0, which may never actually happen.

In the meantime, to find out if that's what's biting you, check what SELECT COUNT(*) FROM repack.log_xxx says while pg_repack is spinning through that loop performing apply_log(). If you see that number steadily decreasing from a huge count down to zero, everything is fine. If you see that count bouncing around at numbers well under 1000, then you are experiencing this bug. If so, I suggest temporarily halting writes to your main table for a few seconds, enough for apply_log() to chew through the log entries and return 0.
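
Concretely, something like this, run a few times in a row (a sketch; the OID suffix will differ for your table):

    -- rows still waiting in pg_repack's log table
    SELECT now(), count(*) AS pending FROM repack.log_96075;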

vincentdephily commented 10 years ago

Yes, I had looked at this: the row count on the repack table bounces around a lot, which is expected behaviour for the source table.

I tried stopping the queue processing to lower activity on the table (so that it only gets inserts, no updates or deletes), but that wasn't enough for pg_repack to notice that it was a good time to take the lock.

Stopping inserts on that table is a tough one; it'd mean a service interruption, which I'd rather avoid.

schmiddy commented 10 years ago

I think you should be able to grab a LOCK TABLE original_table IN SHARE MODE; while pg_repack is finishing up, to stop updates to the original table for a second or two.
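
Something like this in a separate session while pg_repack is in its apply loop (a sketch; hold the lock only briefly):

    BEGIN;
    -- SHARE MODE conflicts with row-level writes, so INSERT/UPDATE/DELETE
    -- (and therefore new rows in repack.log_*) are blocked while it is held,
    -- giving apply_log() a chance to drain the log completely.
    LOCK TABLE original_table IN SHARE MODE;
    -- pause a second or two, then release the lock so pg_repack can take
    -- the ACCESS EXCLUSIVE lock it needs for the final swap
    COMMIT;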

vincentdephily commented 10 years ago

Glancing at the code, it seems like just raising the first "if (num > 0)" threshold would be enough? apply_log() is run again after we get the exclusive lock, so it looks safe?

vincentdephily commented 10 years ago

Locking the table did the trick, although I had to lock the table that feeds into the queue table rather than the queue table itself (which is the one I wanted to repack), because otherwise pg_repack saw a running transaction and wouldn't continue, and releasing the lock put me back in the original race-condition situation. So: nice trick, but not universally usable :)

BTW, even though there were < 10k rows in repack.table_xxxx, there were many more (up to 300k) in repack.log_xxxx. It took a while to process them. Just mentioning this in case it explains the scenario better.

schmiddy commented 10 years ago

Glancing at the code, it seems like just raising the first "if (num > 0)" threshold would be enough ?

Yeah, I think bumping that up to, say, 50 would be an improvement over what we have now. Or maybe we'd want something just a bit more sophisticated, like check for num <= 50 and also that num was small in the last loop iteration as well.

BTW, even though there were < 10k rows in repack.table_xxxx, there were many more (up to 300k) in repack.log_xxxx. It took a while to process them. Just mentioning this in case it explains the scenario better.

Oh, so maybe you weren't falling into the case of being stuck in a loop where apply_log() always has a small handful of rows to chew through -- perhaps your write volume to the original table was simply high enough that pg_repack had difficulty keeping up.

vincentdephily commented 10 years ago

Sorry, I mustn't have been level-headed yesterday: I thought you suggested checking the size of repack.table_xxxx instead of repack.log_xxxx. Makes perfect sense today :) My problem may well have been an I/O bandwidth one; I need to free up resources. The log was big, but it wasn't growing unboundedly as far as I could see.

Not waiting until the log is exactly empty still sounds like a good idea. I suggest making the threshold configurable from the command line.

ktm-rice commented 9 years ago

Hi,

I am having a similar problem with pg_repack. The table is very busy and pg_repack never catches up with the main table, so the log table grows without bound. Is there any way to make the apply_log function faster?

Regards, Ken

gdeglin commented 9 years ago

Ran into this myself recently and had to briefly stop my application to get pg_repack to finish packing a couple of tables. Seems like it was close enough that the downtime was under 10 seconds each time, but it'd be great to have a fix for this.

drakkan commented 9 years ago

Hi, can you please verify if your apps keep connections "idle in transaction"? If so I think pg_repack cannot complete.

ktm-rice commented 9 years ago

On Tue, Sep 08, 2015 at 01:57:31PM -0700, drakkan wrote:

Hi, can you please verify if your apps keep connections "idle in transaction"? If so I think pg_repack cannot complete.

Hi,

No, they are not "idle in transaction", just "idle", waiting for a new transaction. The problem was that the single apply_log process just could not empty the log quickly enough to gain on all of the new updates. I ran pg_repack over the weekend during a less active period and it completed successfully.

Regards, Ken

gdeglin commented 9 years ago

@drakkan Not idle in transaction here either. We're experiencing the same symptoms as @ktm-rice.

pyhalov commented 4 years ago

Seeing the same here: due to high I/O load, pg_repack can't keep up with the queue. It had gone from the initial ~500k records down to ~1k, but pg_repack never got the chance to switch tables, and now the queue is 1000k long. Can we at least have a configurable APPLY_COUNT and MIN_TUPLES_BEFORE_SWITCH, so they can be tuned for big/busy tables?

chobostar commented 3 years ago

@pyhalov here is PR: https://github.com/reorg/pg_repack/pull/261

carl-reverb commented 6 months ago

We're facing an issue with our large table 'entries', and an apparently interminable repack on RDS Aurora PostgreSQL 13.13 with pg_repack 1.4.7.

This table is 1.5TB in size with 1.2 billion rows and is receiving 400k inserts, 80k updates, and 80k deletes per minute. The inserts are large bulk inserts, with hundreds of values per statement.

Now pg_repack's log table has over 3 billion rows in it and according to pganalyze it's inserting about 400k rows per minute and deleting about 60k rows per minute. There's no sign of this trend reversing, so I believe I have no choice but to declare bankruptcy and drop the repacking effort.
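
A rough way to watch the same thing straight from the database is to sample the statistics counters for the log table and diff them a minute apart (a sketch; the relname pattern is an assumption about how the log table shows up):

    -- insert rate vs. delete (apply) rate for pg_repack's log table;
    -- sample twice and compare the counters to see if apply is gaining
    SELECT now(), relname, n_tup_ins, n_tup_del, n_live_tup
    FROM pg_stat_user_tables
    WHERE schemaname = 'repack'
      AND relname LIKE 'log_%';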

Unless there's something I'm overlooking that will let this run faster? I already scaled our instance class up to db.r5.12xlarge, which can sustain over 1 GB/s of I/O bandwidth, and neither storage bandwidth nor CPU appear to be constraining the rate of work.