timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

Excessive Locking on hypertable #902

Open esatterwhite opened 5 years ago

esatterwhite commented 5 years ago

Relevant system information:

Describe the bug We are seeing a lot of locking on inserts to a hypertable, ShareUpdateExclusiveLock and ShareRowExclusiveLock in particular, which is leading to deadlocks on a database with next to no usage.

Process 614 waits for ShareUpdateExclusiveLock on relation 18780 of database 16384; blocked by process 615.
Process 615 waits for ShareRowExclusiveLock on relation 18674 of database 16384; blocked by process 617.
Process 617 waits for ShareUpdateExclusiveLock on relation 18780 of database 16384; blocked by process 614.

18780 is the hypertable. If we don't make that a hypertable, the problem goes away. The hypertable is set for a chunk time of 4 weeks. We only insert into this table; we don't do updates on it.
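For reference, a minimal sketch of the setup described above; the table and column names are my own placeholders, since the original DDL isn't shown in the issue:

```sql
-- Hypothetical reconstruction of the reported schema.
CREATE TABLE regular_table (
    id   bigint PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE hyper_table (
    "time"     timestamptz NOT NULL,
    regular_id bigint NOT NULL REFERENCES regular_table (id),
    value      double precision
);

-- 4-week chunk interval, as in the report.
SELECT create_hypertable('hyper_table', 'time',
                         chunk_time_interval => interval '4 weeks');
```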

To Reproduce

Effectively there are two tables, a hypertable and a regular table. Our application does something like this:


INSERT INTO regular_table VALUES (...values);

INSERT INTO hyper_table VALUES
  (...values)
, (...values)
, (...values);

Elsewhere:

BEGIN;
WITH an_update AS (
    UPDATE regular_table SET ...
)
INSERT INTO hyper_table VALUES (...values);
COMMIT;

Expected behavior

There shouldn't be this many locks and deadlocks happening on a primarily insert-only workload.

Just removing the create_hypertable command from our migrations and leaving everything else as-is clears up our locking problem.
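A quick way to see the locks described above while the workload runs, using only stock PostgreSQL catalog views (nothing here is TimescaleDB-specific):

```sql
-- List sessions that are waiting on a lock, together with the
-- sessions blocking them and the lock mode involved.
SELECT a.pid,
       pg_blocking_pids(a.pid) AS blocked_by,
       l.locktype,
       l.mode,
       l.relation::regclass AS relation,
       a.query
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE NOT l.granted;
```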

esatterwhite commented 5 years ago

[screenshot attached in the original issue]

esatterwhite commented 5 years ago

Update: there is a foreign key constraint on the hypertable referencing the regular table. Dropping the constraint from the hypertable also fixes the problem.

I don't think I'm doing anything out of the ordinary.

davidkohn88 commented 5 years ago

Interesting. We'll take a look and see if we can reproduce it so we can work on a fix. How many workers do you have inserting at the same time? If you can share an anonymized script that reproduces the problem, that would help immensely (e.g. take out your normal columns and replace them with dummies), but no worries if you can't share.

esatterwhite commented 5 years ago

4-6 workers. I'll see about a script.

davidkohn88 commented 5 years ago

Thanks!

esatterwhite commented 5 years ago

Does inserting into a hypertable create a table lock on the parent table?

davidkohn88 commented 5 years ago

Not sure exactly what you're asking here. It certainly shouldn't take a table-level lock unless we're creating a new chunk; if a new chunk is created, then we may need a table-level lock.

davidkohn88 commented 5 years ago

Sorry hit the wrong button, didn't mean to close :)

esatterwhite commented 5 years ago
INSERT INTO events

basically locks the events table. I don't think it's a lot of new chunks; our time interval is 4 weeks, so the most our test suite would ever create is one. It seems like, with the way the table lock is handled, this is going to be an issue every time a new chunk needs to be added.

esatterwhite commented 5 years ago

Well, not even when a chunk needs to be added; this is pretty repeatable and doesn't seem related to whether chunks are being added. I'll try to get some kind of script to reproduce it in isolation tomorrow.

davidkohn88 commented 5 years ago

Sorry, when you say a table lock what are you seeing?

esatterwhite commented 5 years ago

ShareUpdateExclusiveLock

davidkohn88 commented 5 years ago

That is probably acquired when we add a new chunk to the table. But you're saying you see it on a normal insert, when a new chunk isn't being created?

davidkohn88 commented 5 years ago

(If it's the first insert into a hypertable then yes, it will effectively lock it for other transactions, because we're creating the first chunk in that transaction; it's not created ahead of time. Once that transaction commits, the other transactions should be able to finish with no problem. The main thing here is to make sure you always access any hypertables and foreign tables in the same order, so that you avoid deadlocks.)
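To illustrate the ordering advice: if every transaction that touches both tables acquires its locks in the same order (regular table first, then the hypertable), a cycle of waits cannot form. A sketch with hypothetical names:

```sql
-- All writers follow the same order: regular table, then hypertable.
BEGIN;
UPDATE regular_table SET last_check = now() WHERE id = 1;
INSERT INTO hyper_table ("time", regular_id, value)
VALUES (now(), 1, 42.0);
COMMIT;

-- A writer that only inserts can still touch the regular table first
-- (e.g. with SELECT ... FOR SHARE on the referenced row) to keep the
-- ordering consistent across all transactions.
```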

ntbosscher commented 4 years ago

If you want a second repro case, I'm seeing the same behaviour on Timescale 1.7.2 with a fresh hypertable (no data in it).

Configuration

create table hypertable (
  time timestamp not null,
  endpoint bigint not null references fk(id),
  company bigint not null references company(id),
  metaA int not null,
  metaB bool not null,
  metaC bool not null
);

select create_hypertable('hypertable', 'time', chunk_time_interval => interval '2 days');
2020-07-16 07:19:33.431 EDT [1917] ERROR:  deadlock detected
2020-07-16 07:19:33.431 EDT [1917] DETAIL:  Process 1917 waits for ShareUpdateExclusiveLock on relation 156243 of database ###; blocked by process 1916.
    Process 1916 waits for ShareRowExclusiveLock on relation 41260 of database ###; blocked by process 1917.
    Process 1917: insert into hypertable (
              time, fk, company, 
              metaA, metaB, metaC
          ) values (
              (now() at time zone 'utc'), $1, $2,
            $3, $4, $5
          ) 

    Process 1916: insert into hypertable (
              time, fk, company, 
              metaA, metaB, metaC
          ) values (
              (now() at time zone 'utc'), $1, $2,
            $3, $4, $5
          ) 

2020-07-16 07:19:33.431 EDT [1917] HINT:  See server log for query details.
2020-07-16 07:19:33.431 EDT [1917] STATEMENT:  insert into hypertable (
              time, fk, company, 
              metaA, metaB, metaC
          ) values (
              (now() at time zone 'utc'), $1, $2,
            $3, $4, $5
          ) 
ntbosscher commented 4 years ago

To clarify, I am updating the referenced fk table in the same transaction. Something like this:

begin;

update fk set
  last_check = now()
where id = 1;

insert into hypertable (time, fk, company, ...) values ( now(), 1, <company>, ....);

commit;

begin;

update fk set
  last_check = now()
where id = 2;

insert into hypertable (time, fk, company, ...) values ( now(), 2, <company>, ....);

commit;

Maybe timescale is escalating the lock from row-level to table-level and that's breaking things?
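One possible workaround, not something the TimescaleDB team suggested in this thread, is to serialize the update-plus-insert pair with a transaction-scoped advisory lock, which is standard PostgreSQL; the key value is an arbitrary application-chosen number:

```sql
begin;

-- Serialize every writer that performs this update + insert pair.
-- 42 is an arbitrary application-chosen advisory lock key.
select pg_advisory_xact_lock(42);

update fk set last_check = now() where id = 1;
insert into hypertable (time, fk, company, ...) values (now(), 1, <company>, ...);

commit;  -- the advisory lock is released automatically at commit
```

This trades concurrency for safety, so it is mainly useful as a stopgap while the underlying locking is investigated.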

LRagji commented 4 years ago

Hey, I am also facing the same issue. I am running a setup to stress-test 1 TB of data. We simply have inserts from cron jobs that fire in parallel, with no FKs or anything. The following insert statement fires every 45 minutes; if we run it in parallel, boom, everyone is waiting on everyone else:

INSERT INTO "Data"."Raw" ("Timestamp", "Value", "Quality", "TagId")
SELECT (TO_TIMESTAMP('2020-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
          + ((g || ' second')::interval + (ctr || ' year')::interval)),
       g,
       (g - 1),
       ctr
FROM (SELECT generate_series(1, 1) AS g, *
      FROM nextval('"Counter"') AS ctr) AS g;

Romeren commented 3 years ago

Hey, I'm also facing the same issue. Setup: I'm using the Cloud Service environment. Postgres version: 11.9. Spec:

I have a hypertable with time, device_id_fk and value columns, with a foreign key on the device_id_fk column referencing a regular table.

I have a pretty constant flow of data of 500-600 rows per second.

I get the above-mentioned deadlock seemingly randomly, once every hour or so.

Do we know the cause of this, and/or a workaround that can be used?

devanubis commented 2 years ago

We've run into this deadlocking in some of our unit tests of a bulk import job, which loads data in batches.

We're using the timescaledb-ha:pg14-latest Docker image (affecting both v2.7.1 and v2.8.0).

The tests run in parallel, but each opens a new transaction (and rolls back) for isolation. However, it seems this isn't isolation enough: intermittently, two parallel tests end up attempting to create a new chunk at the same time (each on its first insert into the table), hitting a ShareRowExclusiveLock and deadlocking.

Our table does have a foreign key, and removing that foreign key does avoid the deadlocking (consistently reproducible), but part of the point of using TimescaleDB is being able to use SQL relations, so I don't feel it's right to have to avoid them.

I feel like this issue should have the bug label...
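If the deadlock indeed only occurs while the first chunk is being created, as discussed earlier in the thread, one mitigation for a test suite like this is a committed setup step that creates the chunk before the parallel tests start. A sketch using the column names from the earlier repro (it assumes the referenced rows already exist):

```sql
-- One-off test-suite setup: insert (and commit) a single row covering
-- the time range the tests use, so the chunk already exists when the
-- parallel, rolled-back test transactions begin.
BEGIN;
INSERT INTO hypertable ("time", endpoint, company, metaA, metaB, metaC)
VALUES (now(), 1, 1, 0, false, false);
COMMIT;
```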

leppaott commented 1 year ago

> We've ran into this deadlocking with some of our units tests of a bulk import job, which loads in batches of data.
>
> We're using the timescaledb-ha:pg14-latest docker image (affecting both v2.7.1 and v2.8.0)
>
> The tests run in parallel, but with each opening a new transaction (and rolling back) for isolation. However it seems that this isn't isolation enough, as intermittently two parallel tests end up attempting to create a new chunk at the same time (each on their first inserts to the table), hitting a ShareRowExclusiveLock lock and deadlocking.
>
> Our table does have a foreign key, and removing that foreign key does avoid this deadlocking (consistently reproducible) but part of the point of using TimescaleDB is to be able to use SQL relations, so I don't feel it's right to have to avoid them.
>
> I feel like this issue should have the bug label...

Same here, but in our case we delete_job all retention policies in e2e tests, so at some point I think they try to delete a job that is already being deleted, and that fails...