yarkoyarok opened 3 months ago
Hmm I cannot reproduce with the script you provided. I'm getting no decompression for that delete.
sven@i7110[378424]=# EXPLAIN analyze delete
from observations
where report_id = 49051
and generated_at = '2024-01-01 23:31:12.051+00'
and device_id = any(array[1,2,3]::int[]);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Custom Scan (HypertableModify) (cost=0.00..39.44 rows=1 width=10) (actual time=5.741..5.742 rows=0 loops=1)
-> Delete on observations (cost=0.00..39.44 rows=1 width=10) (actual time=5.741..5.742 rows=0 loops=1)
Delete on _hyper_1_2_chunk observations_1
-> Seq Scan on _hyper_1_2_chunk observations_1 (cost=0.00..39.44 rows=1 width=10) (actual time=0.008..0.009 rows=0 loops=1)
Filter: ((report_id = 49051) AND (generated_at = '2024-01-02 00:31:12.051+01'::timestamp with time zone) AND (device_id = ANY ('{1,2,3}'::integer[])))
Planning Time: 1.302 ms
Execution Time: 5.824 ms
(7 rows)
Time: 20.072 ms
sven@i7110[378424]=# SELECT extname, extversion from pg_extension where extname='timescaledb';
extname | extversion
-------------+------------
timescaledb | 2.15.2
(1 row)
Let's sync on how we run the reproduction script.
I am launching TimescaleDB through Docker first, no volumes mounted, fresh DB with default config:
docker run -it -e POSTGRES_HOST_AUTH_METHOD=trust -p 5434:5432 timescale/timescaledb:2.15.2-pg15
Then I create DB:
echo 'create database test;' | psql -h 127.0.0.1 -p 5434 -U postgres
CREATE DATABASE
And then I run the reproduction script:
psql -h 127.0.0.1 -p 5434 test postgres < o.sql
CREATE TABLE
ALTER TABLE
create_hypertable
---------------------------
(1,public,observations,t)
(1 row)
ALTER TABLE
INSERT 0 3920000
compress_chunk
----------------------------------------
_timescaledb_internal._hyper_1_1_chunk
(1 row)
ERROR: tuple decompression limit exceeded by operation
DETAIL: current limit: 100000, tuples decompressed: 320000
HINT: Consider increasing timescaledb.max_tuples_decompressed_per_dml_transaction or set to 0 (unlimited).
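As an immediate workaround (not a fix for the excessive decompression itself), the limit named in the HINT can be raised or disabled; the GUC name is taken from the error hint above, and the database name in the ALTER DATABASE variant is illustrative:

```sql
-- Raise the per-transaction decompression limit for this session only
-- (0 disables the limit entirely; the DELETE may then decompress a lot of data):
SET timescaledb.max_tuples_decompressed_per_dml_transaction = 0;

-- Or persist a higher limit for a database (database name is hypothetical):
ALTER DATABASE test
  SET timescaledb.max_tuples_decompressed_per_dml_transaction = 1000000;
```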
It is probably helpful to know that for one kind of query the planner already works as desired: single-row deletion, where every clause checks equality against a literal:
DELETE
FROM observations
WHERE generated_at = '2024-01-01 23:31:12.051+00'
AND device_id = 1;
It completes successfully and almost immediately (9 ms).
But if we try to target more rows by combining the same clauses with OR, we hit the same error again, and the query now takes 7 seconds:
test=# DELETE
test-# FROM observations
test-# WHERE generated_at = '2024-01-01 23:31:12.051+00'
test-# AND device_id = 1;
DELETE 1
Time: 9.226 ms
test=# DELETE
test-# FROM observations
test-# WHERE (generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 2) OR
test-# (generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 3) OR
test-# (generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 4);
ERROR: tuple decompression limit exceeded by operation
DETAIL: current limit: 100000, tuples decompressed: 3919200
HINT: Consider increasing timescaledb.max_tuples_decompressed_per_dml_transaction or set to 0 (unlimited).
Time: 7661.059 ms (00:07.661)
The query from above without the psql prompt:
DELETE
FROM observations
WHERE (generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 2) OR
(generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 3) OR
(generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 4);
So perhaps the functionality used for single-row deletion can be reused for multi-row deletion as well?
I got some different timestamp values so that no rows were deleted (for Sven as well), but even if I fix them, the delete doesn't decompress:
test=# begin; delete
from observations
where report_id = 49051
and generated_at = '2024-01-01 23:31:12.051+01'
and device_id = any(array[1,2,3]::int[]); rollback;
BEGIN
Time: 0.430 ms
DELETE 3
Time: 2.241 ms
ROLLBACK
Time: 0.229 ms
But with OR I do see the "tuple limit exceeded":
test=# begin; delete
from observations
where report_id = 49051
and generated_at = '2024-01-01 23:31:12.051+01'
and device_id = 1 or device_id = 2; rollback;
BEGIN
Time: 0.852 ms
ERROR: 53400: tuple decompression limit exceeded by operation
DETAIL: current limit: 100000, tuples decompressed: 3920000
HINT: Consider increasing timescaledb.max_tuples_decompressed_per_dml_transaction or set to 0 (unlimited).
LOCATION: ExecModifyTable, hypertable_modify.c:740
Time: 5125.254 ms (00:05.125)
ROLLBACK
Time: 0.337 ms
@svenklemm probably we should treat this as a feature request to support OR clauses for filtering in compressed DML?
But with OR I do see the "tuple limit exceeded":
With OR used that way you are trying to delete not 2 rows, but 9800:
test=# select count(*) from observations
test-# where report_id = 49051
test-# and generated_at = '2024-01-01 23:31:12.051+01'
test-# and device_id = 1 or device_id = 2;
count
-------
9800
(1 row)
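The row count follows from operator precedence: AND binds tighter than OR, so the unparenthesized predicate matches every row with device_id = 2. A parenthesized variant (a sketch using the same table and literals as above) keeps all three filters on both branches:

```sql
-- AND binds tighter than OR, so the earlier WHERE clause parses as:
--   (report_id = 49051 AND generated_at = '...' AND device_id = 1)
--   OR (device_id = 2)
-- Parenthesizing the OR applies report_id and generated_at to both branches:
SELECT count(*)
FROM observations
WHERE report_id = 49051
  AND generated_at = '2024-01-01 23:31:12.051+01'
  AND (device_id = 1 OR device_id = 2);
```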
But I am trying to delete only a few rows, and decompression of 320,000 rows happens. In my case the query with an OR clause looks like this:
DELETE
FROM observations
WHERE (generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 1) OR
(generated_at = '2024-01-01 23:31:12.051+00' AND device_id = 2);
It attempts to delete 2 rows, and that attempt ends with:
ERROR: tuple decompression limit exceeded by operation
DETAIL: current limit: 100000, tuples decompressed: 320000
HINT: Consider increasing timescaledb.max_tuples_decompressed_per_dml_transaction or set to 0 (unlimited).
Time: 903.579 ms
test=!#
while using the timescale/timescaledb:2.15.2-pg15 Docker image.
I got some different timestamp values so that no rows were deleted (for Sven as well), but even if I fix them, the delete doesn't decompress:
Just curious: how can it happen that the reproduction SQL mentioned above gives different timestamps and outcomes?
Just curious: how can it happen that the reproduction SQL mentioned above gives different timestamps and outcomes?
The expression '2024-01-01'::TIMESTAMPTZ depends on the current timezone; it can be written as '2024-01-01 00:00:00+00'::TIMESTAMPTZ to avoid this. I reproduced locally, not in Docker, so the local tz was different.
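A minimal illustration of the timezone dependence (stock PostgreSQL behavior; Europe/Berlin is just an example zone):

```sql
-- A date-only timestamptz literal is resolved in the session timezone:
SET timezone = 'UTC';
SELECT '2024-01-01'::timestamptz;   -- 2024-01-01 00:00:00+00
SET timezone = 'Europe/Berlin';
SELECT '2024-01-01'::timestamptz;   -- 2024-01-01 00:00:00+01, a different instant
-- An explicit offset pins the literal to one instant in every session:
SELECT '2024-01-01 00:00:00+00'::timestamptz;
```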
I've tried to replace '2024-01-01' with '2024-01-01 00:00:00.000+00' and nothing changed: rows of the observations table with report_id 49051 all have the same generated_at: '2024-01-01 23:31:12.051+00'.
Anyway, if the timestamps are somehow different on your side, you can find the value like this:
select distinct generated_at from observations where report_id=49051;
and then use it inside of query
DELETE
FROM observations
WHERE (generated_at = <that_date> AND device_id = 1) OR
(generated_at = <that_date> AND device_id = 2);
And if your local config is the same as the one bundled with the Docker image, you should get the error.
Anyway, if the timestamps are somehow different on your side, you can find the value like this:
Sorry for the confusion, I did already reproduce the excessive decompression you described, was just answering your question about the different timestamp. The reason for decompressing too much is that we don't support OR clauses for filtering what we decompress. This is something we have to improve.
The reason for decompressing too much is that we don't support OR clauses for filtering what we decompress.
delete
from observations
where report_id = 49051
and generated_at = '2024-01-01 23:31:12.051+00'
and device_id = any(array[1,2,3]::int[]);
So device_id = any(array[1,2,3]::int[]) in this query is interpreted as a set of OR clauses, do I understand you right?
This should be supported naturally or listed as a limitation; we have spent a lot of time debugging this as well, and it caused us issues on a production DB since it was unexpected behaviour. The docs state that DML operations are supported on compressed chunks, so one would not expect chunks exploding on delete.
So device_id = any(array[1,2,3]::int[]) in this query is interpreted as a set of OR clauses, do I understand you right?
This one is a "scalar array operation"; support for it should be out with 2.16, implemented here: https://github.com/timescale/timescaledb/pull/6880 I checked this query on the main branch now; it works fine, deleting 3 records without excessive decompression.
Hello @akuzm, I'm still seeing this behaviour on version 2.16.1, but I'm using a slightly different query.
I have a table with timestamp, device_id (uuid), and value columns.
Compression is enabled, with device_id as the segmentby column.
Now if I run
DELETE FROM table_name
WHERE device_id = <some_uuid> AND timestamp=ANY(<batch of 1000 timestamps>)
I am still getting the error that it is trying to decompress over 100k tuples. Maybe it would make sense that it tries to decompress the whole segment for that device_id, but if I want to delete 100k timestamps from this device in batches of 1k, it takes forever. Decompressing whole chunks seems excessive as well.
I cannot use a larger batch (over 1664) with the ANY() syntax, because then I'm getting "target lists can have at most 1664 entries".
Am I doing something incorrectly? What would be the best approach here? Basically I need to delete a lot of records from multiple devices, but these records may not be contiguous, so I need to delete them by exact timestamps.
UPDATE:
If anybody stumbles upon this: I was able to reduce the delete time from 40 minutes to 55 seconds by sorting the batches, taking the min and max of each, and adding them to the query as
DELETE FROM table_name
WHERE timestamp BETWEEN <min_batch> AND <max_batch>
AND device_id = <some_uuid>
AND timestamp=ANY(<batch of 1000 timestamps>)
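The idea is that the BETWEEN range lets the planner prune compressed batches by their min/max metadata before the exact ANY() match is applied. A concrete sketch (table and column names from the comment above; the literal values are hypothetical, and a real batch would hold up to 1000 timestamps):

```sql
-- <min_batch>/<max_batch> are the smallest and largest timestamps in the
-- sorted batch; the range predicate narrows what gets decompressed,
-- while the ANY() keeps the delete exact.
DELETE FROM table_name
WHERE "timestamp" BETWEEN '2024-06-01 00:00:00+00' AND '2024-06-01 00:16:39+00'
  AND device_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'
  AND "timestamp" = ANY (ARRAY['2024-06-01 00:00:00+00',
                               '2024-06-01 00:16:39+00']::timestamptz[]);
```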
What type of bug is this?
Unexpected error
What subsystems and features are affected?
Compression, Query executor, Query planner
What happened?
I have a table which looks like:
and on this query
I am getting the error:
And it looks strange, because in the query I've specified a filter on the segmentby field (device_id) together with the orderby field (generated_at). So I would expect it to decompress only the three matching segments.
TimescaleDB version affected
2.15.2
PostgreSQL version used
15.7
What operating system did you use?
Mac OS 14.5
What installation method did you use?
Docker
What platform did you run on?
Timescale Cloud
Relevant log output and stack trace
No response
How can we reproduce the bug?