timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Bug]: DML decompression crash with by reference scankey on 32 bit systems #7039

Closed erimatnor closed 1 week ago

erimatnor commented 2 weeks ago

What type of bug is this?

Crash

What subsystems and features are affected?

Compression

What happened?

A crash occurs when deleting or updating compressed data using a predicate on a column that has a by-reference type. This is the case for, e.g., 8-byte integers when running on a 32-bit system.

The issue seems to be due to DML decompression using a scankey that doesn't properly handle the by reference value, leading to memory corruption.

TimescaleDB version affected

2.16.0-dev

PostgreSQL version used

16.3

What operating system did you use?

Debian

What installation method did you use?

Source

What platform did you run on?

Other

Relevant log output and stack trace

create table hyper (time timestamptz, device int8, location int8, temp float8);
CREATE TABLE
select create_hypertable('hyper', 'time', create_default_indexes => false);
NOTICE:  adding not-null constraint to column "time"
DETAIL:  Dimensions cannot have NULL values.
 create_hypertable  
--------------------
 (1,public,hyper,t)
(1 row)

insert into hyper values ('2024-01-01', 1, 1, 1.0);
INSERT 0 1
alter table hyper set (timescaledb.compress, timescaledb.compress_segmentby='device');
NOTICE:  default order by for hypertable "hyper" is set to ""time" DESC"
ALTER TABLE
select compress_chunk(ch) from show_chunks('hyper') ch;
             compress_chunk             
----------------------------------------
 _timescaledb_internal._hyper_1_1_chunk
(1 row)

delete from hyper where device = 1;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
connection to server was lost

Core was generated by `postgres: enordstr postgres 127.0.0.1(41104) DELETE                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  DatumGetInt64 (X=1) at ../../../../src/include/postgres.h:390
390 ../../../../src/include/postgres.h: No such file or directory.
(gdb) bt
#0  DatumGetInt64 (X=1) at ../../../../src/include/postgres.h:390
#1  0x005d3fad in btint8cmp (fcinfo=0xbfbe9484) at nbtcompare.c:135
#2  0x00c10048 in FunctionCall2Coll (flinfo=0xbfbe9c24, collation=0, arg1=2993979364, arg2=1) at fmgr.c:1132
#3  0x005e56cc in _bt_compare (rel=0xb076db00, key=0xbfbe9c04, page=0xb2746000 "", offnum=1) at nbtsearch.c:744
#4  0x005e4f1e in _bt_binsrch (rel=0xb076db00, key=0xbfbe9c04, buf=3461) at nbtsearch.c:403
#5  0x005e62d6 in _bt_first (scan=0x1d89b38, dir=ForwardScanDirection) at nbtsearch.c:1418
#6  0x005e2090 in btgettuple (scan=0x1d89b38, dir=ForwardScanDirection) at nbtree.c:245
#7  0x005d2e32 in index_getnext_tid (scan=0x1d89b38, direction=ForwardScanDirection) at indexam.c:583
#8  0x005d3043 in index_getnext_slot (scan=0x1d89b38, direction=ForwardScanDirection, slot=0x1d89aac) at indexam.c:675
#9  0xb0838129 in decompress_batches_using_index (decompressor=0xbfbea4c4, index_rel=0xb076db00, index_scankeys=0x1dbab50, 
    num_index_scankeys=1, scankeys=0x0, num_scankeys=0, null_columns=0x0, is_nulls=0x0, chunk_status_changed=0xbfbea4c3)
    at /home/enordstr/timescaledb-private/tsl/src/compression/compression.c:3037
#10 0xb0838381 in decompress_batches_for_update_delete (ht_state=0x1d885d4, chunk=0x1d89c50, predicates=0x1d8999c, estate=0x1d88474)
    at /home/enordstr/timescaledb-private/tsl/src/compression/compression.c:3212
#11 0xb0838859 in decompress_chunk_walker (ps=0x1d88d0c, ctx=0xbfbea6c0)
    at /home/enordstr/timescaledb-private/tsl/src/compression/compression.c:3360
#12 0x00895cd9 in planstate_tree_walker_impl (planstate=0x1d88874, walker=0xb083864e <decompress_chunk_walker>, context=0xbfbea6c0)
    at nodeFuncs.c:4445
#13 0xb08388d8 in decompress_chunk_walker (ps=0x1d88874, ctx=0xbfbea6c0)
    at /home/enordstr/timescaledb-private/tsl/src/compression/compression.c:3390
#14 0xb0838642 in decompress_target_segments (ht_state=0x1d885d4)
    at /home/enordstr/timescaledb-private/tsl/src/compression/compression.c:3292
#15 0xb748ec73 in ExecModifyTable (cs_node=0x1d885d4, pstate=0x1d88874)
    at /home/enordstr/timescaledb-private/src/nodes/hypertable_modify.c:711
#16 0xb748de95 in hypertable_modify_exec (node=0x1d885d4) at /home/enordstr/timescaledb-private/src/nodes/hypertable_modify.c:171
#17 0x0080d557 in ExecCustomScan (pstate=0x1d885d4) at nodeCustom.c:124

How can we reproduce the bug?

-- Run the following script on a 32-bit system (e.g., i386)
create table hyper (time timestamptz, device int8, location int8, temp float8);
select create_hypertable('hyper', 'time', create_default_indexes => false);
insert into hyper values ('2024-01-01', 1, 1, 1.0);
alter table hyper set (timescaledb.compress, timescaledb.compress_segmentby='device');
select compress_chunk(ch) from show_chunks('hyper') ch;
delete from hyper where device = 1;

nikkhils commented 1 week ago

Don't we have a test for deletion DML? That should have triggered this in the i386 tests.

nikkhils commented 1 week ago

The issue is with the comparison:

device = 1

Our code converts the 1 into an int4 constant, whereas the device column is int8. So when we generate the batch filter, the scankey compares an int8 against an int4, which causes the crash on i386 machines.

If we change the query to:

delete from hyper where device = 1::int8;

then, as expected, the query does not crash.

nikkhils commented 1 week ago

Turns out that when we initialize the scan key for the batch decompression, we need to check whether the attribute type and the constant argument's type are the same. If not, the scan key subtype needs to be set appropriately.
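
To illustrate why the subtype matters, here is a minimal, self-contained sketch. It is a toy model, not TimescaleDB or PostgreSQL code: every `Toy*`/`toy_*` name is hypothetical, and the model assumes a build where int8 is passed by reference (as on i386) while int4 is passed by value. It also assumes, loosely following the backtrace above, that the index code uses the same-type comparator unless the scan key's subtype names a different right-hand type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins (hypothetical), NOT PostgreSQL's real definitions. */
typedef uintptr_t ToyDatum;

enum { TOY_INVALID_OID = 0, TOY_INT8OID = 20, TOY_INT4OID = 23 };

typedef int (*ToyCmp)(ToyDatum index_value, ToyDatum key_arg);

/* Modelled as pass-by-reference: an int8 datum is a pointer to the value. */
static ToyDatum toy_int64_get_datum(const int64_t *v) { return (ToyDatum) v; }
static int64_t toy_datum_get_int64(ToyDatum d) { return *(const int64_t *) d; }

/* int4 is pass-by-value: the datum holds the value itself. */
static ToyDatum toy_int32_get_datum(int32_t v) { return (ToyDatum) (uint32_t) v; }
static int32_t toy_datum_get_int32(ToyDatum d) { return (int32_t) (uint32_t) d; }

/* Same-type comparator: dereferences BOTH arguments. Handing it an int4
 * datum such as 1 would dereference address 0x1, which is the crash
 * pattern seen in the backtrace (DatumGetInt64 (X=1)). */
static int toy_int8cmp(ToyDatum a, ToyDatum b)
{
    int64_t l = toy_datum_get_int64(a);
    int64_t r = toy_datum_get_int64(b);
    return (l > r) - (l < r);
}

/* Cross-type comparator: left is int8 (pointer), right is int4 (value). */
static int toy_int84cmp(ToyDatum a, ToyDatum b)
{
    int64_t l = toy_datum_get_int64(a);
    int64_t r = toy_datum_get_int32(b);
    return (l > r) - (l < r);
}

typedef struct ToyScanKey
{
    unsigned subtype;   /* right-hand type; TOY_INVALID_OID = same as column */
    ToyDatum argument;  /* the constant from the predicate */
} ToyScanKey;

/* Record the constant's type as the subtype when it differs from the
 * attribute type, mirroring the check described in the comment above. */
static void toy_scankey_init(ToyScanKey *key, unsigned atttype,
                             unsigned consttype, ToyDatum arg)
{
    key->subtype = (consttype == atttype) ? TOY_INVALID_OID : consttype;
    key->argument = arg;
}

/* Choose a comparator from the subtype (this toy only knows int8 columns). */
static ToyCmp toy_pick_cmp(const ToyScanKey *key)
{
    if (key->subtype == TOY_INVALID_OID || key->subtype == TOY_INT8OID)
        return toy_int8cmp;
    assert(key->subtype == TOY_INT4OID);
    return toy_int84cmp;
}

/* Compare an int8 index value against the scan key's constant. */
static int toy_compare(const ToyScanKey *key, ToyDatum index_value)
{
    return toy_pick_cmp(key)(index_value, key->argument);
}
```

In this model, initializing a key with `toy_scankey_init(&key, TOY_INT8OID, TOY_INT4OID, toy_int32_get_datum(1))` records the int4 subtype, so `toy_compare` selects the cross-type comparator and never treats the constant as a pointer. Leaving the subtype at `TOY_INVALID_OID` for the same constant would route it to `toy_int8cmp`, which is the failure mode reported here. On a 64-bit build int8 is passed by value, which would explain why the mismatch goes unnoticed there.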