timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.89k stars 853 forks source link

[Bug]: Attribute not suitable recommended for segment_by #6787

Closed mkindahl closed 2 months ago

mkindahl commented 3 months ago

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Compression

What happened?

When creating a table with a primary key on a serial column and a time column, it gives a warning suggesting that the serial column could be a segment-by column.

Serial columns are not suitable to use as segment-by columns since they will generate a single row for each compressed row. It does mention that it could be either a segment-by or an order-by, but it is not clear that using it as segment-by is not a good option.

TimescaleDB version affected

2.15-dev

PostgreSQL version used

16.2

What operating system did you use?

Ubuntu 22.04 x64

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

Expanded display is used automatically.
Null display is "[NULL]".
psql (16.2)
Type "help" for help.

mats=# create table readings(
    metric_id serial,
    created_at timestamptz not null,
    device_id integer,
    user_id integer,
    temperature float,
    primary key (metric_id, created_at)
);
CREATE TABLE
mats=# create index on readings(user_id);
CREATE INDEX
mats=# select * from create_hypertable('readings', 'created_at', chunk_time_interval => '1 hour'::interval);
 hypertable_id | schema_name | table_name | created 
---------------+-------------+------------+---------
            17 | public      | readings   | t
(1 row)

mats=# insert into readings(created_at, device_id, user_id, temperature)
select created_at, (random()*30)::int, 1, random()*80 - 40
from generate_series(now() - interval '6 days', now(), '1 minute') as created_at;
INSERT 0 8641
mats=# alter table readings set (
    timescaledb.compress_segmentby = 'user_id',
    timescaledb.compress
);

WARNING:  column "metric_id" should be used for segmenting or ordering
ALTER TABLE

How can we reproduce the bug?

create table readings(
    metric_id serial,
    created_at timestamptz not null,
    device_id integer,
    user_id integer,
    temperature float,
    primary key (metric_id, created_at)
);

create index on readings(user_id);

select * from create_hypertable('readings', 'created_at', chunk_time_interval => '1 hour'::interval);

insert into readings(created_at, device_id, user_id, temperature)
select created_at, (random()*30)::int, 1, random()*80 - 40
from generate_series(now() - interval '6 days', now(), '1 minute') as created_at;

alter table readings set (
    timescaledb.compress_segmentby = 'user_id',
    timescaledb.compress
);
svenklemm commented 2 months ago

This no longer happens with the new compression defaults.