trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
9.87k stars 2.86k forks source link

Failed to get stats from PostgreSQL with partitioned tables #22537

Open lihonosov opened 1 week ago

lihonosov commented 1 week ago

When attempting to retrieve statistics for partitioned tables from PostgreSQL using Trino, an IllegalArgumentException is thrown if the rowCount is negative. This issue arises in the io.trino.plugin.postgresql.PostgreSqlClient.readTableStatistics method, specifically when the rowCount is less than zero, which is a valid scenario for partitioned tables that have not been vacuumed or analyzed.

Caused by: java.lang.IllegalArgumentException: rowCount must be greater than or equal to 0: -201.0
    at io.trino.spi.statistics.TableStatistics.<init>(TableStatistics.java:43)
    at io.trino.spi.statistics.TableStatistics$Builder.build(TableStatistics.java:118)
    at io.trino.plugin.postgresql.PostgreSqlClient.readTableStatistics(PostgreSqlClient.java:1030)
    at io.trino.plugin.postgresql.PostgreSqlClient.getTableStatistics(PostgreSqlClient.java:963)

The current check in the codebase is:

if (rowCount == -1) {
    // Table has never yet been vacuumed or analyzed
    return TableStatistics.empty();
}

However, this does not account for partitioned tables, where the rowCount can be less than -1. A proposed change to accommodate this scenario is:

if (rowCount <= -1) {
    // Table has never yet been vacuumed or analyzed:
    // -1 indicates non-partitioned tables, and values less than -1 indicate partitioned tables
    return TableStatistics.empty();
}

Similar issue: https://github.com/trinodb/trino/issues/17061 Related PR: https://github.com/trinodb/trino/pull/17066

I can create a PR, but perhaps for such a simple case, someone could quickly fix it, so I don't need to submit a Contributor License Agreement (CLA)

hashhar commented 1 week ago

What version of Postgres are you observing this with? And can you provide a simple reproduction - it'll make it possible for me to add a test and verify the fix.

lihonosov commented 1 week ago

What version of Postgres are you observing this with? And can you provide a simple reproduction - it'll make it possible for me to add a test and verify the fix.

Postgres: 15.4 Trino: 450

Steps to reproduce:

docker run --name postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
docker exec -it postgres psql -U postgres
CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
    FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
SELECT SUM(child.reltuples)
FROM pg_inherits
         JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
         JOIN pg_class child ON pg_inherits.inhrelid = child.oid
         JOIN pg_namespace parent_ns ON parent_ns.oid = parent.relnamespace
         JOIN pg_namespace child_ns ON child_ns.oid = child.relnamespace
WHERE parent.oid = 'public.measurement'::regclass;
 sum
-----
  -2
(1 row)
CREATE TABLE measurement_y2024m03 PARTITION OF measurement
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
SELECT SUM(child.reltuples)
FROM pg_inherits
         JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
         JOIN pg_class child ON pg_inherits.inhrelid = child.oid
         JOIN pg_namespace parent_ns ON parent_ns.oid = parent.relnamespace
         JOIN pg_namespace child_ns ON child_ns.oid = child.relnamespace
WHERE parent.oid = 'public.measurement'::regclass;
 sum
-----
  -3
(1 row)