trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

iceberg - dictionaryPagesSize and bloomFilter cannot both be set #22701

Open raphaelauv opened 1 month ago

raphaelauv commented 1 month ago

Hi, I can't run

ALTER TABLE iceberg.default.AA EXECUTE optimize

on a table that has Bloom filters:

CREATE TABLE iceberg.default.AA (
    created_at timestamp(6) NOT NULL,
    toto varchar NOT NULL
) WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 's3://XXXX/default/AA',
    parquet_bloom_filter_columns = ARRAY['toto'],
    partitioning = ARRAY['hour(created_at)']
)

It logs:

dictionaryPagesSize and bloomFilter cannot both be set

raunaqmorarka commented 1 month ago

@jkylling PTAL

jkylling commented 1 month ago

It even fails on

        String tableName = "parquet_with_bloom_filters_" + randomNameSuffix();
        CatalogSchemaTableName catalogSchemaTableName = new CatalogSchemaTableName("iceberg", new SchemaTableName("tpch", tableName));
        assertUpdate(format("CREATE TABLE %s WITH (format = 'PARQUET', parquet_bloom_filter_columns = ARRAY['t']) AS SELECT * FROM (VALUES 1, 1) s(t)", catalogSchemaTableName), 2);

If a page of a column chunk is dictionary-encoded, we currently do not write its values into the Bloom filter, but we still try to write a Bloom filter for that column. Luckily, this check caught it.
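
To make the failure mode concrete, here is a minimal, self-contained sketch of the situation described above. It is not Trino's writer code: `ColumnChunkSketch`, `writePage`, `finish`, and `writeChunk` are hypothetical names, and the Bloom filter is modeled as a plain set of hashes. Only the error message is taken from the report. The assumption is that a Bloom filter is allocated whenever one is requested for the column, that dictionary-encoded pages skip it, and that a final check rejects a chunk carrying both.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BloomFilterDictionarySketch {

    // Hypothetical stand-in for a Parquet column chunk writer (NOT Trino's class).
    static final class ColumnChunkSketch {
        // Modeled as a set of hashes; null when no Bloom filter was requested.
        private final Set<Integer> bloomFilter;
        private long dictionaryPagesSize;

        ColumnChunkSketch(boolean bloomFilterRequested) {
            this.bloomFilter = bloomFilterRequested ? new HashSet<>() : null;
        }

        void writePage(List<String> values, boolean dictionaryEncoded) {
            if (dictionaryEncoded) {
                // Values only contribute to the dictionary page; the Bloom filter is skipped.
                dictionaryPagesSize += values.stream().distinct().mapToLong(String::length).sum();
                return;
            }
            if (bloomFilter != null) {
                for (String value : values) {
                    bloomFilter.add(value.hashCode());
                }
            }
        }

        void finish() {
            // Assumed shape of the check that produced the reported message.
            if (dictionaryPagesSize > 0 && bloomFilter != null) {
                throw new IllegalStateException("dictionaryPagesSize and bloomFilter cannot both be set");
            }
        }
    }

    // Writes one page holding the values ("1", "1") and reports the outcome.
    static String writeChunk(boolean bloomFilterRequested, boolean dictionaryEncoded) {
        ColumnChunkSketch chunk = new ColumnChunkSketch(bloomFilterRequested);
        chunk.writePage(List.of("1", "1"), dictionaryEncoded);
        try {
            chunk.finish();
            return "ok";
        }
        catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeChunk(true, true));   // Bloom filter requested, dictionary page written
        System.out.println(writeChunk(true, false));  // Bloom filter requested, plain page written
        System.out.println(writeChunk(false, true));  // no Bloom filter, dictionary page written
    }
}
```

In this model only the first call trips the check, because the filter exists but was never fed by the dictionary-encoded page.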

It probably makes sense to remove the check and do one of: