trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.49k stars 3.02k forks source link

Delta Lake connector returns incorrect results when containing partition predicates with checkpoints on name column mapping tables #24104

Closed ebyhr closed 1 week ago

ebyhr commented 1 week ago

Steps to reproduce:

CREATE TABLE test_name(id int, log_date date) with (column_mapping_mode = 'NAME', partitioned_by = ARRAY['log_date']);
INSERT INTO test_name VALUES (1, DATE '2024-03-01');
INSERT INTO test_name VALUES (2, DATE '2024-03-02');
INSERT INTO test_name VALUES (3, DATE '2024-03-03');
SELECT * FROM test_name WHERE log_date = DATE '2024-03-01';
ALTER TABLE test_name EXECUTE optimize;
SELECT * FROM test_name WHERE log_date = DATE '2024-03-01';

or

CREATE TABLE test_name_ci(id int, log_date date) with (column_mapping_mode = 'NAME', partitioned_by = ARRAY['log_date'], checkpoint_interval = 3);
INSERT INTO test_name_ci VALUES (1, DATE '2024-03-01');
INSERT INTO test_name_ci VALUES (2, DATE '2024-03-02');
INSERT INTO test_name_ci VALUES (3, DATE '2024-03-03');
SELECT * FROM test_name WHERE log_date = DATE '2024-03-01';

ID and NONE column mapping mode return the expected result:

CREATE TABLE test_none(id int, log_date date) with (column_mapping_mode = 'NONE', partitioned_by = ARRAY['log_date']);
INSERT INTO test_none VALUES (1, DATE '2024-03-01');
INSERT INTO test_none VALUES (2, DATE '2024-03-02');
INSERT INTO test_none VALUES (3, DATE '2024-03-03');
SELECT * FROM test_none WHERE log_date = DATE '2024-03-01';
ALTER TABLE test_none EXECUTE optimize;
SELECT * FROM test_none WHERE log_date = DATE '2024-03-01';