prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Fencepost error in row IDs #22822

Open elharo opened 1 month ago

elharo commented 1 month ago

select "$row_id" AS verify_row_id from di.global_compendium_tables where ds='2024-05-20' LIMIT 9;

fails, whereas

select "$row_id" AS verify_row_id from di.global_compendium_tables where ds='2024-05-20' LIMIT 8;

succeeds. The failing query produces this stack trace:

Caused by: java.lang.IllegalArgumentException: Invalid position 1 in block with 1 positions
        at com.facebook.presto.common.block.BlockUtil.checkValidPosition(BlockUtil.java:76)
        at com.facebook.presto.common.block.AbstractVariableWidthBlock.checkReadablePosition(AbstractVariableWidthBlock.java:191)
        at com.facebook.presto.common.block.AbstractVariableWidthBlock.isNull(AbstractVariableWidthBlock.java:185)
        at com.facebook.presto.common.type.VarbinaryType.getObjectValue(VarbinaryType.java:53)
        at com.facebook.presto.server.protocol.RowIterable$RowIterator.computeNext(RowIterable.java:77)
        at com.facebook.presto.server.protocol.RowIterable$RowIterator.computeNext(RowIterable.java:50)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:141)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:136)
        at com.google.common.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1323)
        at com.google.common.collect.Iterators$1.hasNext(Iterators.java:136)
        at com.fasterxml.jackson.databind.ser.std.IterableSerializer.serializeContents(IterableSerializer.java:111)
        at com.fasterxml.jackson.databind.ser.std.IterableSerializer.serialize(IterableSerializer.java:74)
        at com.fasterxml.jackson.databind.ser.std.IterableSerializer.serialize(IterableSerializer.java:12)
        at com.fasterxml.jackson.databind.ser.std.StdDelegatingSerializer.serialize(StdDelegatingSerializer.java:168)
        at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:727)
        at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:721)
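
For readers unfamiliar with the message at the top of the trace, here is a minimal sketch of the kind of bounds check that produces it. This is a simplified stand-in written for illustration, not the Presto source; the class name PositionCheckSketch is hypothetical, and only the message format is taken from the trace above.

```java
// Simplified stand-in for a block position check; NOT the Presto source.
final class PositionCheckSketch
{
    private PositionCheckSketch() {}

    // Valid positions run from 0 to positionCount - 1 inclusive.
    static void checkValidPosition(int position, int positionCount)
    {
        if (position < 0 || position >= positionCount) {
            throw new IllegalArgumentException(
                    "Invalid position " + position + " in block with " + positionCount + " positions");
        }
    }

    public static void main(String[] args)
    {
        checkValidPosition(0, 1); // the only valid position in a one-position block
        try {
            checkValidPosition(1, 1); // one past the end
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, "position 1 in block with 1 positions" means the caller asked for a second row from a block that holds only one.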

This is a big clue:

select table_id, "$row_id" from di.global_compendium_tables where ds='2024-05-20' LIMIT 9;

succeeds, but

select "$row_id", table_id from di.global_compendium_tables where ds='2024-05-20' LIMIT 9;

fails.

That is, the problem only occurs when "$row_id" is the last column selected by the query.

Note this code in RowIterable:

        @Override
        protected List<Object> computeNext()
        {
            position++;
            if (position >= page.getPositionCount()) {
                return endOfData();
            }

The problem might not be that we're counting too far, but that page.getPositionCount() is returning a number that's 1 too small.
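
To make the two possibilities concrete, here is a toy model of the loop, assuming a single block read row by row. Everything in it (RowIterationSketch, SketchBlock, readRows) is hypothetical and only mimics the shape of the quoted computeNext(); it says nothing about which count is actually wrong in Presto. In this model, a page count larger than the block's count reproduces the exception message, while a page count smaller than the block's count silently drops a row instead of throwing.

```java
// Toy model of the iteration; all names here are hypothetical, not Presto classes.
final class RowIterationSketch
{
    private RowIterationSketch() {}

    // Stand-in for a block: holds values and validates the requested position.
    static final class SketchBlock
    {
        private final String[] values;

        SketchBlock(String... values)
        {
            this.values = values;
        }

        String get(int position)
        {
            if (position < 0 || position >= values.length) {
                throw new IllegalArgumentException(
                        "Invalid position " + position + " in block with " + values.length + " positions");
            }
            return values[position];
        }
    }

    // Advances like the quoted computeNext(): increment first, then compare
    // against the page-level count. Returns how many rows were read.
    static int readRows(SketchBlock block, int pagePositionCount)
    {
        int position = -1;
        int rowsRead = 0;
        while (true) {
            position++;
            if (position >= pagePositionCount) {
                return rowsRead;
            }
            block.get(position); // throws if the block is shorter than the page claims
            rowsRead++;
        }
    }
}
```

For example, readRows(new SketchBlock("a"), 2) throws in this model, whereas readRows(new SketchBlock("a", "b"), 1) returns 1 without any error.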

elharo commented 1 month ago

What kind of block do we have in this case and is the position count there accurate for a terminal row ID?
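
One way to answer that would be to log, for each channel, the block's class and position count next to the page's count. A rough sketch of such a check follows, assuming a minimal hypothetical CountedBlock interface in place of the real block type; BlockCountDiagnosticSketch and describeMismatches are illustrative names, not existing Presto code.

```java
// Diagnostic sketch; CountedBlock is a hypothetical stand-in for a real block type.
final class BlockCountDiagnosticSketch
{
    private BlockCountDiagnosticSketch() {}

    interface CountedBlock
    {
        int getPositionCount();
    }

    // Returns one line per channel whose block disagrees with the page-level count,
    // or an empty string if every block matches.
    static String describeMismatches(int pagePositionCount, CountedBlock... blocks)
    {
        StringBuilder report = new StringBuilder();
        for (int channel = 0; channel < blocks.length; channel++) {
            int blockCount = blocks[channel].getPositionCount();
            if (blockCount != pagePositionCount) {
                report.append("channel ").append(channel)
                        .append(" [").append(blocks[channel].getClass().getName()).append("]")
                        .append(": block has ").append(blockCount)
                        .append(" positions, page has ").append(pagePositionCount)
                        .append('\n');
            }
        }
        return report.toString();
    }
}
```

Running a check like this on the page returned for the failing query would show whether the "$row_id" block is the one that is short, and what concrete block class it is.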