Heltman opened this issue 7 months ago
The error is reported at the location below, but the place that actually needs to be fixed may not be here. https://github.com/trinodb/trino/blob/4bd0f8de39512ddd6f0b52b601c48b7a9106bac3/lib/trino-parquet/src/main/java/io/trino/parquet/reader/flat/BinaryBuffer.java#L81-L99
@raunaqmorarka can you take a look?
The example query above doesn't run, but here's one that does:
SELECT *
FROM (
    SELECT concat_ws('', repeat(concat_ws('', repeat('a', 1000)), 500))
) CROSS JOIN UNNEST(sequence(1, 5000));
@Heltman are you able to tune parquet.max-read-block-row-count lower to avoid hitting this problem?
@raunaqmorarka I think this is unrelated to Parquet. The example that Martin shared does not use Parquet files.
We also experience this issue with Trino version 451. Turning down the parquet.max-read-block-row-count config helped as a workaround. Our exception stacktrace was a bit different:
Caused by: java.lang.NegativeArraySizeException: -2139450066
at io.airlift.slice.Slices.allocate(Slices.java:91)
at io.trino.parquet.reader.flat.BinaryBuffer.asSlice(BinaryBuffer.java:90)
at io.trino.parquet.reader.flat.BinaryColumnAdapter.createNonNullBlock(BinaryColumnAdapter.java:76)
at io.trino.parquet.reader.flat.BinaryColumnAdapter.createNonNullBlock(BinaryColumnAdapter.java:27)
at io.trino.parquet.reader.flat.FlatColumnReader$DataValuesBuffer.createNonNullBlock(FlatColumnReader.java:392)
at io.trino.parquet.reader.flat.FlatColumnReader.readNonNull(FlatColumnReader.java:193)
at io.trino.parquet.reader.flat.FlatColumnReader.readPrimitive(FlatColumnReader.java:90)
at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:463)
at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:554)
at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:537)
at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:251)
at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
... 43 more
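For reference, a minimal sketch of the workaround mentioned above, assuming a Hive catalog configured in etc/catalog/hive.properties (the same property should also be available in the other connectors that use this Parquet reader); the value 1024 is only illustrative:

parquet.max-read-block-row-count=1024

Recent Trino versions also expose a matching catalog session property, parquet_max_read_block_row_count, which can be lowered for a single session instead of changing the catalog configuration.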
We received an error when reading a table that has a column with very large values. After inspection, we found that the total length exceeded the 2 GB size limit of a Slice.
The reason is that Trino reads files in batches: the batch size is generally 4096 rows, and our column values are about 500 KB each, so the combined length of one batch exceeds the Slice limit.
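A minimal standalone sketch of that arithmetic (this is not Trino code, and the batch size and per-value size are only illustrative): once the per-value byte lengths of a batch are summed into a Java int, the total wraps negative after passing Integer.MAX_VALUE, which matches the NegativeArraySizeException in the stacktrace above.

// Standalone sketch, not Trino code: shows how summing per-value byte lengths
// into an int wraps negative once the batch total passes Integer.MAX_VALUE.
public class BatchLengthOverflowSketch
{
    public static void main(String[] args)
    {
        int rowsPerBatch = 4096;        // illustrative batch size
        int bytesPerValue = 512 * 1024; // roughly 500 KB per value, illustrative

        int totalAsInt = 0;   // overflows silently
        long totalAsLong = 0; // correct total, kept only to show the difference
        for (int i = 0; i < rowsPerBatch; i++) {
            totalAsInt += bytesPerValue;
            totalAsLong += bytesPerValue;
        }

        System.out.println("actual total bytes   = " + totalAsLong); // 2147483648
        System.out.println("int-accumulated size = " + totalAsInt);  // -2147483648

        // Passing the wrapped value to an allocation reproduces the symptom:
        // new byte[totalAsInt] -> java.lang.NegativeArraySizeException
    }
}

The sketch keeps both an int and a long total only to make the wrap-around visible.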
I use Trino version 421, but I guess this is a common problem. We have discussed it on Slack. @wendigo @electrum
I built a minimal reproducible case as follows:
error: