segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Fix Generic Read methods to read all rows up to length of argument #489

Closed bartleyg closed 1 year ago

bartleyg commented 1 year ago

This fixes the generic Read methods, which read only one page's worth of rows instead of all rows up to the length of the argument slice. This affected at least parquet.ReadFile, parquet.Read, and GenericReader.Read.

You can see a failed test run without the fix here: https://github.com/segmentio/parquet-go/actions/runs/4515595007/jobs/7953065166. The failure comes from a new test case that uses larger sizes: https://github.com/segmentio/parquet-go/pull/489/files#diff-306797de164c962fe5b992da67ab7ff245b8a8e7c76ca21e350ee08c7071d39eR494.

I added a new test for the generic Read path that uses multiple RowGroups and multiple Pages to increase the test coverage around this.

I also noticed that the TestWriter writerTests have been skipped ever since the project started using GitHub Actions, because parquet-tools is not installed in GitHub CI.

Fixes #469, #471, & possibly others.