Closed danwt closed 1 year ago
Are you implementing your own ReadRows
function? This looks like it read more than it should have. 64 then 128 like r.rowIndex is not being incremented or SeekToRow is not being called.
No I'm not implementing my own function
import (
"github.com/segmentio/parquet-go"
"io"
)
type ReaderAtSeeker interface {
io.ReaderAt
io.Seeker
}
// NewGenericReader returns a generic parquet reader. Wraps library New function because library
// function says it only needs ReaderAt, but really it also needs Seeker.
func NewGenericReader[T any](r ReaderAtSeeker, options ...parquet.ReaderOption) *parquet.GenericReader[T] {
return parquet.NewGenericReader[T](r, options...)
}
func FromReaderAtSeeker(r parquetx.ReaderAtSeeker) (*MemStore, error) {
pReader := parquetx.NewGenericReader[Row](r)
s := New(pReader.NumRows())
read := int64(0)
nRows := pReader.NumRows()
buf := make([]Row, parquetRowReadBatchSize)
for read < nRows {
num, err := pReader.Read(buf)
read += int64(num)
if err != nil && !errors.Is(err, io.EOF) {
return nil, err
}
// very important to avoid copy here
for i := 0; i < num; i++ {
err = addRow(s, &buf[i])
}
if err != nil {
return nil, err
}
}
if err := pReader.Close(); err != nil {
return nil, err
}
return s, nil
}
Thank you for sharing. I am able to reproduce this now and am doing some due diligence first in case there's a deeper level bug than the quick fix here.
Awesome thanks I will try it
https://github.com/segmentio/parquet-go/commit/219dfcfcd6f368b4467f45226885a9fc51e28f9c
Seems to have broken the reader code: getting index of range.
Thanks