segmentio / parquet-go

Go library to read/write Parquet files
https://pkg.go.dev/github.com/segmentio/parquet-go
Apache License 2.0

Handle io.EOF errors returned by ReadAt when opening files #478

Open metalmatze opened 1 year ago

metalmatze commented 1 year ago

Hey,

We're opening Parquet files through an object storage client rather than directly via os.File, and this currently fails.

The io.ReaderAt interface allows ReadAt to return io.EOF even when it has filled the buffer, as long as the read ends at the end of the input. Opening a file currently fails for us because, when reading the footer, the object storage client reads up to the end of the file and returns io.EOF, even though the data was written to the buffer in full.

We could work around this in FrostDB, but we think that, generally speaking, FrostDB's object storage client is compliant with the ReaderAt interface.

What is this project's stance on handling io.EOF in this case?

joe-elliott commented 1 year ago

This makes sense to me for reading the footer, where you would likely get valid data together with an EOF, but do we need this for the column and offset indexes as well?

achille-roussel commented 1 year ago

Maybe we can use a helper function to handle unexpected conditions?

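// readFullAt reads len(b) bytes at offset off. A read that fills b is a
// success even if the ReaderAt also returned io.EOF, which io.ReaderAt
// permits when the read ends at the end of the input.
func readFullAt(r io.ReaderAt, b []byte, off int64) (int, error) {
	n, err := r.ReadAt(b, off)
	if n == len(b) {
		// The buffer was filled: ignore any io.EOF reported alongside it.
		err = nil
	} else {
		switch err {
		case nil:
			// A short read must come with an error; flag the contract violation.
			err = io.ErrNoProgress
		case io.EOF:
			// The input ended before b was filled.
			err = io.ErrUnexpectedEOF
		}
	}
	return n, err
}
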
metalmatze commented 1 year ago

I'd be happy with such a helper function.

kevinburkesegment commented 1 year ago

Apologies for making more work for you, but we've decided to move development of this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long-term success. We appreciate your contribution and would be grateful if you could reopen this PR there if it is still relevant.