This PR adds a new pio package, along with APIs to support performing parallel I/O on parquet files. The intent is to amortize the cost of I/O latency when reading multiple file sections (e.g. when loading pages from multiple columns).
I am opening this PR against #297; it is only a building block that I intend to use to address performance issues when reading parquet files.
The key change is the introduction of pio.MultiReadAt(io.ReaderAt, []Op), which is the core API that the implementation will rely on. I also added platform-specific implementations of this API, leveraging async I/O operations on Linux and Darwin, as well as a generic fallback mechanism using the Go runtime. Finally, I added an extension mechanism: custom types can implement the pio.ReaderAt interface, and a test suite in the pio/piotest package validates the behavior of such custom implementations.