feat: Use object store and async, byte-range reads

bjchambers commented 1 year ago

Summary

Currently, compute uses the S3 client to retrieve Parquet files before reading them. We have started a transition to object_store (https://docs.rs/object_store/latest/object_store/), which supports (a) reading from multiple object stores and (b) doing a direct byte-range read without fetching the file locally first.
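To illustrate point (b), here is a minimal std-only sketch of a byte-range read, assuming a local seekable source: only the requested bytes are read. object_store exposes the equivalent for remote stores as a range request (for example `ObjectStore::get_range`); the `read_range` helper below is hypothetical and is not object_store's API.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// Hypothetical helper: read only `range` from a seekable source, skipping the
/// bytes before and after it. A remote store would issue a range request instead.
fn read_range<R: Read + Seek>(source: &mut R, range: Range<u64>) -> io::Result<Vec<u8>> {
    let len = range.end.checked_sub(range.start).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "range end is before range start")
    })?;
    // Jump straight to the start of the range rather than reading from byte 0.
    source.seek(SeekFrom::Start(range.start))?;
    let mut buf = vec![0u8; len as usize];
    source.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // An in-memory "file" stands in for an object in a store.
    let mut file = io::Cursor::new(b"hello, object store".to_vec());
    let bytes = read_range(&mut file, 7..13)?;
    println!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```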

We should finish this migration to fully benefit from object_store.

[ ] Call abort_multipart after failures
[ ] #505
[ ] Cleanup: Pass object store URLs as ObjectStoreUrl rather than &str or String
[ ] Consider how we keep the object stores (and credentials) separate in multi tenant case.
[x] For reading files during compute (#471)
[x] For writing files during prepare (#475)
[x] For writing metadata files during prepare (#475)
[x] For reading metadata files during compute (#476)
[x] For determining file schemas (#479)
[x] For writing CSV files during compute (moved to #486)
[x] For writing Parquet files during compute (#492)
[x] For reading files during prepare (rather than copying to disk) (#495)
[x] Make the key method and ObjectStoreCrate private (#501)
[ ] For reading (or at least fetching) the incremental checkpoint (rocksdb) (#503)
[ ] For writing (or at least uploading) the incremental checkpoint (rocksdb) (#503)
[ ] For uploading the plan yaml and flight records (#503)
[ ] Remove s3 helper and s3 crates (#503)
[ ] Delete ConvertURI methods (https://github.com/kaskada-ai/kaskada/blob/main/wren/compute/helpers.go#L19-L23) (#503)
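The first open item (abort on failure) follows a common pattern: if any part of a multipart upload or its completion fails, abort the upload so the store can discard the parts already written. The sketch below assumes a hypothetical `MultipartUpload` trait and `MockUpload` type; object_store's actual multipart API differs between versions (older releases expose `abort_multipart`), so treat this only as an outline of the control flow.

```rust
use std::fmt;

#[derive(Debug)]
struct UploadError(String);

impl fmt::Display for UploadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "upload failed: {}", self.0)
    }
}

/// Hypothetical multipart upload handle (not object_store's real trait).
trait MultipartUpload {
    fn put_part(&mut self, data: &[u8]) -> Result<(), UploadError>;
    fn complete(&mut self) -> Result<(), UploadError>;
    fn abort(&mut self) -> Result<(), UploadError>;
}

fn try_upload<U: MultipartUpload>(upload: &mut U, parts: &[&[u8]]) -> Result<(), UploadError> {
    for part in parts {
        upload.put_part(part)?;
    }
    upload.complete()
}

/// Upload every part; on any failure, abort and return the original error.
fn upload_all<U: MultipartUpload>(upload: &mut U, parts: &[&[u8]]) -> Result<(), UploadError> {
    match try_upload(upload, parts) {
        Ok(()) => Ok(()),
        Err(e) => {
            // Best effort: an abort failure should not hide the original error.
            let _ = upload.abort();
            Err(e)
        }
    }
}

/// In-memory mock that fails on the `fail_on`-th part (0-based), if set.
#[derive(Default)]
struct MockUpload {
    fail_on: Option<usize>,
    parts: usize,
    completed: bool,
    aborted: bool,
}

impl MultipartUpload for MockUpload {
    fn put_part(&mut self, _data: &[u8]) -> Result<(), UploadError> {
        if self.fail_on == Some(self.parts) {
            return Err(UploadError(format!("part {} failed", self.parts)));
        }
        self.parts += 1;
        Ok(())
    }
    fn complete(&mut self) -> Result<(), UploadError> {
        self.completed = true;
        Ok(())
    }
    fn abort(&mut self) -> Result<(), UploadError> {
        self.aborted = true;
        Ok(())
    }
}

fn main() {
    let parts: [&[u8]; 3] = [b"a", b"b", b"c"];
    let mut ok = MockUpload::default();
    println!("ok: {:?}, completed = {}", upload_all(&mut ok, &parts).is_ok(), ok.completed);
    let mut bad = MockUpload { fail_on: Some(1), ..Default::default() };
    let err = upload_all(&mut bad, &parts).unwrap_err();
    println!("{err}; aborted = {}", bad.aborted);
}
```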

bjchambers commented 1 year ago

Some of this may be done as part of building the new partitioned execution logic (as part of #409).

epinzur commented 1 year ago

I believe this work is complete for the getMetadata() and prepareData() methods, but still needs to be completed on the query execution and materialization code paths.

bjchambers commented 1 year ago

In general, I started working on this to allow operating on many and/or large files without filling up the disk. The first PR(s) are ready for review.

@epinzur re getMetadata() and prepareData(): it isn't really complete for them either. Specifically, they still rely on downloading the whole file. For getMetadata(), we should only need to fetch the bytes corresponding to the footer; for prepareData(), we should be able to use object store to read bytes in chunks, never fetching the whole thing. Similarly, it isn't used for uploading the file yet.
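Both ideas reduce to computing byte ranges. A Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`, so after fetching the last 8 bytes (and knowing the object size, e.g. from a HEAD request) a reader can fetch exactly the footer metadata. The sketch below assumes that layout; `chunk_ranges` is a hypothetical generic fixed-size splitter for chunked reads. A Parquet reader would more likely use the column-chunk offsets recorded in the footer instead.

```rust
use std::ops::Range;

const MAGIC: &[u8; 4] = b"PAR1";
const TAIL_LEN: u64 = 8; // 4-byte little-endian metadata length + "PAR1"

/// Given the file length and its last 8 bytes, return the byte range that holds
/// the encoded file metadata, so only that range needs to be fetched.
fn footer_metadata_range(file_len: u64, tail: &[u8; 8]) -> Result<Range<u64>, String> {
    if &tail[4..] != MAGIC {
        return Err("missing PAR1 magic".into());
    }
    let meta_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as u64;
    let end = file_len.checked_sub(TAIL_LEN).ok_or("file too short")?;
    let start = end.checked_sub(meta_len).ok_or("metadata length exceeds file")?;
    Ok(start..end)
}

/// Hypothetical helper: split `len` bytes into consecutive ranges of at most
/// `chunk` bytes, so an object can be processed chunk by chunk.
fn chunk_ranges(len: u64, chunk: u64) -> impl Iterator<Item = Range<u64>> {
    assert!(chunk > 0, "chunk size must be positive");
    (0..len)
        .step_by(chunk as usize)
        .map(move |start| start..(start + chunk).min(len))
}

fn main() {
    // Last 8 bytes of an example 10,000-byte file with 1,234 bytes of metadata.
    let mut tail = [0u8; 8];
    tail[..4].copy_from_slice(&1_234u32.to_le_bytes());
    tail[4..].copy_from_slice(b"PAR1");
    println!("{:?}", footer_metadata_range(10_000, &tail));

    let chunks: Vec<_> = chunk_ranges(10_000, 4_096).collect();
    println!("{:?}", chunks);
}
```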

bjchambers commented 1 year ago

Capturing some links / thoughts:

Example of getting the minimum/maximum time from the parquet metadata (the file stats): https://github.com/kaskada-ai/kaskada/blob/7858a62bc26c4ffd2451336d6d4dee82bd393fab/crates/sparrow-runtime/src/metadata/prepared_metadata.rs#L45
Fetching the schema is likely done by https://github.com/kaskada-ai/kaskada/blob/main/crates/sparrow-runtime/src/metadata/raw_metadata.rs
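Once the footer is fetched, the file-level time bounds can be derived from per-row-group statistics without reading any data pages. The sketch below assumes a hypothetical `TimeStats` struct holding each row group's min/max for the time column; it is not the code in `prepared_metadata.rs` or the parquet crate's statistics API. It returns `None` when any row group lacks statistics, since the bounds would then be unknown.

```rust
/// Hypothetical per-row-group statistics for the time column
/// (for example, nanoseconds since the epoch). Statistics may be absent.
#[derive(Clone, Copy, Debug)]
struct TimeStats {
    min: Option<i64>,
    max: Option<i64>,
}

/// Combine row-group statistics into a file-level (min, max).
/// Returns None for an empty file or if any row group lacks statistics.
fn file_time_bounds(row_groups: &[TimeStats]) -> Option<(i64, i64)> {
    let mut bounds: Option<(i64, i64)> = None;
    for rg in row_groups {
        // Missing stats anywhere means the overall bounds are unknown.
        let (lo, hi) = (rg.min?, rg.max?);
        bounds = Some(match bounds {
            None => (lo, hi),
            Some((cur_lo, cur_hi)) => (cur_lo.min(lo), cur_hi.max(hi)),
        });
    }
    bounds
}

fn main() {
    let stats = [
        TimeStats { min: Some(100), max: Some(250) },
        TimeStats { min: Some(50), max: Some(200) },
    ];
    println!("{:?}", file_time_bounds(&stats));
}
```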

kaskada-ai / kaskada

feat: Use object store and async, byte-range reads #465