spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0

Better IO orchestration #964

Open robert3005 opened 1 month ago

robert3005 commented 1 month ago

The Vortex reader collects all read requests from layouts and dispatches them together (https://github.com/spiraldb/vortex/blob/develop/vortex-serde/src/layouts/read/stream.rs#L192). However, this is extremely naive: it doesn't use the additional knowledge we have about the file format to prioritize requests or prefetch data.
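As a rough illustration of what smarter dispatch could look like, here is a minimal sketch of one common technique: coalescing nearby byte ranges so that several small reads become one larger request. This is not taken from the Vortex code base; the function name `coalesce_ranges` and the `max_gap` parameter are assumptions made for the example, and it is not necessarily what the final design would use.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of Vortex): merge byte ranges that overlap
/// or lie within `max_gap` bytes of each other, so that several small reads
/// can be served by a single larger request.
pub fn coalesce_ranges(mut ranges: Vec<Range<u64>>, max_gap: u64) -> Vec<Range<u64>> {
    // Empty ranges carry no data, so drop them before merging.
    ranges.retain(|r| r.start < r.end);
    // Sort by start offset so that neighbours in the file are adjacent here.
    ranges.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<u64>> = Vec::with_capacity(ranges.len());
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            // Close enough to the previous range: extend it instead of
            // issuing a separate request.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

For example, `coalesce_ranges(vec![0..10, 12..20, 100..110], 4)` yields two requests, `0..20` and `100..110`. The trade-off is that a larger `max_gap` means fewer round trips but more wasted bytes, which matters on blob stores where per-request latency dominates.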

AnyBlob outlines techniques for improving throughput and latency on blob stores. We should explore methods outlined in the paper and try to incorporate them into vortex file reading.

In no particular order the things to look at are

a10y commented 1 month ago

Fusio is almost a good fit for us as a runtime-agnostic replacement for object_store, but it's missing two things:

  1. Support for a ReadAt trait
  2. An implementation of the S3 client that runs on monoio

I'm working with the fusio and monoio folks on both of these, since monoio affects how (1) is implemented.
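To make (1) concrete, here is a minimal sketch of what a positional-read trait could look like. It is a hypothetical illustration, not fusio's or monoio's actual API: the trait shape, the `InMemory` type, and the synchronous signature are all assumptions made to keep the example self-contained. The buffer is passed by value and handed back, in the style of completion-based runtimes such as monoio, where the runtime owns the buffer while the IO is in flight.

```rust
use std::convert::TryFrom;
use std::io;

/// Hypothetical positional-read trait (an assumption, not fusio's API).
/// The buffer is passed in by value and returned alongside the result,
/// mirroring the owned-buffer style used by completion-based runtimes.
pub trait ReadAt {
    /// Fill `buf` completely with the bytes starting at offset `pos`.
    fn read_exact_at(&self, buf: Vec<u8>, pos: u64) -> (io::Result<()>, Vec<u8>);
}

/// Toy in-memory "file" used only to exercise the trait.
pub struct InMemory(pub Vec<u8>);

impl ReadAt for InMemory {
    fn read_exact_at(&self, mut buf: Vec<u8>, pos: u64) -> (io::Result<()>, Vec<u8>) {
        // Offsets that do not fit in usize cannot address this buffer.
        let start = match usize::try_from(pos) {
            Ok(s) => s,
            Err(_) => return (Err(io::ErrorKind::InvalidInput.into()), buf),
        };
        // The whole requested window must lie inside the data.
        let end = match start.checked_add(buf.len()) {
            Some(e) if e <= self.0.len() => e,
            _ => return (Err(io::ErrorKind::UnexpectedEof.into()), buf),
        };
        buf.copy_from_slice(&self.0[start..end]);
        (Ok(()), buf)
    }
}
```

Because the method takes an offset and `&self` rather than a mutable cursor, multiple reads against the same file can be issued independently, which is what batching and prioritizing requests relies on.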

Monoio: https://github.com/bytedance/monoio/pull/309

Fusio: https://github.com/tonbo-io/fusio/issues/68

ethe commented 1 month ago

I implemented read_exact_at in fusio, please take a look: https://github.com/tonbo-io/fusio/pull/71/files#diff-22e7f097629cff0a2a265b5185882031f09033e6e67d8bf2303e2c37314b1a7fL50