spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0

Better IO orchestration #964

Open robert3005 opened 3 weeks ago

robert3005 commented 3 weeks ago

The Vortex reader collects all read requests from layouts and dispatches them together: https://github.com/spiraldb/vortex/blob/develop/vortex-serde/src/layouts/read/stream.rs#L192. However, this is extremely naive and doesn't leverage the additional knowledge we have about the file format to prioritize requests and prefetch data.
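To make the idea concrete, here is a minimal sketch of one way collected requests could be coalesced and ordered before dispatch. It is not the current Vortex code: `ReadRequest`, `coalesce`, the `priority` field and the `max_gap` threshold are all hypothetical names chosen for illustration.

```rust
use std::ops::Range;

/// Hypothetical read request: a byte range in the file plus a priority
/// (lower value = more urgent). Not a Vortex type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ReadRequest {
    range: Range<u64>,
    priority: u8,
}

/// Merge requests that overlap or sit within `max_gap` bytes of each other,
/// so that several small reads become one larger read. A merged request
/// takes the most urgent priority of its parts.
fn coalesce(mut requests: Vec<ReadRequest>, max_gap: u64) -> Vec<ReadRequest> {
    // Sort by start offset so neighbours in the file are adjacent in the list.
    requests.sort_by_key(|r| r.range.start);

    let mut merged: Vec<ReadRequest> = Vec::new();
    for req in requests {
        if let Some(last) = merged.last_mut() {
            if req.range.start <= last.range.end.saturating_add(max_gap) {
                // Close enough: extend the previous request instead of adding one.
                last.range.end = last.range.end.max(req.range.end);
                last.priority = last.priority.min(req.priority);
                continue;
            }
        }
        merged.push(req);
    }

    // Dispatch the most urgent requests first. The sort is stable, so
    // requests with equal priority stay in file-offset order.
    merged.sort_by_key(|r| r.priority);
    merged
}
```

For example, with `max_gap = 64`, requests for `0..100` and `150..300` would be merged into a single read of `0..300`, while a distant request for `10_000..10_500` stays separate. The right gap threshold depends on the storage backend and would need measuring.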

AnyBlob outlines techniques for improving throughput and latency on blob stores. We should explore the methods outlined in the paper and try to incorporate them into Vortex file reading.

In no particular order the things to look at are

a10y commented 2 days ago

Fusio is almost a good fit for us as a runtime-agnostic replacement for object_store, but it's missing two things:

  1. Support for a ReadAt trait
  2. An implementation of the S3 client that runs on monoio

I'm working with the fusio and monoio folks on both of these, since monoio affects its implementation for (1).

Monoio: https://github.com/bytedance/monoio/pull/309

Fusio: https://github.com/tonbo-io/fusio/issues/68
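For readers unfamiliar with positional reads, here is a rough sketch of what a ReadAt-style trait could look like. This is not fusio's or monoio's actual API: the trait, the `InMemory` type and the signature are hypothetical, and the sketch is synchronous to stay self-contained. The buffer is passed by value and handed back, on the assumption that a completion-based runtime needs to own the buffer while the read is in flight.

```rust
use std::convert::TryFrom;
use std::io;

/// Hypothetical positional-read trait (not the fusio or monoio API).
/// The caller gives up the buffer for the duration of the read and
/// gets it back together with the result.
trait ReadAt {
    /// Fill `buf` completely with the bytes starting at offset `pos`.
    fn read_exact_at(&self, buf: Vec<u8>, pos: u64) -> (io::Result<()>, Vec<u8>);
}

/// Toy in-memory backend, used only to exercise the trait.
struct InMemory {
    data: Vec<u8>,
}

impl ReadAt for InMemory {
    fn read_exact_at(&self, mut buf: Vec<u8>, pos: u64) -> (io::Result<()>, Vec<u8>) {
        let eof = || io::Error::new(io::ErrorKind::UnexpectedEof, "read past end of data");

        // Reject offsets that do not fit in usize on this platform.
        let start = match usize::try_from(pos) {
            Ok(start) => start,
            Err(_) => return (Err(eof()), buf),
        };

        // `get` returns None if the requested range runs past the end.
        let src = start
            .checked_add(buf.len())
            .and_then(|end| self.data.get(start..end));

        match src {
            Some(src) => {
                buf.copy_from_slice(src);
                (Ok(()), buf)
            }
            None => (Err(eof()), buf),
        }
    }
}
```

Because the read takes an explicit offset and `&self` rather than a shared cursor, several ranges of the same file can be requested independently, which is what a reader that dispatches many requests at once needs.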

ethe commented 9 hours ago

I implemented read_exact_at in fusio: https://github.com/tonbo-io/fusio/pull/71/files#diff-22e7f097629cff0a2a265b5185882031f09033e6e67d8bf2303e2c37314b1a7fL50, please take a look.