lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

Decouple parallelism from cpu count for IO bound operations #1357

Open changhiskhan opened 1 year ago

changhiskhan commented 1 year ago

This will result in bad performance on low-CPU-count nodes. Ideally this would be configurable, but at the very least it should be some multiple of the CPU count?

wjones127 commented 1 year ago

duplicate of https://github.com/lancedb/lance/issues/832?

wjones127 commented 1 year ago

I think there's something else going on too. We see this odd pattern where requests are being made in waves, and seemingly not as quickly as they could be. For example, here I set readahead to 32, and it makes 32 concurrent requests, but then waits for the entire initial wave to complete before starting the next one.

(Screenshot, 2023-10-05: request trace showing requests issued in waves.)

My current theory is that this has to do with misuse of `buffered`. There is a GH issue describing a footgun here, part of which sounds familiar: https://github.com/rust-lang/futures-rs/issues/2387

Basically, one of those `Buffered` streams needs to be polled continuously to make progress. But if, once you get a result, you do a bunch of CPU work on it, you won't be polling the stream during that time, which delays starting the next set of futures.

We should double-check that (1) we are spawning tasks in the scanner, so they progress independently, and (2) maybe add a buffer that the stream continuously pushes into, so that its poller isn't doing anything that might cause a delay in starting the next future.