readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, Readyset caches the results of select statements and incrementally updates these results over time as the underlying data changes.
https://readyset.io

Run domains in a threadpool #1218

Open gvsg-rs opened 7 months ago

gvsg-rs commented 7 months ago

Previously, we used tokio::task::block_in_place to run domains, which blocks the thread it is running on until the task completes. This prevents the executor from using that thread to make progress on any other task, so the thread is wasted for as long as the domain runs. CL-1268 removed the block_in_place strategy by spawning a native OS thread for each domain instead, which is how the system works today. This is highly resource-inefficient, and it has also led to us hitting the upper limit on the number of threads with active tracing spans (4096).

A quote from that CL:

The thing is spawn_blocking spawns a *blocking* task on a *blocking thread*, whereas our Replica is actually asynchronous, so it would not work at all. Moreover, the blocking tasks run in a thread pool that has a limited size, and we don't know a priori how high to set it. It defaults to 512, but there is no reason for us not to have more domains, and once we run out, spawning stops. spawn_blocking performs even worse than block_in_place BTW.

Generally speaking, blocking, I/O-bound work is well suited to tokio's built-in blocking threadpool. Further, it is now possible to configure the size of the blocking threadpool using the max_blocking_threads method on the runtime builder. We should re-investigate the performance of tokio's spawn_blocking method in the context of domains.

For work that is CPU-bound, we should consider using the rayon crate, which is typically the go-to tool for spawning blocking CPU-bound tasks.
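To make the threadpool idea concrete without committing to a crate, here is a rough sketch using only the standard library. It is not rayon's API, just the underlying pattern: a fixed number of worker threads pulling jobs from a shared queue. The Job type and run_on_pool function are hypothetical, and the sketch assumes domain work can be expressed as short, finite jobs, which may not hold for real domains.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for one unit of domain work.
type Job = Box<dyn FnOnce() -> u64 + Send + 'static>;

/// Runs all `jobs` on `workers` OS threads and returns the sum of their results.
fn run_on_pool(workers: usize, jobs: Vec<Job>) -> u64 {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel::<u64>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while dequeuing, not while the job runs.
                let job = match job_rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // queue closed and drained
                };
                res_tx.send(job()).unwrap();
            })
        })
        .collect();
    drop(res_tx); // only the workers hold result senders now

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue lets idle workers exit

    for handle in handles {
        handle.join().unwrap();
    }
    res_rx.iter().sum()
}

fn main() {
    // 5000 jobs, more than the 4096-thread limit, served by 8 threads.
    let jobs: Vec<Job> = (0..5000u64).map(|_| Box::new(|| 1u64) as Job).collect();
    let total = run_on_pool(8, jobs);
    println!("ran {total} jobs on 8 threads");
}
```

The thread count here depends on the pool size rather than on the number of domains, which is the property that would avoid the 4096-thread limit.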

altmannmarcelo commented 7 months ago

If we run Readyset against a setup that has a large number of tables, we can exceed the thread limit, and Readyset will panic:

Apr 11 17:20:35 ip-10-0-5-246 readyset[2686075]: Thread count overflowed the configured max count. Thread index = 4097, max threads = 4096.
Apr 11 17:20:35 ip-10-0-5-246 readyset[2686075]: thread 'Domain 1493.0.0' panicked at readyset-dataflow/src/domain/mod.rs:732:10:
Apr 11 17:20:35 ip-10-0-5-246 readyset[2686075]: called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(6925), ...)

A workaround is to limit the number of tables either via: