microsoft / CCF

Confidential Consortium Framework
https://microsoft.github.io/CCF/
Apache License 2.0

Async IO #3124

Open eddyashton opened 2 years ago

eddyashton commented 2 years ago

Our file IO currently blocks the host thread. It is triggered by ringbuffer messages from the enclave to the host, for both reads and writes. These are assumed to be quick operations, but while one is being processed the host thread (the main libuv thread) is completely blocked.

We need to make this asynchronous to avoid stalls when the IO is slow, while ensuring the file IO is still correctly ordered.
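To make the ordering requirement concrete, here is a minimal sketch (in Python for brevity, not CCF's C++ host code) of one simple way to get IO off the calling thread while keeping it strictly ordered: every operation goes through a single worker thread with a FIFO queue. `OrderedFileIO` and its methods are hypothetical names for illustration, not existing CCF APIs, and a real host implementation would need to integrate with the libuv loop rather than use Python futures.

```python
import os
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor


class OrderedFileIO:
    """Hypothetical helper: runs file operations off the caller's thread,
    strictly in the order they were submitted."""

    def __init__(self, path: str) -> None:
        self._path = path
        # A single worker drains the executor's FIFO queue, so a later
        # operation can never overtake an earlier one.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def append(self, data: bytes) -> Future:
        # Returns immediately; the write happens on the worker thread.
        return self._executor.submit(self._append, data)

    def read(self, offset: int, size: int) -> Future:
        # Queued behind every previously submitted append.
        return self._executor.submit(self._read, offset, size)

    def close(self) -> None:
        # Wait for all queued operations to finish.
        self._executor.shutdown(wait=True)

    def _append(self, data: bytes) -> int:
        with open(self._path, "ab") as f:
            return f.write(data)

    def _read(self, offset: int, size: int) -> bytes:
        with open(self._path, "rb") as f:
            f.seek(offset)
            return f.read(size)
```

The caller stays responsive because `append` and `read` only enqueue work; the cost is that there is no parallelism between operations, so one slow read still delays everything queued behind it.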

I think there are 2 extreme approaches:

We'd like to use the latter for optimal parallelism, and to avoid rebuilding similar infrastructure ourselves. But since we're unsure about the ordering guarantees of this approach, we'll start with something slower and simpler like the former.

While ledger_get and ledger_append operations are simple enough, anything that interacts with ledger ranges (in particular, fetching the body for an AppendEntries) is likely to be tricky to make async.

eddyashton commented 1 year ago

We should also look at batching IO operations on the host side, as I believe CCF services in Azure are occasionally hitting IOPS limits on the disk/VM despite being nowhere near the throughput limit (i.e. very small reads are hitting caps and being stalled).
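As a rough illustration of what batching could mean here, the sketch below (Python, not CCF code) coalesces small read requests that touch adjacent or overlapping byte ranges into fewer, larger reads, then slices each caller's bytes back out. The names `coalesce` and `batched_read`, and the `max_gap` parameter, are assumptions made for this example.

```python
import io
from typing import BinaryIO, List, Tuple

Range = Tuple[int, int]  # (offset, size) in bytes


def coalesce(ranges: List[Range], max_gap: int = 0) -> List[Range]:
    """Merge ranges that overlap or lie within max_gap bytes of each other."""
    merged: List[Range] = []
    for offset, size in sorted(ranges):
        if merged:
            last_offset, last_size = merged[-1]
            last_end = last_offset + last_size
            if offset <= last_end + max_gap:
                # Extend the previous range instead of issuing a new read.
                new_end = max(last_end, offset + size)
                merged[-1] = (last_offset, new_end - last_offset)
                continue
        merged.append((offset, size))
    return merged


def batched_read(
    f: BinaryIO, ranges: List[Range], max_gap: int = 0
) -> Tuple[List[bytes], int]:
    """Serve every requested range with one read per merged range.

    Returns the results in request order, plus the number of reads issued.
    """
    merged = coalesce(ranges, max_gap)
    buffers = []
    for offset, size in merged:
        f.seek(offset)
        buffers.append((offset, f.read(size)))

    results: List[bytes] = []
    for offset, size in ranges:
        for base, buf in buffers:
            if base <= offset and offset + size <= base + len(buf):
                start = offset - base
                results.append(buf[start : start + size])
                break
        else:
            raise EOFError(f"range ({offset}, {size}) extends past end of file")
    return results, len(merged)
```

For example, four requests `(0, 4)`, `(4, 4)`, `(50, 2)` and `(8, 2)` are served with two reads instead of four. A non-zero `max_gap` trades a few wasted bytes of throughput for fewer operations, which is the right trade when the IOPS cap is the bottleneck.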

eddyashton commented 1 year ago

Another thing that would be interesting to add is some metrics on our IO. Are we doing a lot of Raft resends? What's the ratio of Raft headers to payload? How often is our host-side ledger file cache missing?
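A minimal sketch of the counters these questions imply, again in Python rather than CCF's C++ and with entirely hypothetical names (`IOMetrics`, `record_send`, `record_cache_lookup`); where such counters would be updated in the host is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IOMetrics:
    """Hypothetical counters for the three questions above."""

    cache_hits: int = 0
    cache_misses: int = 0
    header_bytes: int = 0
    payload_bytes: int = 0
    sends: int = 0
    resends: int = 0

    def record_cache_lookup(self, hit: bool) -> None:
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def record_send(
        self, header_size: int, payload_size: int, is_resend: bool = False
    ) -> None:
        self.sends += 1
        self.header_bytes += header_size
        self.payload_bytes += payload_size
        if is_resend:
            self.resends += 1

    def cache_miss_rate(self) -> Optional[float]:
        lookups = self.cache_hits + self.cache_misses
        return self.cache_misses / lookups if lookups else None

    def header_to_payload_ratio(self) -> Optional[float]:
        # Undefined until at least one payload byte has been sent.
        if self.payload_bytes == 0:
            return None
        return self.header_bytes / self.payload_bytes

    def resend_rate(self) -> Optional[float]:
        return self.resends / self.sends if self.sends else None
```

Each rate returns `None` rather than a misleading zero when nothing has been recorded yet.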