oxidecomputer / crucible

A storage service.
Mozilla Public License 2.0

Fix write reordering bug #1448

Closed mkeeter closed 2 months ago

mkeeter commented 2 months ago

If we have any block operations in the deferred queue (e.g. because we're encrypting a large write in the thread pool), then every subsequent block operation must also go through the queue to preserve ordering.

Unfortunately, we were consulting the wrong DeferredQueue when deciding whether to defer writes. As a result, a large (deferred) write followed by a short (non-deferred) write could be reordered so that the short write happens first.
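
To make the reordering concrete, here is a minimal toy model of the rule described above. It is only a sketch, not Crucible's actual code: `Write`, `Submitter`, `block_deferred`, `other_deferred`, and `issue_order` are all hypothetical names invented for illustration, and the real `DeferredQueue` is asynchronous rather than drained in a single call.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a guest write; not Crucible's real type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Write {
    name: &'static str,
    /// Large writes need slow work (e.g. encryption) and are always deferred.
    large: bool,
}

/// Toy submission path with two deferred queues (names are assumptions).
#[derive(Default)]
struct Submitter {
    /// Queue that block operations are actually pushed onto.
    block_deferred: VecDeque<Write>,
    /// Some unrelated queue; it stays empty in this model.
    other_deferred: VecDeque<Write>,
    /// Order in which writes are finally issued.
    issued: Vec<&'static str>,
}

impl Submitter {
    fn submit(&mut self, w: Write, check_wrong_queue: bool) {
        // The bug modeled here: asking the wrong queue whether work is pending.
        let queue_busy = if check_wrong_queue {
            !self.other_deferred.is_empty()
        } else {
            !self.block_deferred.is_empty()
        };
        if w.large || queue_busy {
            // Deferred: must wait behind everything already in the queue.
            self.block_deferred.push_back(w);
        } else {
            // Fast path: issued immediately.
            self.issued.push(w.name);
        }
    }

    /// Pretend the background work finished; drain in FIFO order.
    fn finish_deferred(&mut self) {
        while let Some(w) = self.block_deferred.pop_front() {
            self.issued.push(w.name);
        }
    }
}

/// Submit a large write, then a short one, and report the issue order.
fn issue_order(check_wrong_queue: bool) -> Vec<&'static str> {
    let mut s = Submitter::default();
    s.submit(Write { name: "large", large: true }, check_wrong_queue);
    s.submit(Write { name: "short", large: false }, check_wrong_queue);
    s.finish_deferred();
    s.issued
}
```

In this model, `issue_order(false)` keeps the submission order (large, then short), while `issue_order(true)` lets the short write take the fast path and land ahead of the large one.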

This is probably the root cause of the CI failures that we saw in #1445: fast-ack means that the Guest is able to send the short write with less delay, making it more likely to hit this race condition.