faithanalog opened 4 months ago
The flush timeout does still play a role in preventing job pileup. For replay to work, we keep a list of all jobs since the last ack'd flush, and if a downstairs goes away and comes back, we replay all the work since that last confirmed flush. The frequent flushes let us discard that older work rather than keep it all in memory.
So we would need to find a new way for replay to work correctly if we want to increase the flush timeout.
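For concreteness, here is a minimal sketch of that bookkeeping: jobs accumulate until a flush has been acked by every downstairs, at which point everything up to and including that flush can be dropped. The types and names are hypothetical, not the actual upstairs structures.

```rust
// Hypothetical sketch of the replay bookkeeping described above;
// not the real Crucible types.
use std::collections::VecDeque;

#[derive(Debug)]
enum Job {
    Write { id: u64 },
    Flush { id: u64 },
}

#[derive(Default)]
struct ReplayBuffer {
    // Every job issued since the last flush that all downstairs acked.
    pending: VecDeque<Job>,
}

impl ReplayBuffer {
    // Remember a job so it can be resent if a downstairs reconnects.
    fn record(&mut self, job: Job) {
        self.pending.push_back(job);
    }

    // Once a flush with `flush_id` is acked by every downstairs, all work up
    // to and including that flush is durable and no longer needs replaying.
    fn flush_acked_by_all(&mut self, flush_id: u64) {
        while let Some(job) = self.pending.front() {
            let done = matches!(job, Job::Flush { id } if *id == flush_id);
            self.pending.pop_front();
            if done {
                break;
            }
        }
    }

    // A reconnecting downstairs gets everything still in the buffer.
    fn jobs_to_replay(&self) -> impl Iterator<Item = &Job> {
        self.pending.iter()
    }
}
```

The longer the flush timeout, the longer this buffer grows between trims, which is the memory concern mentioned above.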
Background:
Upstairs periodically sends flush commands to Downstairs, even if the guest is not asking for disk flushes. Originally this was every 5 seconds. I believe, though am not sure, that it was added to prevent jobs piling up in the upstairs queue.
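As a rough illustration of that behavior (not the actual upstairs code), the automatic flush is essentially a timer that injects a flush on the guest's behalf; `send_flush` and the task shape below are placeholders.

```rust
// Rough sketch of the automatic-flush idea: if the guest has not flushed
// within the timeout, the upstairs injects a flush itself.
use std::time::Duration;
use tokio::time::{interval, MissedTickBehavior};

async fn periodic_flush_task(flush_timeout: Duration) {
    let mut ticker = interval(flush_timeout);
    ticker.set_missed_tick_behavior(MissedTickBehavior::Delay);
    loop {
        ticker.tick().await;
        // The real upstairs would only do this if there is outstanding work
        // and the guest has not issued a flush of its own recently.
        send_flush().await;
    }
}

async fn send_flush() {
    // Placeholder for submitting an upstairs-generated flush to downstairs.
}
```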
We adjusted this to 0.5 seconds back when we were still using the SQLite backend: https://github.com/oxidecomputer/crucible/blob/8757b3fb55a1763382ad7111144bc60ec72af23d/upstairs/src/lib.rs#L9675
We adjusted to 0.5 because, at the time, an extent flush required a lot of work clearing old metadata contexts out of the SQLite database. That work scaled directly with the number of writes that had hit an extent since the last flush, and it was causing some pretty terrible latency bubbles that we wanted to avoid. Bryan did some testing and found that 0.5 seconds gave the best results ( https://github.com/oxidecomputer/crucible/issues/757 ). Note that this was also before fast-write-ack.
And it remains 0.5 seconds to this day:
https://github.com/oxidecomputer/crucible/blob/7d6c7e1e71d0b389999be06515db855bf273989e/upstairs/src/upstairs.rs#L396
This is configurable in the VCR, but Nexus is written to always pass None and accept our default.
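A hedged sketch of that override-or-default pattern, using a simplified stand-in for the real VCR/options types; only the 0.5-second default comes from the text above.

```rust
// Simplified stand-in for the VCR option plumbing; field and type names
// are illustrative, not the actual Crucible structs.
use std::time::Duration;

const DEFAULT_FLUSH_TIMEOUT: Duration = Duration::from_millis(500);

struct Opts {
    // None (what Nexus passes today) means "use the upstairs default".
    flush_timeout: Option<f32>,
}

fn effective_flush_timeout(opts: &Opts) -> Duration {
    opts.flush_timeout
        .map(Duration::from_secs_f32)
        .unwrap_or(DEFAULT_FLUSH_TIMEOUT)
}
```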
Why we might want to change it
Well, for one thing, we are not on the SQLite backend anymore, and our new backend has different flush performance characteristics. But also, we may be sending more fsyncs to ZFS than the guest actually cares about, and those have a cost.
Questions