oxidecomputer / propolis

VMM userspace for illumos bhyve
Mozilla Public License 2.0

Block backends need more flexible writeback cache configuration and support. #491

Open luqmana opened 1 year ago

luqmana commented 1 year ago

The file and crucible block backends currently assume they will only be used with a block device operating in writeback mode. A completed Write makes no guarantee that the data has been persisted to the underlying storage; the backends expect that a subsequent Flush will be sent to make it durable:

current backend behaviour

file

The file handle used to service requests is not opened with any sync flags (e.g. O_SYNC, O_DSYNC) but Flush commands will be handled via calls to fdatasync.
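As a rough illustration of that behaviour (not the actual propolis code), the sketch below opens a file with no sync flags and only forces data to stable storage on an explicit flush. The function names `open_backing`, `handle_write` and `handle_flush` are hypothetical. `File::sync_data` is the standard-library call used here; on Linux it is implemented with fdatasync, and other platforms may use a different underlying call.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: open the backing file without O_SYNC/O_DSYNC,
/// so completed writes may still be sitting in the OS page cache.
fn open_backing(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)
}

/// Hypothetical Write handler: hands the data to the OS and returns.
/// Nothing here guarantees the data has reached stable storage.
fn handle_write(file: &mut File, data: &[u8]) -> io::Result<()> {
    file.write_all(data)
}

/// Hypothetical Flush handler: asks the OS to persist the file's data.
fn handle_flush(file: &File) -> io::Result<()> {
    file.sync_data()
}
```

With this split, durability depends entirely on the guest (or device emulation) issuing a Flush after the writes it cares about.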

crucible

Crucible currently supports both flush and non-synchronous writes. It may also sometimes initiate Flush without a corresponding guest/device request but need to double check.

proposed

Today, the nvme device advertises that a volatile write cache is enabled. virtio-block does not expose a comparable flush capability, and because the backends assume writeback, it is susceptible to data loss (see #492).

Both nvme and virtio-block allow negotiating whether or not such modes are enabled, but there is currently no way for the block devices to communicate the negotiated mode to the backends.
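One possible shape for that missing communication path is sketched below. Everything in it is hypothetical and not part of propolis today: the `CacheMode` enum, the `CacheModeAware` trait, the `SketchBackend` type and the `on_guest_cache_toggle` function are illustrative names only.

```rust
/// Hypothetical: the cache mode the guest has negotiated with the device.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CacheMode {
    /// Writes may complete before they are durable; Flush makes them so.
    WriteBack,
    /// Writes must be durable before they are reported complete.
    WriteThrough,
}

/// Hypothetical hook a block backend could implement so that the
/// device emulation can tell it which mode is in effect.
trait CacheModeAware {
    fn set_cache_mode(&mut self, mode: CacheMode);
    fn cache_mode(&self) -> CacheMode;
}

/// Stand-in backend that only records the mode it was told about.
struct SketchBackend {
    mode: CacheMode,
}

impl SketchBackend {
    /// Starts in writeback mode, matching what the backends assume today.
    fn new() -> Self {
        Self { mode: CacheMode::WriteBack }
    }
}

impl CacheModeAware for SketchBackend {
    fn set_cache_mode(&mut self, mode: CacheMode) {
        self.mode = mode;
    }
    fn cache_mode(&self) -> CacheMode {
        self.mode
    }
}

/// Device side: called when the guest enables or disables the write cache.
fn on_guest_cache_toggle(backend: &mut dyn CacheModeAware, cache_enabled: bool) {
    let mode = if cache_enabled {
        CacheMode::WriteBack
    } else {
        CacheMode::WriteThrough
    };
    backend.set_cache_mode(mode);
}
```

A backend receiving `WriteThrough` would then be responsible for making each write durable before completing it.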

file

The file backend should be able to update its file handle to add/remove O_DSYNC via F_GETFL/F_SETFL.
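The flag update itself is a read-modify-write of the descriptor's status flags. The sketch below shows only the flag arithmetic, with the two fcntl calls indicated in comments, since they would come from a binding such as the `libc` crate and the value of O_DSYNC is platform-specific. The helper name `with_dsync` is hypothetical. One caveat worth verifying on the target platform: on Linux, F_SETFL cannot change O_DSYNC or O_SYNC, so this approach depends on the host OS honouring the change.

```rust
/// Hypothetical helper: given the flags returned by F_GETFL, compute the
/// flags to pass to F_SETFL so that the O_DSYNC bit is set or cleared.
/// `dsync_bit` is the platform's O_DSYNC value, passed in by the caller.
fn with_dsync(current_flags: i32, dsync_bit: i32, enable: bool) -> i32 {
    if enable {
        current_flags | dsync_bit
    } else {
        current_flags & !dsync_bit
    }
}

// Intended call sequence (assuming fcntl bindings are available):
//
//   let cur = fcntl(fd, F_GETFL);
//   let new = with_dsync(cur, O_DSYNC, writeback_disabled);
//   fcntl(fd, F_SETFL, new);
//
// Both calls can fail and their return values should be checked.
```

All other status flags are preserved, so toggling the mode repeatedly leaves the descriptor otherwise unchanged.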

crucible

Either crucible upstairs or the crucible backends should offer some type of combined Write+Flush operation for use when writeback is disabled, with the usual separate Write / Flush commands used otherwise.
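The dispatch this implies can be sketched as follows. This is not crucible's API: the `BlockStore` trait, the `service_write` function and the `CountingStore` type are hypothetical, and the combined operation is approximated here as a write followed immediately by a flush, whereas a real implementation might provide it as a single operation.

```rust
use std::io;

/// Hypothetical minimal backend interface.
trait BlockStore {
    fn write(&mut self, offset: u64, data: &[u8]) -> io::Result<()>;
    fn flush(&mut self) -> io::Result<()>;
}

/// Services a guest write. With writeback enabled the write completes
/// on its own; with writeback disabled it is followed by a flush before
/// completion is reported.
fn service_write<S: BlockStore>(
    store: &mut S,
    writeback: bool,
    offset: u64,
    data: &[u8],
) -> io::Result<()> {
    store.write(offset, data)?;
    if !writeback {
        store.flush()?;
    }
    Ok(())
}

/// In-memory stand-in that counts how many flushes it has seen.
#[derive(Default)]
struct CountingStore {
    bytes: Vec<u8>,
    flushes: usize,
}

impl BlockStore for CountingStore {
    fn write(&mut self, offset: u64, data: &[u8]) -> io::Result<()> {
        let start = offset as usize;
        let end = start + data.len();
        // Grow the buffer with zeroes if the write extends past the end.
        if self.bytes.len() < end {
            self.bytes.resize(end, 0);
        }
        self.bytes[start..end].copy_from_slice(data);
        Ok(())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.flushes += 1;
        Ok(())
    }
}
```

Keeping the decision in one place means the device emulation only has to pass along the current mode, and the backend chooses between the two paths.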

leftwo commented 1 year ago

Crucible: It may also sometimes initiate Flush without a corresponding guest/device request but need to double check.

Yes, crucible will issue a flush at regular intervals if it does not get one from the guest.