nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0

TFS: fix handling of storage operations on uninited extents #2017

Closed: francescolavra closed this 4 months ago

francescolavra commented 4 months ago

PR #1578 implemented deferred execution of storage operations for TFS file extents that are transitioning from the uninited to the initialized state. This creates a problem when a pagecache node I/O request spans multiple extents: if one extent is uninited and the next extent is initialized, (part of) the SG buffers destined for the first extent are written into the second extent, and vice versa. To assign SG buffers to extents correctly, the extents must consume the SG list supplied in a given node I/O request in exactly the order in which they cover the file range of the request. If the storage operation for an extent is deferred, this requirement may not be satisfied, which corrupts file contents.

This change fixes the issue by removing deferred execution of storage operations. Instead, read requests for uninited extents are executed with zero-filled buffers, and write requests are executed without delay (i.e. without waiting for extent initialization to complete) with the buffers supplied in the requests. This implementation relies on two assumptions:

1) disk drivers submit I/O requests to their peripherals in the same order in which the requests are generated
2) if multiple write requests are ongoing simultaneously for a given address range, disk peripherals write data to the storage medium in the same order in which the requests were submitted

The second assumption may not hold in general, because a write operation on a large address range may complete after an operation on a smaller range even if the former was submitted before the latter. However, the kernel issues write requests using buffers that are at least one page-cache page in size, and the zero-filled writes submitted when an uninited extent begins transitioning to the initialized state use exactly this minimum size (see the zero_blocks() function in the TFS code). It is therefore reasonable to assume that the writes that fill a given file extent with zeros are executed by the disk peripheral before any subsequent write requests for the same extent, even if those subsequent requests are submitted while the zero-filling writes are still in progress.