ligurio / unreliablefs

A FUSE-based fault injection filesystem.
https://ligurio.github.io/unreliablefs/unreliablefs.1.html
MIT License
177 stars 9 forks source link

Fault injection with replace operation with no-op #18

Closed ligurio closed 3 years ago

ligurio commented 3 years ago

Like https://github.com/stewartsmith/libeatmydata

replace operation with no-op for those operations where it possible

Implementation https://github.com/stewartsmith/libeatmydata/blob/master/libeatmydata/libeatmydata.c

ligurio commented 3 years ago

3.1. Disk drives that do not honor sync requests Unfortunately, most consumer-grade mass storage devices lie about syncing. Disk drives will report that content is safely on persistent media as soon as it reaches the track buffer and before actually being written to oxide. This makes the disk drives seem to operate faster (which is vitally important to the manufacturer so that they can show good benchmark numbers in trade magazines). And in fairness, the lie normally causes no harm, as long as there is no power loss or hard reset prior to the track buffer actually being written to oxide. But if a power loss or hard reset does occur, and if that results in content that was written after a sync reaching oxide while content written before the sync is still in a track buffer, then database corruption can occur.

USB flash memory sticks seem to be especially pernicious liars regarding sync requests. One can easily see this by committing a large transaction to an SQLite database on a USB memory stick. The COMMIT command will return relatively quickly, indicating that the memory stick has told the operating system and the operating system has told SQLite that all content is safely in persistent storage, and yet the LED on the end of the memory stick will continue flashing for several more seconds. Pulling out the memory stick while the LED is still flashing will frequently result in database corruption.

Note that SQLite must believe whatever the operating system and hardware tell it about the status of sync requests. There is no way for SQLite to detect that either is lying and that writes might be occurring out-of-order. However, SQLite in WAL mode is far more forgiving of out-of-order writes than in the default rollback journal modes. In WAL mode, the only time that a failed sync operation can cause database corruption is during a checkpoint operation. A sync failure during a COMMIT might result in loss of durability but not in a corrupt database file. Hence, one line of defense against database corruption due to failed sync operations is to use SQLite in WAL mode and to checkpoint as infrequently as possible.

https://www.sqlite.org/howtocorrupt.html