ligurio / unreliablefs

A FUSE-based fault injection filesystem.
https://ligurio.github.io/unreliablefs/unreliablefs.1.html
MIT License
177 stars 9 forks source link

Emulate Silent Data Corruptions #114

Open ligurio opened 2 years ago

ligurio commented 2 years ago

SiliFuzz: Fuzzing CPUs by proxy:

When discussing misbehaving CPUs we distinguish between logic bugs and electrical defects. A logic bug is an invalid CPU behavior inherent to a particular CPU microarchitecture or stepping, e.g. [2, 3, 4]. An electrical defect is an invalid CPU behavior that happens only on one or several chips. Often, but not always, an electrical defect affects only one of the physical cores on the chip. A defect may be present in a CPU from day one (if it was missed by the vendor during testing) or it may be a result of physical wear-out (e.g., circuit aging). For the purposes of this paper the distinction between day-one defects and wear-out is not important.

The term “Silent Data Corruption” (SDC) has been coined [5, 6, 7, 8] to represent CPU defects (and bugs) that do not cause an immediate observable failure, but instead silently lead to incorrect computation. This phenomenon has been known for a very long time, and is directly related to concepts like fault-secure operation, data integrity checking, and silent errors. In many production environments SDCs are more dangerous than crashes, because most server software ecosystems are built with tolerance to crashes, but not with tolerance to unobserved data corruption. Not all CPU defects induce SDCs, but many do.