koverstreet / bcachefs

Document behavior of `-o degraded` when split-brain is possible #579

Open DemiMarie opened 1 year ago

DemiMarie commented 1 year ago

The documentation of -o degraded states that it will work so long as no data is missing. However, this can cause data inconsistency! Suppose that I have an empty bcachefs filesystem with 2-way mirroring on disks nvme0n1 and nvme1n1. Then the following happens:

  1. I mount nvme0n1 read-write with -o degraded at (say) /mnt.
  2. echo X > /mnt/a
  3. Unmount /mnt.
  4. Mount nvme1n1 read-write with -o degraded at /mnt.
  5. echo Y > /mnt/a.
  6. Unmount /mnt.
  7. Mount nvme0n1 and nvme1n1 together at /mnt.
  8. cat /mnt/a.

What happens at step 8? There is no valid value for bcachefs to return, because the writes at steps 2 and 5 conflict with each other. Therefore, this must be prevented.
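
The conflict in the steps above can be illustrated with a toy model. This is only a sketch: `Replica` and `conflicting_paths` are hypothetical names invented for the illustration, and nothing here reflects bcachefs's on-disk format or what it actually does at step 8.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one mirrored device; not a bcachefs structure."""
    name: str
    files: dict = field(default_factory=dict)

    def write(self, path: str, data: str) -> None:
        # A degraded read-write mount can only update the device that is present.
        self.files[path] = data


def conflicting_paths(a: Replica, b: Replica) -> list:
    """Return the paths whose contents differ between the two replicas."""
    return sorted(
        p for p in a.files.keys() | b.files.keys()
        if a.files.get(p) != b.files.get(p)
    )


nvme0n1 = Replica("nvme0n1")
nvme1n1 = Replica("nvme1n1")
nvme0n1.write("a", "X")  # steps 1-3: only nvme0n1 is mounted
nvme1n1.write("a", "Y")  # steps 4-6: only nvme1n1 is mounted
conflicts = conflicting_paths(nvme0n1, nvme1n1)  # step 7: both devices together
print(conflicts)
```

Each replica is internally consistent, yet the two disagree about `a`, and neither write was made with knowledge of the other.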

The only way I know to prevent this is to require that a quorum (strict majority) of devices be present for -o degraded to mount read/write. If a quorum is not present, -o degraded should mount read-only. If this is the existing behavior, it should be documented; otherwise, this is a request that bcachefs adopt this behavior.
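
A minimal sketch of the proposed rule, assuming the device counts are known at mount time. The function names `has_quorum` and `degraded_mount_mode` are hypothetical and are not part of bcachefs; this describes the requested policy, not existing behavior.

```python
def has_quorum(present: int, total: int) -> bool:
    """True when a strict majority of the filesystem's devices is present."""
    if total <= 0 or not 0 <= present <= total:
        raise ValueError("need 0 <= present <= total and total > 0")
    return 2 * present > total


def degraded_mount_mode(present: int, total: int) -> str:
    """Proposed policy: read-write only with a quorum, read-only otherwise."""
    return "rw" if has_quorum(present, total) else "ro"
```

Under this rule, one device out of a 2-way mirror is not a strict majority, so `degraded_mount_mode(1, 2)` gives `"ro"`, while two devices out of three give `"rw"`. Because two disjoint sets of devices cannot both be strict majorities, the conflicting writes in the scenario above could not both happen.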