utsaslab / WineFS

WineFS (SOSP 21): a huge-page aware file system for persistent memory

Write may not be atomic with respect to crashes in strict mode #5

Open hayley-leblanc opened 2 years ago

hayley-leblanc commented 2 years ago

Hi Rohan,

I think I've found a situation where writes may not occur atomically with respect to crashes even in strict mode. Here are the steps to reproduce the issue:

  1. Add the following at line 809 in xip.c:
    if (count == 1024) {
        return written;
    }

    This emulates a crash occurring after the call to __pmfs_xip_file_write() but before the transaction is committed, when performing a write of 1024 bytes. The bug requires two writes to manifest, so making the early return conditional on the write size ensures we don't emulate the crash too early.

  2. Mount WineFS with mount -t winefs -o init,strict /dev/pmem0 /mnt/pmem.
  3. Run the following program: test6.zip. This just creates a file /mnt/pmem/file0, writes 4096 bytes of 'a' to it, then overwrites the first 1024 bytes with 'b'.
  4. Use dd to copy the contents of /dev/pmem0 to a separate file, unmount WineFS, copy the saved contents back to /dev/pmem0, and remount. This ensures that we go through the recovery code.
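
For readers without the attachment, here is a minimal sketch of what step 3 describes. It is not the contents of test6.zip; it is reconstructed from the description above, and the function name write_then_overwrite, the constants, and the use of pwrite() are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for pwrite() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define FILE_SIZE      4096  /* size of the initial write of 'a' */
#define OVERWRITE_SIZE 1024  /* size of the overwrite of 'b'; matches the count == 1024 check */

/*
 * Hypothetical helper: create `path`, fill it with FILE_SIZE bytes of 'a',
 * then overwrite the first OVERWRITE_SIZE bytes with 'b'.
 * Returns 0 if both writes report their full length, -1 otherwise.
 */
int write_then_overwrite(const char *path)
{
    char buf[FILE_SIZE];
    int fd;
    int rc = -1;

    assert(OVERWRITE_SIZE <= FILE_SIZE);

    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0)
        return -1;

    /* First write: 4096 bytes of 'a' at offset 0. */
    memset(buf, 'a', FILE_SIZE);
    if (pwrite(fd, buf, FILE_SIZE, 0) == (ssize_t)FILE_SIZE) {
        /* Second write: a single 1024-byte write of 'b' at offset 0. */
        memset(buf, 'b', OVERWRITE_SIZE);
        if (pwrite(fd, buf, OVERWRITE_SIZE, 0) == (ssize_t)OVERWRITE_SIZE)
            rc = 0;
    }

    close(fd);
    return rc;
}
```

Calling write_then_overwrite("/mnt/pmem/file0") from a small main() would exercise the same sequence as step 3. The overwrite is issued as one call so that the kernel sees a single write with count == 1024.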

After doing these steps, when I do cat /mnt/pmem/file0, I see that the first 1024 bytes have been overwritten with 'b'. This seems like incorrect behavior, since WineFS is being used in strict mode and the transaction for the write was not committed before the crash. I would expect the file to still be all 'a's.
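
Instead of eyeballing the cat output, the check can be done in code. The following is a rough sketch that classifies a buffer read back from the file; the enum and the function classify_contents are hypothetical names, not part of WineFS or of our testing tool.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes for the overwritten region after recovery. */
enum write_state {
    STATE_OLD,   /* all 'a': the overwrite was rolled back */
    STATE_NEW,   /* first overwrite_len bytes are all 'b' */
    STATE_TORN   /* a mix of 'a' and 'b', or unexpected bytes */
};

/*
 * Hypothetical checker: `buf` holds `len` bytes read from the file, and the
 * overwrite covered the first `overwrite_len` bytes.
 */
enum write_state classify_contents(const char *buf, size_t len,
                                   size_t overwrite_len)
{
    size_t i;
    size_t b_count = 0;

    assert(overwrite_len <= len);

    /* Bytes past the overwritten region must still be 'a'. */
    for (i = overwrite_len; i < len; i++) {
        if (buf[i] != 'a')
            return STATE_TORN;
    }

    /* Count how much of the overwrite is visible. */
    for (i = 0; i < overwrite_len; i++) {
        if (buf[i] == 'b')
            b_count++;
        else if (buf[i] != 'a')
            return STATE_TORN;
    }

    if (b_count == 0)
        return STATE_OLD;
    if (b_count == overwrite_len)
        return STATE_NEW;
    return STATE_TORN;
}
```

With the steps above, the file classifies as STATE_NEW, whereas I would expect STATE_OLD because the transaction never committed. STATE_TORN should never be observable in strict mode.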

Let me know what you think. Thanks!

hayley-leblanc commented 2 years ago

Adding a higher-level overview to explain why I think this behavior is incorrect: my understanding of WineFS's strict mode is that if the system crashes before the transaction in pmfs_xip_file_write() is committed, the entire data write should be rolled back during recovery. In the provided steps, we are emulating a crash before this transaction's commit block is written during the second write(), so I expect the contents of the 1024-byte write to not be present in the crash state.

I initially discovered this bug using our crash consistency testing tool, which constructs some crash states in which only a portion of the written data is persisted before a crash. I'm seeing consistency checks on these tests fail because the partial write is present after the crash. However, a partial write is harder to trigger without the tool than simply injecting a crash between the full data write and the transaction commit :)

rohankadekodi commented 2 years ago

Thanks! This is indeed a crash consistency bug that breaks the atomicity of writes in WineFS's strict mode. In strict mode, the root block of the file is not being copy-on-written on a file update when the size of the file is <= 4KB, so the new data lands in place before the transaction commits. The fix for the issue is being handled in #6.
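
To illustrate why the missing copy-on-write matters, here is a toy user-space model. It is not WineFS code and does not reflect its on-media layout; struct toy_file, both write functions, and the crash_before_commit flag are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Toy "file": two blocks plus an index saying which one is committed. */
struct toy_file {
    char blocks[2][BLOCK_SIZE];
    int live;  /* index of the committed (visible) block */
};

/*
 * In-place update: the live block is modified directly, so the new bytes
 * are visible even if the crash happens before the commit.
 */
void toy_write_in_place(struct toy_file *f, const char *data, size_t len,
                        int crash_before_commit)
{
    assert(len <= BLOCK_SIZE);
    memcpy(f->blocks[f->live], data, len);
    (void)crash_before_commit;  /* the data is already live either way */
}

/*
 * Copy-on-write update: the new contents are built in the shadow block and
 * become visible only when the commit flips `live`.
 */
void toy_write_cow(struct toy_file *f, const char *data, size_t len,
                   int crash_before_commit)
{
    int shadow = 1 - f->live;

    assert(len <= BLOCK_SIZE);
    memcpy(f->blocks[shadow], f->blocks[f->live], BLOCK_SIZE);
    memcpy(f->blocks[shadow], data, len);

    if (crash_before_commit)
        return;          /* emulated crash: the old block stays live */
    f->live = shadow;    /* commit point */
}
```

In this model, a crash before commit leaves the old 'a' contents visible on the copy-on-write path, while the in-place path already exposes the 'b' bytes, which matches the behavior reported above for files of 4KB or less.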