utsaslab / WineFS

WineFS (SOSP 21): a huge-page aware file system for persistent memory
34 stars 2 forks source link

Possible crash consistency issue with write #3

Open hayley-leblanc opened 2 years ago

hayley-leblanc commented 2 years ago

NOTE: this original bug description is incorrect - see my comment below for the actual description :)

Hi Rohan,

I've found a potential crash consistency issue in WineFS that occurs if the system crashes while truncating a file. Here are steps to reproduce it:

  1. Simulate a crash in WineFS by adding two return statements to inode.c in the following locations. This will force the system call to exit early, simulating a crash at that point.
    • after the call to PERSISTENT_BARRIER() on line 2079
    • immediately after the call to pmfs_truncate_add() on line 2192
  2. Mount a fresh instance of WineFS at /mnt/pmem. I am running WineFS in strict mode; I have not checked if this behavior occurs in relaxed mode.
  3. Run a program that creates a new file in WineFS, writes 1 byte to it, writes 1024 bytes to it, then truncates it to a larger size (I'm doing 1494, which was chosen randomly by a fuzzer; I don't think the actual number matters). I have attached the program I am using: test3.zip

After running this program, I would expect the file to either have size 1024 or 1494, but running stat on the file says that it is 1025 bytes. I plan to look into the root cause of this issue soon and will add more info once I have an idea of what might be causing this.

Thanks!

hayley-leblanc commented 2 years ago

I looked into this issue a bit more and realized that I mischaracterized it originally; the "issue" I described is the expected behavior and not a bug (I was tired and had misinterpreted the output :) ) . However I believe there still is a crash consistency issue with the provided program, although it is actually an issue with write(), not truncate().

memcpy_to_nvmm() is implemented using __copy_from_user_inatomic_nocache() to write file data to PM. This function ultimately uses __copy_user_nocache(), defined here: https://github.com/utsaslab/WineFS/blob/main/Linux-5.1/arch/x86/lib/copy_user_64.S#L197 to perform the non-temporal memcpy. In the documentation for that function, it indicates that __copy_user_nocache() may use temporal memory moves if the destination address or size of the write is not 4 or 8-byte aligned (depending on the size of the entire write). WineFS uses a function pmfs_flush_edge_cachelines() to take care of the beginning and end of the write if they are handled using cached moves.

In the provided test3.cpp file, we write 1 byte, then 1024 bytes. The first write will require a cache line flush, since its size is < 4 bytes. The second write should require two cache line flushes, because although the length of the write is 8-byte aligned, its destination is not. After __copy_user_nocache() handles the first few unaligned bytes, the remaining number of bytes to write will no longer be a multiple of 8, so the final byte will be taken care of via a cached move.

If you add some print statements in pmfs_flush_edge_cachelines() in WineFS to indicate whether each of the if statements runs, and run the provided program, you should see that the first call to write() results in flushing the last cache line (which is expected) but the second call to write() only results in flushing the first cache line. This leaves the very last byte of the write un-flushed, and it could be lost in a crash.

rohankadekodi commented 2 years ago

Thanks! This would lead to data loss if the start and end portions of the data are not loaded in the same set and line. The fix for this issue is in #8