llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.62k stars 11.83k forks source link

std::experimental::filesystem::file_time_type::min() exceeds values supported by the underlying file system #35338

Open vsapsai opened 6 years ago

vsapsai commented 6 years ago
Bugzilla Link 35990
Version unspecified
OS All
Attachments Code to test with low-level API.
CC @mclow,@jwakely

Extended Description

std::experimental::filesystem::file_time_type::min() exceeds values supported by the underlying file system. When you set file modification time with filesystem::last_write_time(path, time) with min value and read it back with filesystem::last_write_time(path), received value can be significantly different from the supplied one.

Steps to reproduce:

  1. Call auto min_time = std::experimental::filesystem::file_time_type::min(); std::experimental::filesystem::last_write_time(path, min_time);
  2. Call auto current_time = std::experimental::filesystem::last_write_time(path);
  3. Reboot machine you are running the test on.
  4. Call auto current_time2 = std::experimental::filesystem::last_write_time(path);

Expected result: min_time, current_time, current_time2 should be equal, at least to the second.

Actual result: Results depend on the underlying file system.

On macOS with APFS min_time != current_time, current_time == current_time2. Inequality is detected by the test std/experimental/filesystem/fs.op.funcs/fs.op.last_write_time/last_write_time.pass.cpp

On Ubuntu 16.04.3 with ext3 min_time == current_time, current_time != current_time2.

Details: For writing and reading file modification timestamp we are using utimensat and stat, respectively. In my testing filesystem::file_time_type::min() was -2^63 microseconds which can be expressed with struct timespec which has 64-bit tv_sec and 64-bit tv_nsec. But file systems I tested don't use 128 bits to store timestamps on the disk. One common approach seems to be using 128-bit value while inode metadata is still in memory. That allows the test last_write_time.pass.cpp to pass on ext3 and probably ext4. But according to POSIX.1-2008 http://pubs.opengroup.org/onlinepubs/9699919799/ that is not conformant

Upon assignment, file timestamps are immediately converted to the resolution of the file system by truncation (i.e., the recorded time can be older than the actual time). For example, if the file system resolution is 1 microsecond, then a conforming stat() must always return an st_mtim.tv_nsec that is a multiple of 1000. Some older implementations returned higher-resolution timestamps while the inode information was cached, and then spontaneously truncated the tv_nsec fields when they were stored to and retrieved from disk, but this behavior does not conform.

  • Section 11. Headers, <sys/stat.h>

Regardless of the POSIX standard, I think the developers expect that values in file_time_type::min()/max() range are safe to use and won't change on their own.

I have attached testing.c to test utimensat and stat directly and the results are On APFS: Will update file mtime to tv_sec = -9223372036854, tv_nsec = 0 File mtime now is tv_sec = -9223372036, tv_nsec = -854775808

On ext3: Will update file mtime to tv_sec = -9223372036854, tv_nsec = 0 File mtime now is tv_sec = -9223372036854, tv_nsec = 0

after rebooting File mtime now is tv_sec = 2217714954, tv_nsec = 0

3a17e7c4-fca4-4827-b2a7-11a54d73d746 commented 6 years ago

It's not even clear that it's possible to know the correct answer to these problems at build time. Since the FS could change at any point.

Right, and there's only one file_time_type for the whole C++ program, but it gets used for all the filesystems connected to the machine, which could support very different resolutions and very different min/max values.

So in general the range of valid values for the file_time_type can't be the same as the range of value values for "the filesystem", because there could be several different filesystems mounted on a machine.

3a17e7c4-fca4-4827-b2a7-11a54d73d746 commented 6 years ago

Regardless of the POSIX standard, I think the developers expect that values in file_time_type::min()/max() range are safe to use and won't change on their own.

I don't think that's a reasonable expectation, and nothing in the C++ standard supports that expectation.

The integer type used as the file_time_type representation might be able to represent -2^63 microseconds but if the physical disks connected to the machine don't allow that value to be written out and read back in (whether immediately or after a reboot) then there's nothing the C++ library can do about it.

And if the kernel allows a value to be written, without reporting an error, but the value isn't there after a reboot (or after it's removed from the disk cache) then there's nothing the C++ library can do. Obviously the C++ library isn't going to force a refresh of the entire disk cache just to ensure a file mtime value is what the program expects.

All the library can do is try to write the value using futimens and report if that succeeds or not.

llvmbot commented 6 years ago

It's not even clear that it's possible to know the correct answer to these problems at build time. Since the FS could change at any point.

For that reason, I'm actually strongly in favor of using the largest possible file_time_type representation. It's much better for us to be able to handle all valid FS timestamps, as opposed to the opposite, where we can't represent all of the times represented by the FS.

vsapsai commented 6 years ago

For Darwin I tried

typedef chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>  file_time_type;

But it caused static assertions during the build and I didn't check what is the correct way of expressing this approach and if it actually fixes the issue.

I agree the difference between in-memory and on-disk timestamp representation is the kernel/filesystem quirk. But I think it is useful for std::filesystem to abstract these peculiarities instead of propagating them to its clients.

llvmbot commented 6 years ago

Also, the fact that the time changes after rebooting seems like an issue with the kernel/filesystem. Not libc++. If the time is successfully set, and can be retrieved then it seems like that should work. Regardless of reboot behavior.

llvmbot commented 6 years ago

This is a know issue, which I've spent a lot of time on and it's not clear how to fix it.