Closed felinira closed 7 months ago
@rustbot claim
My proposal would be to check bounds of the underlying syscall results and return None for these broken time values if the ctime / mtime / atime etc. contains values that would otherwise trigger this assertion.
You mean that `created` should return `Err`? I agree, this shouldn't panic.
> You mean that `created` should return `Err`?

Ah yeah, right, this is a `Result`, not an `Option`.
Any news on this?
This should be a priority. Programs written in Rust simply crash without this fix.
I will see what I can do
@rustbot claim
I did some work on this. There are a few things to consider: while removing the assert would remove the panic, the current proposal here is to return an error instead, which means programs will still need to handle that error. Another possibility is to silence the error by rounding the value, for example by carrying the extra time from the nanosecond field into the seconds field, thereby removing the weird nanosecond value.
I don't have a strong opinion on this.
Adding the nanoseconds is probably not a very good idea. glibc defines `tv_nsec` as `long int`, which is 64 bits on amd64; that still allows an offset of up to ±292 years. If an error is undesirable, I recommend setting the value to 0 instead. In the case that only the nanoseconds are corrupted, and not the seconds, this would at least keep the seconds correct.
In this case the seconds are corrupted too. 5006491383123099853 is well over a billion years in the future. I had to remove the last two digits to get `date --date='@<the unix timestamp here>'` to not complain about an out-of-range error.
I think invalid data should be handled as an error and not interpreted.
> I think invalid data should be handled as an error and not interpreted.
The solution depends on the situation at hand. In this case, if an error arises, the value should be set to zero. There's no need for rounding or interpretation - simply zero. Why? Because we're dealing with a computer language. This issue impacts all systems possessing files with invalid timestamps within the file system. Indeed, even the kernel reads these timestamps without triggering an error or causing a crash.
I think I'd prefer we return an error. For now I used `io::ErrorKind::Other`, but I could also use `InvalidData`, or create a new error kind. I didn't use `InvalidData` because it doesn't really fit the error; we could create an "invalid metadata" error, haha. Tell me what you think about which error we should return.
Then, the next problem is: how does one fix the error using Rust? Do we want std to include a way to "set" the creation time of a file? Is that even possible? Renaming the file seems to be the only way that I know of.
Also, one last thing: does anyone know of a way I could add a test for this?
I encountered the same issue on macOS 13.5.1 (on APFS), and found it to be related to nanosecond values on timestamps before the unix epoch.
Obtaining the `Metadata::modified()` time works on a file with nanoseconds created like this: `touch -d "1970-01-01 00:00:00.1Z"`. But -1 seconds since the unix epoch (`touch -d "1969-12-31 23:59:59.1Z"`) panics with `assertion failed: tv_nsec >= 0 && tv_nsec < NSEC_PER_SEC as i64', library/std/src/sys/unix/time.rs:77:9`. However, `touch -d "1969-12-31 23:59:59Z"` works; note the lack of nanoseconds.
Interestingly, `exa` can display these times without crashing (probably it uses a different way to obtain them), and so can `stat`; all looks good there.
CC @joshtriplett
When a filesystem is returning invalid date data, producing an InvalidData error seems like a reasonable way to handle this, rather than panicking. That will let the program skip that piece of the file's information and move on. (And, in the case of a tool that isn't actually displaying that particular piece of information, that shouldn't affect the output.)
We should also document that this error can occur ("If the filesystem returns invalid timestamp information, this will return an `InvalidData` error.").
Correction after some further investigation with @Byron: turns out the issue on macOS is entirely different from the Linux issue, and the issue on macOS can be easily corrected in std. Longer write-up shortly.
Full details on the macOS bug:
Time in UNIX system calls counts from the epoch, 1970-01-01. The `timespec` struct used in various system calls represents this as a number of seconds and a number of nanoseconds. Nanoseconds are required to be between 0 and 999_999_999, because anything outside that range should be represented in the seconds field; if the nanoseconds grew larger than 999_999_999, the seconds field should go up instead.
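As a standalone illustration of that carry rule (not std's actual code; `carry_nanos` is a hypothetical helper), any whole seconds hiding in the nanosecond field can be moved into the seconds field:

```rust
// Hypothetical helper illustrating the timespec invariant described above:
// carry whole seconds out of the nanosecond field so that tv_nsec always
// lands in 0..=999_999_999, adjusting tv_sec accordingly.
fn carry_nanos(secs: i64, nsecs: i64) -> (i64, i64) {
    const NSEC_PER_SEC: i64 = 1_000_000_000;
    // div_euclid/rem_euclid round toward negative infinity, so this also
    // normalizes negative nanosecond values correctly.
    (secs + nsecs.div_euclid(NSEC_PER_SEC), nsecs.rem_euclid(NSEC_PER_SEC))
}

fn main() {
    // 1_500_000_000 ns is 1 s + 500_000_000 ns.
    assert_eq!(carry_nanos(0, 1_500_000_000), (1, 500_000_000));
    // An in-range value passes through unchanged.
    assert_eq!(carry_nanos(86_400, 0), (86_400, 0));
    println!("ok");
}
```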
So, seconds=0 means 1970-01-01, seconds=86400 means 1970-01-02, etc.
Suppose you ask for the time 1969-12-31, what time is that? On UNIX systems that support times before the epoch, that's seconds=-86400, one day before the epoch. Perfectly reasonable so far.
But now, suppose you ask for the time 1969-12-31 23:59:00.1. In other words, a tenth of a second after one minute before the epoch.
On most UNIX systems, that's represented as seconds=-60, nanoseconds=100_000_000.
The macOS bug is that it returns seconds=-59, nanoseconds=-900_000_000.
I can see how they got that: it's 59.9 seconds before the epoch.
But that violates the invariant of the timespec data structure: nanoseconds must be between 0 and 999999999. Which, in turn, causes this assertion in the Rust standard library.
We can fix this, though: on macOS, if we get a Timespec value with seconds less than or equal to zero, and nanoseconds between -999_999_999 and -1 (inclusive), we can add 1_000_000_000 to the nanoseconds and subtract 1 from the seconds, and then convert.
The resulting timespec value is still accepted by macOS, and when fed back into the OS, produces the same results. (If you set a file's mtime with that timestamp, then read it back, you get back the one with negative nanoseconds again.)
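A minimal sketch of that correction, under exactly the conditions described above (an illustration only; `fixup_macos_timespec` is a hypothetical helper, not the actual patch to `library/std/src/sys/unix/time.rs`):

```rust
const NSEC_PER_SEC: i64 = 1_000_000_000;

// Hypothetical helper sketching the macOS workaround: when seconds are <= 0
// and nanoseconds are in -999_999_999..=-1, borrow one second so the
// nanosecond field becomes non-negative again.
fn fixup_macos_timespec(secs: i64, nsecs: i64) -> (i64, i64) {
    if secs <= 0 && (-(NSEC_PER_SEC - 1)..=-1).contains(&nsecs) {
        (secs - 1, nsecs + NSEC_PER_SEC)
    } else {
        (secs, nsecs)
    }
}

fn main() {
    // macOS reports "59.9 s before the epoch" as (-59, -900_000_000);
    // normalized, that is (-60, 100_000_000), i.e. 1969-12-31 23:59:00.1.
    assert_eq!(fixup_macos_timespec(-59, -900_000_000), (-60, 100_000_000));
    // Valid timespec values are left untouched.
    assert_eq!(fixup_macos_timespec(-60, 100_000_000), (-60, 100_000_000));
    println!("ok");
}
```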
Was there a consensus on how to proceed with this for non-Apple platforms? It seems to be cropping up fairly regularly in `eza`, and we'd love to not panic when a file in a directory listing has an invalid date!
@daviessm For non-macOS platforms, where the issue is a bad directory rather than an OS bug, I think we should be detecting the invalid timestamp and returning an InvalidData error rather than panicking. That should make it possible to handle the error and display a placeholder / error indication.
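From the caller's side, once std surfaces an error instead of panicking, a tool could fall back to a placeholder roughly like this (a sketch; `mtime_or_placeholder` is a hypothetical function, not eza's actual code, and it assumes the `InvalidData` behavior lands):

```rust
use std::fs;
use std::io;

// Hypothetical helper: format a file's modification time, printing "?"
// when the filesystem reports an invalid timestamp (assuming std returns
// an InvalidData error for those instead of panicking).
fn mtime_or_placeholder(path: &str) -> String {
    match fs::metadata(path).and_then(|m| m.modified()) {
        Ok(t) => format!("{t:?}"),
        Err(e) if e.kind() == io::ErrorKind::InvalidData => "?".to_string(),
        Err(e) => format!("error: {e}"),
    }
}

fn main() {
    // The current directory always exists, so this prints a real timestamp.
    println!("{}", mtime_or_placeholder("."));
}
```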
@joshtriplett I'd agree with that from the `eza` perspective, but wouldn't that be a major breaking change, with the function signature returning a `Result<Timestamp>` instead of a `Timestamp`?
I think this would then mean that the error would have to be returned by one of the methods that return a `Result`.
Maybe, for even greater usability, a non-panicking constructor for `Timestamp` could be added, but I don't know whether it should be public or only usable internally.
> @joshtriplett I'd agree with that from the `eza` perspective, but wouldn't that be a major breaking change, with the function signature returning a `Result<Timestamp>` instead of a `Timestamp`?
As @Byron noted, the three timestamp methods of `Metadata` return `Result<SystemTime>`. So they could return an error for a bad timestamp.
Yes, sorry, I was thinking about the `Timespec::new` method in `sys/unix/time.rs`. If the `fs` module handles erroneous timestamps at a higher level, this won't occur.
> As @Byron noted, the three timestamp methods of Metadata return `Option<SystemTime>`. So they could return an error for a bad timestamp.
They seem to return `Result<SystemTime>`, so proper error handling should be possible here.
I encourage everyone reading this to consider the issue a great opportunity for making a contribution to Rust, just judging from my experience fixing the bug on macOS. It took only about 30 minutes to get into the red-green-refactor cycle, and from there it basically fixed itself.
The only problem I'd have implementing this is the lack of a reproducible test case: how would it even be possible to get a file with a broken timestamp? One would probably have to hex-edit the binary blob of an extracted filesystem and then mount it.
Alternatively, maybe it's enough to not break an existing test when implementing the fix, along with a particularly thorough review by multiple parties.
> The only problem I'd have implementing this is the lack of a reproducible test case: how would it even be possible to get a file with a broken timestamp? One would probably have to hex-edit the binary blob of an extracted filesystem and then mount it.
Would FUSE work? Or does it do a sanity check of the timestamps? Or use `LD_PRELOAD` to replace the function that returns the invalid timestamp.
Just my piece of info that may be relevant: `ls -la --time=birth` shows me a list of directories and links, and a few directories have `?` in place of the birth time:
```
total 36
drwxr-xr-x 1 root root 198 Dec 5 2022 .
drwxr-xr-x 1 root root 198 Dec 5 2022 ..
drwxr-xr-x 1 nikel nikel 10 Jan 5 16:59 _
lrwxrwxrwx 1 root root 7 Sep 24 07:19 bin -> usr/bin
drwxr-xr-x 1 root root 8 Dec 5 2022 boot
drwxr-xr-x 19 root root 4160 Jan 5 16:28 dev
drwxr-xr-x 1 root root 0 Jan 6 2023 efi
drwxr-xr-x 1 root root 4310 Nov 24 1833 etc
drwxr-xr-x 1 root root 10 Dec 17 2021 home
drwxr-xr-x 19 root root 4096 Jan 6 2023 kaliroot
dr-xr-xr-x 1 root root 0 ? keybase
lrwxrwxrwx 1 root root 7 Sep 24 07:19 lib -> usr/lib
lrwxrwxrwx 1 root root 7 Sep 24 07:19 lib64 -> usr/lib
drwxr-xr-x 1 root root 24 Dec 5 2022 mnt
drwxr-xr-x 1 nikel nikel 94 Jan 2 21:55 nikel_big_downloads
drwxr-xr-x 1 root root 228 Dec 5 2022 opt
dr-xr-xr-x 492 root root 0 ? proc
drwxr-x--- 1 root root 400 Dec 5 2022 root
drwxr-xr-x 33 root root 860 Jan 5 16:28 run
lrwxrwxrwx 1 root root 7 Sep 24 07:19 sbin -> usr/bin
drwxr-xr-x 1 root root 14 Dec 5 2022 srv
dr-xr-xr-x 13 root root 0 ? sys
drwxrwxrwt 22 root root 540 Jan 5 16:28 tmp
drwxr-xr-x 1 root root 80 Dec 5 2022 usr
drwxr-xr-x 1 root root 126 Dec 5 2022 var
```
It's either some sort of upstream/kernel/btrfs (which I'm using) bug, or files/directories with such birth times are just normal:
```
$ stat /sys
  File: /sys
  Size: 0          Blocks: 0     IO Block: 4096 directory
Device: 0,22       Inode: 1      Links: 13
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2024-01-05 16:29:45.450021521 +0300
Modify: 2024-01-05 16:29:45.450021521 +0300
Change: 2024-01-05 16:29:45.450021521 +0300
 Birth: -

$ stat /proc
  File: /proc
  Size: 0          Blocks: 0     IO Block: 1024 directory
Device: 0,21       Inode: 1      Links: 511
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2024-01-05 16:28:43.860000003 +0300
Modify: 2024-01-05 16:28:43.860000003 +0300
Change: 2024-01-05 16:28:43.860000003 +0300
 Birth: -

$ stat /keybase
  File: /keybase
  Size: 0          Blocks: 0     IO Block: 4096 directory
Device: 0,90       Inode: 1      Links: 1
Access: (0555/dr-xr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2024-01-05 17:33:52.083272769 +0300
Modify: 2024-01-05 17:33:52.083272769 +0300
Change: 2024-01-05 17:33:52.083272769 +0300
 Birth: -

$ sudo mount | grep -E '(sys |proc |keybase )' | column -t
proc                on  /proc     type  proc   (rw,nosuid,nodev,noexec,relatime)
sys                 on  /sys      type  sysfs  (rw,nosuid,nodev,noexec,relatime)
keybase-redirector  on  /keybase  type  fuse   (ro,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
```
PS. When eza attempts to work with those directories, Rust panics, which is what is shown in the issue above my comment.
The interesting part is that for me `stat` shows `-` instead of a value like it did for `5006491383123099853.-1424152917`, which @felinira reported.
> Interesting part is that for me stat shows `-` instead of any value like it did for `5006491383123099853.-1424152917` which @felinira reported
You are using GNU coreutils, which shows `-` when the nanosecond field of the birth time is negative, while @felinira uses macOS coreutils.

Edit: Looks like GNU coreutils uses `get_stat_birthtime` to get the birth time, which returns -1 in both the seconds and nanoseconds fields if the file doesn't have a birth time at all, as is the case for `/sys`, `/proc`, and likely `/keybase`.
> while @felinira uses macOS coreutils

```
$ stat --version
stat (GNU coreutils) 9.3
```
@bjorn3 why, if it returns `-1`, does it show `-` and not `-1`?
Also, I'm not sure if this is true, but maybe we have different values because of the slight difference in versions (I'm using Arch BTW lol).
@felinira do you have `/sys` and `/proc` on your macOS? And if you do, could you please show stats for them?
```
$ stat --version
stat (GNU coreutils) 9.4
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Michael Meskes.
```
Also, what do other Linux people, with or without btrfs, have as their stat of `/proc` and `/sys`? And what kernel, root filesystem, and distro do you have?
> @bjorn3 why if it returns -1 it shows - and not -1?
Because `get_stat_birthtime` is defined to return -1 when the birth time is not available. https://github.com/coreutils/coreutils/blob/a966dcdb69e2f49f2587e1b7d4ade7efcff29f40/src/stat.c#L1617 is the code that checks for negative values and prints `-` if that is the case.
> @felinira do you in your macOS
I don't use macOS. I have never stated that I use macOS. I don't understand where this is coming from. I use Linux with btrfs, which is where the corruption comes from.

But I don't see how this is relevant. The broken timestamps are there, and there will always be cases where these or similar problems exist. This is not LKML, and I would consider it quite off-topic to debug filesystem bugs here. The only thing we can do here is mitigate them.
> I don't use macOS. I have never stated that I use macOS.
My bad. Must have gotten confused by the macOS mention at https://github.com/rust-lang/rust/issues/108277#issuecomment-1699045857.
Sorry, @felinira, I was confused about macOS because @bjorn3 was confused about it.
I understand that this is not LKML, but I asked my question because all of the people that came here have something in common. And they are here, on GitHub, accessible. I'm just trying to find the most likely reason why this bug happened (not that Rust panics, but the fact that there are files with negative or broken birth times) before reporting it in the place it belongs. And, handily, the people that came here most likely also have those broken files/directories. If most of the people here also report, for example, that reading files in / fails because of the /sys and /proc dirs, that may be an important detail to include in a report to LKML, rather than just shooting into the air with my bug report.
I'm using Arch Linux with btrfs and with nushell v0.89. When I run `ls -la` in nushell, it crashes because of this issue. I found out the offending file is my `~/.config`, which returns this:
```
$ stat ~/.config/
  File: /home/user/.config/
  Size: 3142       Blocks: 0          IO Block: 4096   directory
Device: 0,39       Inode: 4010908    Links: 1
Access: (0755/drwxr-xr-x) Uid: ( 1000/ user) Gid: ( 1001/ user)
Access: 2022-08-29 18:00:58.782805198 +0300
Modify: 2024-01-17 13:17:41.297652651 +0200
Change: 2024-01-17 13:17:41.297652651 +0200
 Birth: 6012152045059794253.1935959397
```
All other files in my $HOME have a normal Birth time.
These are the fstab options with which I've mounted my `/home`, if it's at all related: `rw,noatime,compress-force=zstd:3,space_cache=v2,subvol=home`
Also, running with the builtin ls (`ls -lad --time=birth ~/.config/`) gives a similar result:

```
drwxr-xr-x 1 user user 3142 6012152045059794253 /home/user/.config/
```
Here's a command to identify files/directories with bogus birth times:
```
sudo find "$HOME" -print0 \
    | sudo xargs -0 ls -ld --time=birth \
    | rg '^(\S+\s+){5}[0-9]+ /'
```
There are 69 on my machine. They seem to be concentrated in places that would see frequent metadata updates. SQLite databases and their -wal files, directories with frequent creation/deletion of files, etc.
The exception is a cluster of files with names like `/home/redacted/.mozilla/firefox/redacted/weave/changes/history.json-3.corrupt`. These all have zero size. Also, I run with `noatime` on all filesystems, so atime kind of acts as a backup birth time, and as far as I can tell the cluster was all created and never modified. There is a cluster-within-the-cluster that dates to March 2022, with atimes all within the same 75 ms span. Then there is a pair from August 2022, 3.134 seconds apart.
Then, outside the `*.corrupt` cluster, there are some other atime clusters: one spanning 2021-11-27 to 2021-12-07, another on 2022-06-24 in an 11-minute span, and a few more. The most recent is a cluster of two pairs 7 hours apart on 2023-09-29/30, where the 7 hours looks like a likely sleep period.
Presumably there is some condition of system load that makes the corruption more likely, or particularly sensitive kernel versions. Maybe temperature, line voltage... Anyhow, this is how I got the atimes for the culprits:
```
ls -ltrhd --time=atime --time-style=full-iso (sed -E 's/^(\S+\s+){6}//' culprits.txt)
```
Just a hunch, but since birth time is a recent addition to the file attributes, maybe it's at the end of the buffer?
We discussed this in today's @rust-lang/libs meeting, and we agreed that we should just return an `Err` here. We'd be happy to review a PR implementing that.
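In that spirit, the bounds check might look roughly like this (a sketch only; the real change belongs in `Timespec::new` in `library/std/src/sys/unix/time.rs`, and `checked_timespec` plus the exact error message are illustrative assumptions):

```rust
use std::io;

const NSEC_PER_SEC: i64 = 1_000_000_000;

// Illustrative stand-in for std's internal Timespec constructor: instead of
// asserting on the nanosecond range, return an error the caller can handle.
fn checked_timespec(tv_sec: i64, tv_nsec: i64) -> io::Result<(i64, i64)> {
    if (0..NSEC_PER_SEC).contains(&tv_nsec) {
        Ok((tv_sec, tv_nsec))
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("invalid timestamp from filesystem: {tv_sec}.{tv_nsec}"),
        ))
    }
}

fn main() {
    assert!(checked_timespec(0, 0).is_ok());
    // The corrupted btrfs birth time reported in this issue:
    assert!(checked_timespec(5006491383123099853, -1424152917).is_err());
    println!("ok");
}
```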
`std::fs::Metadata` timestamp methods (like `created`) will panic on files with invalid timestamp nsec values. It will panic here:

```
thread 'async-std/runtime' panicked at 'assertion failed: tv_nsec >= 0 && tv_nsec < NSEC_PER_SEC as i64', library/std/src/sys/unix/time.rs:66:9
```

We have recently stumbled upon multiple btrfs users that happen to have files that look something like this:

This is most likely a filesystem / kernel bug. But these files / broken kernels exist now and will cause any access to `ctime` (and possibly the other time values) to panic.

My proposal would be to check the bounds of the underlying syscall results and return an appropriate `Err` for these broken time values if the ctime / mtime / atime etc. contain values that would otherwise trigger this assertion.

Meta

`rustc --version --verbose`:

Backtrace
```
   0: 0x556e2422048a - std::backtrace_rs::backtrace::libunwind::trace::h79937bc171ada62c
                           at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1: 0x556e2422048a - std::backtrace_rs::backtrace::trace_unsynchronized::h2292bca8571cb919
                           at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2: 0x556e2422048a - std::sys_common::backtrace::_print_fmt::h9c461f248e4ae90d
                           at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:65:5
   3: 0x556e2422048a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he9fe6bf1a39182e1
      at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:44:22
4: 0x556e2424687e - core::fmt::write::h032658c119c720d7
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/fmt/mod.rs:1208:17
5: 0x556e24219c65 - std::io::Write::write_fmt::h299fc90dfae41c0d
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/io/mod.rs:1682:15
6: 0x556e24220255 - std::sys_common::backtrace::_print::heb70d25df9937e3f
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:47:5
7: 0x556e24220255 - std::sys_common::backtrace::print::had745c0a76b8b521
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:34:9
8: 0x556e24221cff - std::panicking::default_hook::{{closure}}::h1ea782cdfa2fd097
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:267:22
9: 0x556e24221a3b - std::panicking::default_hook::h1cc3af63455a163c
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:286:9
10: 0x556e2422240c - std::panicking::rust_panic_with_hook::h5cafdc4b3bfd5528
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:688:13
11: 0x556e24222162 - std::panicking::begin_panic_handler::{{closure}}::hf31c60f40775892c
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:577:13
12: 0x556e2422093c - std::sys_common::backtrace::__rust_end_short_backtrace::h28a5c7be595826cd
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys_common/backtrace.rs:137:18
13: 0x556e24221eb2 - rust_begin_unwind
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/panicking.rs:575:5
14: 0x556e237c2243 - core::panicking::panic_fmt::h8fa27a0b37dd98b7
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:64:14
15: 0x556e237c231d - core::panicking::panic::h545818946343732b
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/core/src/panicking.rs:111:5
16: 0x556e2421704d - std::sys::unix::time::Timespec::new::h0e233601cbadebfd
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys/unix/time.rs:66:9
17: 0x556e2421704d - std::sys::unix::time::SystemTime::new::h9d5242895ff4c7e2
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys/unix/time.rs:29:25
18: 0x556e2421704d - std::sys::unix::fs::FileAttr::created::hb18250e52af180d4
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/sys/unix/fs.rs:485:24
19: 0x556e2421704d - std::fs::Metadata::created::h80fcc7cb446092ab
at /rustc/d5a82bbd26e1ad8b7401f6a718a9c57c96905483/library/std/src/fs.rs:1311:9
```