nyx-space / hifitime

A high fidelity time management library in Rust
Mozilla Public License 2.0
344 stars 19 forks source link

Incorrect leap second handling when converting to UTC #255

Open Cognoscan opened 1 year ago

Cognoscan commented 1 year ago

If I try to do a round-trip conversion from UTC seconds to an Epoch and back, I expect it to be perfect at every point except in the case of a deleted leap second (where the skipped UTC second count wouldn't be able to round-trip correctly). What I actually get is a delta of +/-1 second leading up to the leap second insertion/removal point. See below for an example using toy leap second updates.

Insert leap second @ 100, from +3 to +4
UTC -> TAI -> UTC
 96 ->  99 ->  96, delta = 0
 97 -> 100 ->  96, delta = -1
 98 -> 101 ->  97, delta = -1
 99 -> 102 ->  98, delta = -1
    -> 103 ->  99, delta = 0
100 -> 104 -> 100, delta = 0
101 -> 105 -> 101, delta = 0
102 -> 106 -> 102, delta = 0

Delete leap second @ 100, from +3 to +2
UTC -> TAI -> UTC
 96 ->  99 ->  96, delta = 0
 97 -> 100 ->  98, delta = 1
 98 -> 101 ->  99, delta = 1
 99 -> 102 -> 100, delta = 1
100 -> 102 -> 100, delta = 0
101 -> 103 -> 101, delta = 0
102 -> 104 -> 102, delta = 0

The reason for this LeapSecondProvider is used the same way for conversion both from UTC to TAI, and from TAI to UTC. The IERS leap second table lists times in UTC seconds, so the conversion to TAI is fine (and is exactly what IERS made the table for). To go in the other direction, we'd need to speculatively subtract the offset first and then examine the leap second table's seconds count.

I'm not sure what the preferred fix is here. Epoch has several functions for getting the number of leap seconds for the epoch, so it seems like the preference is for leap seconds tables to actually list themselves with TAI times instead of UTC times, and to go through the "offset-then-compare" leap second calculation when going from UTC to TAI. That would require all LeapSecondProvider implementers to change though.

The alternate would be to keep the leap second implementers the same, and to change the leap_seconds_with function to apply the offset before doing the comparison. In either case, there would need to be two functions: one to apply leap seconds to TAI, and one to remove leap seconds from UTC.

Also, not directly related to the problem, but is there a way to avoid creating a new leap second table every time I want to use one? It looks like the current implementations will, at minimum, need to clone or create the entire table every time the table is needed. A naive use of the LeapSecondsFile would cause the file to be re-parsed every time there's a leap second calculation. Maybe there's a good reason for the trait, but it does seem like a lot of needlessly duplicated work.

Example Code with the proposed function fix: hifitime_play.zip

ChristopherRabotin commented 1 year ago

Thank you for the thorough bug report and proposed fix. We're in the process of changing the architecture of how time systems are handled in hifitime, which I think would hide the issue you've found but not fix it (the draft pull request is here: #241).

I've added this to the 4.0.0 milestone and it'll be fixed before that release. I can't realistically propose a firm release date for this, but I'm hopeful it'll be in the next three months. Would that time line work for your use case of hifitime ?

Concerning the LeapSecondProvider trait, the crux of the issue is that the leap seconds are stored in a Vec<_> and that isn't copiable, and the double ended iterator, requires the provided structure to be mutable to call iter. One fix I can think of is to pass a reference to the leap second provider in the functions that require it, clone the provider in there, and use the clone for the iteration.

Cognoscan commented 1 year ago

Thanks for the quick response! I don't have any immediate need for this fix; I was just surveying Rust crates that do leap second handling and noticed the flaw while browsing the code.

I could see a few different ways to make the LeapSecondProvider trait and leap second handling in general a bit easier to work with:

You could do any of the above and go a step further: put the leap second table inside a OnceLock<RwLock<Vec<LeapSecond>>>, which would let hifitime store a global static thread-safe lookup table that can be infrequently updated. That way every hifitime leap second lookup will use the same table, and you could have a function that the user could call to update the global table. This is the approach I'm leaning towards for my own leap-second handling, as I'd prefer the user to think about leap seconds as infrequently as possible. The table would be initialized with a compiled-in table (like LatestLeapSeconds) and could be updated with functions like update_leap_seconds_from_file<P>(path: P) or update_leap_seconds(v: Vec<LeapSecond>).

...I probably should've put this in a different issue, but wasn't sure if there was already a compelling (Python-related?) reason for why it is the way it is.

ChristopherRabotin commented 1 year ago

This is a fantastic response, and I really like the OnceLock<_> idea, thank you !

ChristopherRabotin commented 6 months ago

@Cognoscan , a few weeks ago BurntSushi pointed out in https://github.com/nyx-space/hifitime/discussions/292 that the interpretation of UTC is wrong. I'm currently tackling a bunch of tasks in hifitime trying to push 4.0.0-alpha out the door with all of the bugs discovered in the last few months, including the bug you've reported.

I'm tagging that discussion in this issue so that when I fixed it (soon I hope), then I make sure to add the round trip test along with a test that ensures that initializing a UTC epoch at the 60th second on a leap second day is displayed with that 60th second again. Does that kind of test also make sense to you in addition to the round trip test and fixes you've proposed?

Thanks for your feedback

ChristopherRabotin commented 4 months ago

Just as a quick status update, I hope to work on this this weekend so I can release hifitime version 4.0.0-alpha by the end of the weekend.

Cognoscan commented 4 months ago

Great, I'll definitely take a look when you've got an alpha out! I think the tests you talked about adding should match up with the tests I'd proposed, but I'll also check the implementation regardless.

ChristopherRabotin commented 4 months ago

One of the main changes of version 4 is that epochs are lazy and the duration is stored in the time scale that it was initialized in, and the conversion only happens when requested (e.g. calling my_epoch.to_time_scale()). In version 3, every epoch was converted to its TAI equivalent on initialization.

When working on your ticket, it was clear that v4 was missing a function to convert between time scales using a specific leap second provider. As you recommended, I changed the LeapSecond provider so that it does not need to implement an iterator or an indexing, and instead it just needs to expose an array of LeapSecond entries (as this implements the double ended iterator and indexing directly).

I was able to update your test case to meet the behavior you encountered: in other words, this bug still existed in v4! So I'm glad I'm addressing it.

Here is how I implemented this test (this will eventually be full of asserts instead of prints):

#[cfg(feature = "std")]
#[test]
fn regression_test_gh_255() {
    use hifitime::leap_seconds::{LeapSecond, LeapSecondProvider};

    #[derive(Clone)]
    struct LeapTable {
        table: Vec<LeapSecond>,
    }

    impl LeapSecondProvider for LeapTable {
        fn entries(&self) -> &[LeapSecond] {
            &self.table
        }
    }

    let leap_insert = LeapTable {
        table: vec![
            LeapSecond::new(50.0, 3.0, true),
            LeapSecond::new(100.0, 4.0, true),
        ],
    };
    let leap_delete = LeapTable {
        table: vec![
            LeapSecond::new(50.0, 3.0, true),
            LeapSecond::new(100.0, 2.0, true),
        ],
    };

    println!();
    println!("Insert leap second @ 100, from +3 to +4");
    println!("UTC -> TAI -> UTC");
    for i in 96..=102 {
        let time = Epoch::from_utc_seconds(i as f64);
        let tai = time.to_time_scale_with_leap_seconds(TimeScale::TAI, &leap_insert);
        let tai_s = tai.duration.to_seconds().round() as i64;
        let utc = tai.to_time_scale_with_leap_seconds(TimeScale::UTC, &leap_insert);
        let utc_s = utc.duration.to_seconds().round() as i64;
        println!(
            "{:3} -> {:3} -> {:3}, delta = {}",
            i,
            tai_s,
            utc_s,
            utc_s - i
        );
        if i == 99 {
            let time = Epoch::from_tai_seconds(103.0);
            let utc = time.to_time_scale_with_leap_seconds(TimeScale::UTC, &leap_insert);
            let utc_s = utc.duration.to_seconds().round() as i64;
            println!("    -> {:3} -> {:3}, delta = {}", 103, utc_s, 0);
        }
    }

    println!();
    println!("Delete leap second @ 100, from +3 to +2");
    println!("UTC -> TAI -> UTC");
    for i in 96..=102 {
        let time = Epoch::from_utc_seconds(i as f64);
        let tai = time.to_time_scale_with_leap_seconds(TimeScale::TAI, &leap_delete);
        let tai_s = tai.duration.to_seconds().round() as i64;
        let utc = tai.to_time_scale_with_leap_seconds(TimeScale::UTC, &leap_delete);
        let utc_s = utc.duration.to_seconds().round() as i64;
        println!(
            "{:3} -> {:3} -> {:3}, delta = {}",
            i,
            tai_s,
            utc_s,
            utc_s - i
        );
    }
}

Since all of the time scale conversion is centralized in the to_time_scale_with_leap_seconds function, I implemented your fix only in the part of that function that converts from TAI to UTC. The to_time_scale function still converts from the initialization time scale into TAI, and then back to the desired output time scale. Lines 226-236 are as follows:

let original_offset_s = prime_epoch_offset.to_seconds();
let mut corrected_offset = original_offset_s * Unit::Second;
for leap_second in provider.entries().iter().rev() {
    let maybe_corrected_offset_s = original_offset_s - leap_second.delta_at;
    if original_offset_s >= leap_second.timestamp_tai_s {
        corrected_offset = maybe_corrected_offset_s * Unit::Second;
        break;
    }
}

corrected_offset

But that didn't seem to work: the test still shows the same behavior. It's pretty late where I am, so I assume that I incorrectly implemented something and I'm not seeing what's different between my implementation and yours that I tried to implement.

---- regression_test_gh_255 stdout ----

Insert leap second @ 100, from +3 to +4
UTC -> TAI -> UTC
 96 ->  99 ->  96, delta = 0
 97 -> 100 ->  96, delta = -1
 98 -> 101 ->  97, delta = -1
 99 -> 102 ->  98, delta = -1
    -> 103 ->  99, delta = 0
100 -> 104 -> 100, delta = 0
101 -> 105 -> 101, delta = 0
102 -> 106 -> 102, delta = 0

Delete leap second @ 100, from +3 to +2
UTC -> TAI -> UTC
 96 ->  99 ->  96, delta = 0
 97 -> 100 ->  98, delta = 1
 98 -> 101 ->  99, delta = 1
 99 -> 102 -> 100, delta = 1
100 -> 102 -> 100, delta = 0
101 -> 103 -> 101, delta = 0
102 -> 104 -> 102, delta = 0

For reference, this is what I'm trying to implement but in the v4 code base:

fn fixed_to_utc<L>(e: &Epoch, leaps: L) -> f64
    where L: LeapSecondProvider
{
    let l = e.duration_since_j1900_tai.to_seconds();
    for leap_second in leaps.rev() {
        let maybe_converted = l - leap_second.delta_at;
        if maybe_converted >= leap_second.timestamp_tai_s {
            return maybe_converted;
        }
    }
    l
}
ChristopherRabotin commented 1 month ago

I'm moving this work to a later date because I'm trying to release version 4.0 soon. Since this bug only affects conversions within one second of the leap second, I don't think it's critical for it to be fixed right away.