skyfielders / python-skyfield

Elegant astronomy for Python
MIT License

The `..._strftime` function rounds time inaccurately. #924

Closed Pearlfish-H closed 10 months ago

Pearlfish-H commented 10 months ago
>>> print(ts.utc(2023, 12, 31, 23, 59, 59.5).utc_strftime('%Y-%m-%d %H:%M:%S'))
2024-01-01 00:00:00
>>> print(ts.utc(2023, 12, 31, 23, 59, 30).utc_strftime('%Y-%m-%d %H:%M'))
2024-01-01 00:00

In this example, the moment in 2023 appears to be in 2024. According to ISO 8601-1:2019/Amd 1:2022 5.3.2:

The beginning of the day is defined as the first instant of the day, and the ending of the day is defined as the last instant of the day, where the last instant of the day is identical to the first instant of the next day.

NOTE 1    An instant does not have any duration.

For disambiguation and semantic needs, despite that the last instant of the day is technically also the first instant of the next day, separate expressions are used to differentiate these two cases.

As time of day relates to the duration elapsed since the first instant of the day:

  • the “beginning of the day” expressions represent that no time has elapsed since the first instant of the day;

  • the “ending of the day” expressions represent that 24 hours have elapsed since the first instant of the day.

NOTE 2    This document does not explicitly distinguish the representation of instants from time intervals.

Expressions of the beginning of the day and the ending of the day conform with representations of time of day.

The expressions, as complete representations in basic and extended format for the beginning of the day and the ending of the day, in accordance with 5.3.1, are as follows:

a) Beginning of the day

Basic format: T000000 (in the format of 5.3.1.2 a))

Extended format: 00:00:00 (or T00:00:00) (in the format of 5.3.1.2 b))

b) Ending of the day

Basic format: T240000 (in the format of 5.3.1.2 a))

Extended format: 24:00:00 (or T24:00:00) (in the format of 5.3.1.2 b))

It might be more unambiguous to express it as 2023-12-31 24:00:00.
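
As an illustration of that suggestion, here is a minimal sketch that uses only the standard library's datetime rather than Skyfield. The helper name `format_end_of_day` is hypothetical. It rounds to the nearest second and, when rounding carries the time into the following day, writes the result as 24:00:00 of the original day:

```python
from datetime import datetime, timedelta

def format_end_of_day(dt):
    """Round dt to the nearest second; write a rollover to midnight as 24:00:00.

    Hypothetical helper for illustration only, not part of Skyfield.
    """
    # Adding half a second and dropping the microseconds rounds halves up.
    rounded = (dt + timedelta(seconds=0.5)).replace(microsecond=0)
    if rounded.date() != dt.date():
        # Rounding landed on 00:00:00 of the next day: name it as the
        # ending of the original day instead.
        return dt.strftime('%Y-%m-%d') + ' 24:00:00'
    return rounded.strftime('%Y-%m-%d %H:%M:%S')

print(format_end_of_day(datetime(2023, 12, 31, 23, 59, 59, 500000)))  # 2023-12-31 24:00:00
print(format_end_of_day(datetime(2023, 12, 31, 12, 0, 0, 400000)))    # 2023-12-31 12:00:00
```

Python's datetime has no notion of leap seconds, so this sketch ignores them.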

brandon-rhodes commented 10 months ago

You might use the word ‘ambiguous’ differently than it’s used in the Skyfield project. For Skyfield, it would be considered, by definition, an ambiguity for the exact same time to be expressed in two different ways. It would mean that two different strings would actually represent the same time, and so the identity or non-identity of two times could not be determined by the simple visual inspection of their strings to see if they are character-for-character the same.

People who pay attention to timekeeping also tend to think "leap second" when they see an out-of-bounds number, like 60 for a minute or second. It would be a bit startling to see 24:00:00 because it looks like the beginning of a "leap hour", which doesn't exist, but would be expressed exactly that way if it did.

And, really, the goal of Skyfield is to let people use Python to get results agreeing in every detail with high-precision data sources like the US Naval Observatory, and with the NASA HORIZONS system. When the USNO does rounding, it never returns 24:00 as a time. Instead, it always says 00:00 of the next day. (Please feel free to provide a counter-example if you find one, though!)

So let’s have Skyfield continue to observe the universal practice of astronomers of giving particular times (like 00:00) a single unambiguous name, instead of two different names depending on what other more accurate time was rounded.
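
The ambiguity described above can be sketched with the standard library alone. The parser `parse_allowing_24` is a hypothetical helper written for this illustration, not a Skyfield function. Once 24:00:00 is allowed, two different strings denote the same instant, so comparing strings no longer tells you whether two times are equal:

```python
from datetime import datetime, timedelta

def parse_allowing_24(text):
    """Parse 'YYYY-MM-DD HH:MM:SS', accepting 24:00:00 as the ending of the day.

    Hypothetical helper for illustration only, not part of Skyfield.
    """
    date_part, time_part = text.split(' ')
    if time_part == '24:00:00':
        # The ending of a day is the same instant as 00:00:00 of the next day.
        return datetime.strptime(date_part, '%Y-%m-%d') + timedelta(days=1)
    return datetime.strptime(text, '%Y-%m-%d %H:%M:%S')

a = '2023-12-31 24:00:00'
b = '2024-01-01 00:00:00'
print(a == b)                                        # False
print(parse_allowing_24(a) == parse_allowing_24(b))  # True
```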

Pearlfish-H commented 10 months ago

Thank you for the explanation. You are right that a consistent expression for time is a more convenient approach. But in the example below, it seems to be expressed inconsistently:

>>> print(ts.utc(2023, 12, 31, 23, 59, 59.9).utc_strftime("On %B %d, %Y, at %H o'clock"))
On December 31, 2023, at 23 o'clock

I read the API reference and the behavior does match the description. But if you compare it with the example at the beginning, it still feels a bit strange. It might be more convenient to add a parameter or two to the ..._strftime function to control the rounding.

Pearlfish-H commented 10 months ago

I noticed this because I'm writing a program about the lunisolar calendar. In everyday life, there may be no difference between 23:59:59.9 and 00:00:00, but when it comes to programming a calendar, this can make a difference of one month! It's as if +0 and -0 are mathematically equivalent, but in meteorology, +0 °C means it won't freeze, while -0 °C means it will freeze, which is very different.

brandon-rhodes commented 9 months ago

> But in the example below, it seems to be expressed inconsistently…

Skyfield only tries rounding for the specific cases of minutes and seconds. At the bottom of the utc_strftime() documentation it says:

        If the smallest time unit in your format is minutes or seconds,
        then the time is rounded to the nearest minute or second.
        Otherwise the value is truncated rather than rounded.
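
The documented rule can be emulated with the standard library to see why the earlier examples differ. This is a rough sketch, not Skyfield's actual implementation: the function name `strftime_with_bump` is hypothetical, and detecting the smallest unit by searching the format for '%S' and '%M' is a simplifying assumption.

```python
from datetime import datetime, timedelta

def strftime_with_bump(dt, fmt):
    """Emulate 'round minutes and seconds, truncate everything else'.

    Assumption: a plain substring check for '%S' and '%M' stands in for
    real format parsing. This is not how Skyfield itself is written.
    """
    if '%S' in fmt:
        dt += timedelta(seconds=0.5)   # round to the nearest second
    elif '%M' in fmt:
        dt += timedelta(seconds=30)    # round to the nearest minute
    # Coarser formats get no bump, so strftime() simply truncates.
    return dt.strftime(fmt)

print(strftime_with_bump(datetime(2023, 12, 31, 23, 59, 59, 500000),
                         '%Y-%m-%d %H:%M:%S'))   # 2024-01-01 00:00:00
print(strftime_with_bump(datetime(2023, 12, 31, 23, 59, 30),
                         '%Y-%m-%d %H:%M'))      # 2024-01-01 00:00
print(strftime_with_bump(datetime(2023, 12, 31, 23, 59, 59, 900000),
                         "%Y-%m-%d at %H o'clock"))  # 2023-12-31 at 23 o'clock
```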

But I didn't carry that further because I didn't know where to stop. Should %H be rounded to the nearest hour? Should a date string with only %Y be rounded to the nearest year? I didn't know for sure, so I was reluctant to do anything other than the small bumps required for %M minutes (30 second bump) and %S seconds (0.5 second bump), because I know that the USNO does both kinds of rounding in its own tables of things like sunrise and sunset. But I wasn't familiar with examples out in the wild of astronomy data sources rounding to the nearest hour, which it sounds like is a feature you would like for your program.

To get rounding to the nearest hour, try adding a half hour to your time:

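    from datetime import timedelta
    half_hour = timedelta(minutes=30)
    t = ts.utc(2023, 12, 31, 23, 59, 59.9)
    t = t + half_hour  # shift by 30 minutes so that truncating %H rounds to the nearest hour
    print(t.utc_strftime("On %B %d, %Y, at %H o'clock"))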

Would it help if Skyfield's documentation explained somewhere how to do that kind of rounding yourself? Let me know if you looked somewhere in the documentation and didn't find it, and I could try adding it for future users who might want to round to the nearest hour — I can't remember if someone's tried that with Skyfield before.

Pearlfish-H commented 9 months ago

Thanks for your detailed explanation; it is enough to solve my problem.

By the way, after reading your explanation, I have a new idea. Maybe the function prototype can be extended like this:

utc_strftime(format='%Y-%m-%d %H:%M:%S UTC', rounding='default')

Where the value of the rounding parameter might be one of:

'default' - consistent with the current rounding method. I think that the current rounding method is practical enough to be the default rounding mode.

'nearest' - means to extend the 'default' mode to apply not only to seconds and minutes, but also to hours, days, months, and even years.

'floor' / 'ceil' - means rounding down / up, perhaps also accepting 'down' / 'up' as aliases. Both could apply to any unit, including years. This is useful for some religious observances. For example, Buddhist monks are only allowed to eat before noon, and Muslims are only allowed to eat after sunset in Ramadan, so rounding in a known direction helps when calculating a safe time to eat.

'nearest+' - similar to 'nearest', but can round to 24:00:00 (a rare requirement, I admit, but it is indeed ISO compliant).

It might be possible to accept None, meaning a result consistent with time.strftime(). It might also be possible to accept a function, to implement something like rounding to 5 seconds.
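
Until something like that exists, the 'floor', 'ceil', and 'nearest' modes can be approximated outside the library. The sketch below uses only the standard library's datetime; the function `round_datetime` and its `mode` argument are hypothetical names that mirror the proposal, not an existing Skyfield API. It only handles fixed-length steps (seconds, minutes, hours), so months and years are out of scope here.

```python
import math
from datetime import datetime, timedelta

def round_datetime(dt, step, mode='nearest'):
    """Round dt to a multiple of step, counted from midnight of dt's day.

    Hypothetical helper for illustration; mode names follow the proposal.
    """
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    steps = (dt - midnight) / step          # fractional number of steps
    if mode == 'floor':
        n = math.floor(steps)
    elif mode == 'ceil':
        n = math.ceil(steps)
    elif mode == 'nearest':
        n = math.floor(steps + 0.5)         # halves round up
    else:
        raise ValueError('unknown rounding mode: %r' % (mode,))
    return midnight + n * step

t = datetime(2023, 12, 31, 23, 59, 59, 900000)
hour = timedelta(hours=1)
print(round_datetime(t, hour, 'floor'))     # 2023-12-31 23:00:00
print(round_datetime(t, hour, 'nearest'))   # 2024-01-01 00:00:00
print(round_datetime(datetime(2024, 1, 1, 12, 0, 7), timedelta(seconds=5)))  # 2024-01-01 12:00:05
```

The rounded datetime can then be formatted with a plain strftime(), which avoids any further rounding by the formatter.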

brandon-rhodes commented 9 months ago

I suspect the string formatting logic is already complicated and slow enough — I worry that I have already added too much. But I'll be happy to expand the documentation to explain how users can implement each of those behaviors themselves, at such time as people have applications for them.

I'm glad you are now getting the results you want!