python / cpython

The Python programming language
https://www.python.org

mention explicitly that stdlib assumes gmtime(0) epoch is 1970 #66552

Closed 7fe5d93b-2a2c-46a0-b5cd-5602c591856a closed 2 years ago

7fe5d93b-2a2c-46a0-b5cd-5602c591856a commented 10 years ago
BPO 22356
Nosy @abalkin, @4kir4, @iritkatriel
Superseder
  • bpo-29026: time.time() documentation should mention UTC timezone
Files
  • docs-time-epoch_is_1970.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-bug', 'docs']
    title = 'mention explicitly that stdlib assumes gmtime(0) epoch is 1970'
    updated_at =
    user = 'https://github.com/4kir4'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'iritkatriel'
    assignee = 'docs@python'
    closed = True
    closed_date =
    closer = 'iritkatriel'
    components = ['Documentation']
    creation =
    creator = 'akira'
    dependencies = []
    files = ['36567']
    hgrepos = []
    issue_num = 22356
    keywords = ['patch']
    message_count = 10.0
    messages = ['226539', '231954', '231955', '231957', '231964', '231968', '231969', '231971', '231979', '407205']
    nosy_count = 5.0
    nosy_names = ['belopolsky', 'cvrebert', 'docs@python', 'akira', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '29026'
    type = 'behavior'
    url = 'https://bugs.python.org/issue22356'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']
    ```

    7fe5d93b-2a2c-46a0-b5cd-5602c591856a commented 10 years ago

    See discussion on Python-ideas https://mail.python.org/pipermail/python-ideas/2014-September/029228.html

    77411a08-770c-471e-ba30-9528530a8d45 commented 9 years ago

    Ping. This small patch has been waiting nearly 3 months for a review.

    abalkin commented 9 years ago

    I don't like the proposed note.

    1. It is not the job of the time module documentation to warn about "many functions in the stdlib." What are these functions, BTW?

    2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.

    I think an improvement would be to spell Epoch with a capital E and define it as "The time zero hours, zero minutes, zero seconds, on January 1, 1970 Coordinated Universal Time (UTC)." See <http://pubs.opengroup.org/onlinepubs/9699919799>.

    7fe5d93b-2a2c-46a0-b5cd-5602c591856a commented 9 years ago

    Alexander Belopolsky added the comment:

    > 1. It is not the job of the time module documentation to warn about "many functions in the stdlib." What are these functions, BTW?

    The e-mail linked in the first message of this issue msg226539 enumerates some of the functions:

    https://mail.python.org/pipermail/python-ideas/2014-September/029228.html

    > 2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.

    It is the language used by the C standard for the time() function:

    > The time function determines the current calendar time. The encoding of the value is unspecified.

    > I think an improvement would be to spell Epoch with a capital E and define it as "The time zero hours, zero minutes, zero seconds, on January 1, 1970 Coordinated Universal Time (UTC)." See <http://pubs.opengroup.org/onlinepubs/9699919799>.

    The word *epoch* (lowercase) is used by the C standard.

    It is not enough to say that the time module uses the POSIX epoch (Epoch). For example, a machine may use the "right" zoneinfo (same epoch year, 1970), yet the timestamps for the same UTC time differ by the number of leap seconds (10+25 since 2012).

    POSIX encoding implies that the formula works:

      utc_time = datetime(1970, 1,  1) + timedelta(seconds=posix_timestamp)

    If time.time() doesn't return posix_timestamp, then "many functions in the stdlib" will break.
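As a minimal sketch of the formula above (not part of the proposed patch), the snippet below applies it to a fixed timestamp and compares the result with `datetime.fromtimestamp(..., tz=timezone.utc)`. The helper name `posix_to_utc` and the sample value are illustrative assumptions; the two results are expected to agree only where the platform counts POSIX "seconds since the Epoch".

```python
from datetime import datetime, timedelta, timezone


def posix_to_utc(posix_timestamp):
    """Hypothetical helper: apply the formula quoted above (naive UTC result)."""
    return datetime(1970, 1, 1) + timedelta(seconds=posix_timestamp)


sample = 1_000_000_000  # arbitrary illustrative timestamp

by_formula = posix_to_utc(sample)
# Ask the stdlib for the same instant in UTC, then drop tzinfo to compare.
by_stdlib = datetime.fromtimestamp(sample, tz=timezone.utc).replace(tzinfo=None)

print(by_formula)               # 2001-09-09 01:46:40
print(by_formula == by_stdlib)  # True on a POSIX-conforming configuration
```

On a system configured with the "right" zoneinfo, the second line could print False, which is the kind of mismatch this comment is describing.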

    It is possible to inspect all stdlib functions that use the time module and determine for some of them whether they will break if gmtime(0) is not 1970, or the "right" zoneinfo is used, or any non-POSIX time encoding is used. But it is hard to maintain such a list because any future code change may affect the behavior. I prefer a vague statement ("many functions") over a possible lie (the documentation shouldn't make promises that the implementation can't keep).

    POSIX language is (intentionally) vague and avoids the distinction between SI seconds and UT1 (mean solar) seconds. I am not considering systems where "seconds" does not mean the SI seconds used by the UTC time scale.

    abalkin commented 9 years ago

    In the context of Python library documentation, the word "encoding" strongly suggests that you are dealing with string/bytes. The situation may be different in C. If you want to refer to something that is defined by the POSIX standard you should use the words that can actually be found in that standard.

    When I search for "encoding" at <http://pubs.opengroup.org/onlinepubs/9699919799/>, I get

    crypt - string encoding function (CRYPT)
    encrypt - encoding function (CRYPT)
    setkey - set encoding key (CRYPT)

    and nothing related to time.

    7fe5d93b-2a2c-46a0-b5cd-5602c591856a commented 9 years ago

    Alexander Belopolsky added the comment:

    > In the context of Python library documentation, the word "encoding" strongly suggests that you are dealing with string/bytes. The situation may be different in C. If you want to refer to something that is defined by the POSIX standard you should use the words that can actually be found in that standard.
    >
    > When I search for "encoding" at <http://pubs.opengroup.org/onlinepubs/9699919799/>, I get
    >
    > crypt - string encoding function (CRYPT)
    > encrypt - encoding function (CRYPT)
    > setkey - set encoding key (CRYPT)
    >
    > and nothing related to time.

    I've provided the direct quote from the *C* standard in my previous message, msg231957:

    > 2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.

    It is the language used by C standard for time() function:

    The time function determines the current calendar time. The encoding
    of the value is unspecified.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <- from the C standard

    notice the word *encoding* in the quote.

    abalkin commented 9 years ago

    > It is possible to inspect all stdlib functions that use time module and determine for some of them whether they will break if gmtime(0) is not 1970 or "right" zoneinfo is used or any non-POSIX time encoding is used. But it is hard to maintain such a list because any future code change may affect the behavior.

    Let's not confuse the issue of gmtime(0) not being 1970-01-01T00 with the issue of localtime() expecting a non-POSIX time_t. Since gmtime(0) is the same on all platforms supported by Python, it is fair game to rely on this fact in Python code.

    The issue of "right" zoneinfo is different: at least two major Python platforms (OS X and Linux) can be configured in a non-POSIX way. The decision not to support these configurations in the datetime module was deliberate, but some partial support can be added. For example, datetime.astimezone() cannot work correctly in the "right" timezone because datetime.second cannot be 60, but if it returns values that are off by some 20 seconds in other times, I would call it a bug, but many will disagree.

    I don't know how popular configurations with right timezones are, but testing the Python stdlib in those configurations can only help the overall stdlib quality. (Unfortunately, at the moment we have very few tests even for mainstream timezones such as Europe/Moscow.)

    abalkin commented 9 years ago

    > I've provide the direct quote from *C* standard ...

    I understand that the C standard uses the word "encoding", but it does so for a reason that is completely unrelated to the choice of epoch. "Encoding" is how the bytes in memory should be interpreted as "number of seconds" or some other notion of time. For example, "two's complement little-endian 32-bit signed int" is an example of a valid time_t encoding; another example would be IEEE 754 big-endian 64-bit double. Note that these choices are valid for both the C and POSIX standards.
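To illustrate this narrower sense of "encoding", here is a minimal sketch using the `struct` module to lay out the same number of seconds in the two byte representations named above. The sample value is an arbitrary assumption, and this is Python mimicking the memory layouts, not how any particular C library stores time_t.

```python
import struct

timestamp = 1  # one second after the epoch, whichever epoch that is

# Two's complement, little-endian, 32-bit signed int.
as_int32_le = struct.pack("<i", timestamp)
# IEEE 754, big-endian, 64-bit double.
as_double_be = struct.pack(">d", float(timestamp))

print(as_int32_le.hex())   # 01000000
print(as_double_be.hex())  # 3ff0000000000000

# Both byte patterns decode back to the same number of seconds.
decoded_int = struct.unpack("<i", as_int32_le)[0]
decoded_double = struct.unpack(">d", as_double_be)[0]
print(decoded_int == decoded_double == timestamp)  # True
```

Neither byte pattern says anything about which instant the value 0 refers to, which is the separation between representation and epoch being argued here.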

    If you google for your phrase "time in POSIX encoding", this issue is the only hit. This strongly suggests that your choice of words is not the most natural.

    7fe5d93b-2a2c-46a0-b5cd-5602c591856a commented 9 years ago

    Alexander Belopolsky added the comment:

    > I've provide the direct quote from *C* standard ...

    > I understand that the C standard uses the word "encoding", but it does so for a reason that is completely unrelated to the choice of epoch. "Encoding" is how the bytes in memory should be interpreted as "number of seconds" or some other notion of time. For example, "two's complement little-endian 32-bit signed int" is an example of a valid time_t encoding; another example would be IEEE 754 big-endian 64-bit double. Note that these choices are valid for both the C and POSIX standards.

    I agree that one *part* of "encoding" is how time_t is *represented* in memory, but it is not the only part, e.g.:

    > The mktime function converts the broken-down time, expressed as local time, in the structure pointed to by timeptr into a calendar time value with the same encoding as that of the values returned by the time function.

    notice: "the same encoding as ... returned by the time function".

    The time() function can return values with a different epoch (implementation-defined). mktime() is specified to use the *same* encoding, i.e., the same epoch, etc.

    i.e., [in simple words] we have calendar time (Gregorian date, time) and we can convert it to a number (e.g., Python integer), we can call that number "seconds" and we can represent that number as some (unspecified) bit-pattern in C.

    I consider the whole process of converting "time" to a bit-pattern in memory as "encoding" i.e., "32/64, un/signed int/754 double" is just *part* of it e.g.,

    1. specify that 1970-01-01T00:00:00Z is zero (0)
    2. specify 0 has time_t type
    3. specify how time_t type is represented in memory.

    I may be wrong that C standard includes the first item in time "encoding".

    > If you google for your phrase "time in POSIX encoding", this issue is the only hit. This strongly suggests that your choice of words is not the most natural.

    I've googled the phrase (no surrounding quotes) and the links talk about time encoded as POSIX time [1], and some *literally* contain the phrase *POSIX encoding* [2] because the *Python* documentation for calendar.timegm contains it [3]:

    > [timegm] returns the corresponding Unix timestamp value, assuming an epoch of 1970, and the POSIX encoding. In fact, time.gmtime() and timegm() are each others’ inverse.
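The inverse relationship stated in that documentation can be checked with a short sketch. The variable names are illustrative, and the round trip is expected to hold only where time.gmtime() follows the POSIX "seconds since the Epoch" definition.

```python
import calendar
import time

# calendar.timegm() assumes an epoch of 1970, so the epoch maps to 0.
epoch_tuple = (1970, 1, 1, 0, 0, 0)
zero = calendar.timegm(epoch_tuple)

# Round trip: timestamp -> broken-down UTC time -> timestamp.
now = int(time.time())
round_trip = calendar.timegm(time.gmtime(now))

print(zero)               # 0
print(round_trip == now)  # True on a POSIX-conforming configuration
```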

    In an effort to avoid personal influence, I've repeated the experiment using the Tor browser and other search engines -- the result is the same.

    timegm() documentation might be the reason why I've used the phrase.

    I agree "POSIX encoding" might be unclear. The patch could be replaced by any phrase that expresses that some functions in stdlib assume that time.time() returns (+/- fractional part) "seconds since the Epoch" as defined by POSIX [4].

    [1] http://en.wikipedia.org/wiki/Unix_time#Encoding_time_as_a_number
    [2] http://ruslanspivak.com/2011/07/20/how-to-convert-python-utc-datetime-object-to-unix-timestamp/
    [3] https://docs.python.org/3/library/calendar.html#calendar.timegm
    [4] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_15

    iritkatriel commented 2 years ago

    The docs now say

    "The epoch is the point where the time starts, and is platform dependent. For Unix, the epoch is January 1, 1970, 00:00:00 (UTC). To find out what the epoch is on a given platform, look at time.gmtime(0)."

    which I believe covers this issue.
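For reference, the check suggested by that documentation text can be sketched as follows. The variable names are illustrative; the comparison against 1970 reflects the assumption discussed in this issue, not a guarantee of the time module.

```python
import time

epoch = time.gmtime(0)  # broken-down UTC time for timestamp 0
print(time.strftime("%Y-%m-%dT%H:%M:%SZ", epoch))

# Compare the first six fields (year through second) with the Unix epoch.
epoch_is_1970 = epoch[:6] == (1970, 1, 1, 0, 0, 0)
print(epoch_is_1970)
```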