python / cpython

The Python programming language
https://www.python.org/

datetime needs an "epoch" method #46988

Closed daf46b87-a9e9-4381-bf23-8c373c9135e6 closed 13 years ago

daf46b87-a9e9-4381-bf23-8c373c9135e6 commented 16 years ago
BPO 2736
Nosy @malemburg, @tim-one, @jribbens, @amauryfa, @mdickinson, @abalkin, @pitrou, @catlee, @vstinner, @bitdancer
Files
  • add-datetime-totimestamp-method.diff: Implementation of datetime.datetime.timetuple and tests.
  • add-datetime-totimestamp-method-docs.diff
  • datetime_totimestamp-3.patch
  • issue2736-doc.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = 'https://github.com/abalkin'
    closed_at =
    created_at =
    labels = ['type-feature', 'docs']
    title = 'datetime needs an "epoch" method'
    updated_at =
    user = 'https://github.com/tebeka'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'belopolsky'
    assignee = 'belopolsky'
    closed = True
    closed_date =
    closer = 'belopolsky'
    components = ['Documentation']
    creation =
    creator = 'tebeka'
    dependencies = []
    files = ['10251', '10256', '12329', '21565']
    hgrepos = []
    issue_num = 2736
    keywords = ['patch']
    message_count = 67.0
    messages = ['66045', '66140', '66532', '66539', '66601', '66610', '75723', '75899', '75900', '75902', '75903', '75904', '75912', '76003', '76324', '76327', '76329', '76331', '76332', '76340', '76344', '76345', '76351', '76352', '77650', '77651', '99545', '103875', '106229', '106230', '106249', '106251', '106252', '106254', '106255', '124197', '124203', '124204', '124225', '124230', '124231', '124237', '124245', '124248', '124252', '124255', '124256', '124257', '124259', '132695', '132697', '132818', '132977', '132994', '133008', '133009', '133011', '133037', '133039', '133053', '133056', '133058', '133072', '133207', '133245', '134395', '162533']
    nosy_count = 23.0
    nosy_names = ['lemburg', 'tim.peters', 'ping', 'jribbens', 'guettli', 'amaury.forgeotdarc', 'mark.dickinson', 'davidfraser', 'belopolsky', 'pitrou', 'andersjm', 'catlee', 'vstinner', 'tomster', 'werneck', 'hodgestar', 'Neil Muller', 'erik.stephens', 'steve.roberts', 'r.david.murray', 'vivanov', 'python-dev', 'Jay.Taylor']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue2736'
    versions = ['Python 3.3']
    ```

    abalkin commented 13 years ago

    On Thu, Mar 31, 2011 at 2:52 PM, Ka-Ping Yee <report@bugs.python.org> wrote: ..

    > I am extremely disappointed by what has happened here.

    What exactly are you disappointed about? As far as I can tell, the feature request has not been rejected, just no one has come up with a satisfactory solution. The issue is open and patches are welcome.

    > We are talking about a very simple method that everybody needs, and that has been reimplemented over and over again. I have been frustrated countless times by the lack of a utctotimestamp() method.

    This is not what this issue has been about so far. It was about local time to timestamp. In py3k, utctotimestamp() is easy:

    from datetime import datetime

    EPOCH = datetime(1970, 1, 1)
    def utctotimestamp(dt):
        # dt: naive datetime in UTC
        return (dt - EPOCH).total_seconds()

    > I have watched beginners and experienced programmers alike suffer over and over again for the lack of this method, and spend hours trying to figure out why Python doesn't have it and how it should be spelled in Python.

    These "beginners and experienced programmers" may want to reconsider using floating point numbers to store high precision timestamps. I know that some OSes made the same unfortunate choice in system APIs, but it does not make this choice any better. I can make a long list of why this is a bad choice, but I'll just mention that the precision of your timestamp varies from year to year and the program that works fine today may mysteriously fail in 5 years when nobody is around who can fix it anymore.
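
The remark about precision varying from year to year can be checked directly. The sketch below is illustrative only: `spacing_at` is a hypothetical helper, and it assumes Python 3.9 or later for `math.ulp`. It prints the gap between adjacent representable doubles at the Unix timestamp for the start of a few years.

```python
import math
from datetime import datetime, timezone


def spacing_at(year):
    """Gap (in seconds) between adjacent doubles near Jan 1 of `year`, UTC."""
    ts = datetime(year, 1, 1, tzinfo=timezone.utc).timestamp()
    return math.ulp(ts)


# The spacing doubles each time the timestamp crosses a power of two.
for year in (2011, 2040, 2110, 2250):
    print(year, spacing_at(year))
```

The spacing stays below one microsecond for 2011 and exceeds it by 2250, which is the drift both sides of the argument are referring to.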

    > The discussion here has been stuck on assumptions that the method must meet all of the following ideals:
    >
    > 1. It must produce a value that is easy to compute with
    > 2. It must have perfect precision in representing microseconds, forever
    > 3. It must make an exact round-trip for any possible input
    > 4. It must let users use whatever epoch they want

    No, it was actually stuck because of the inability to reliably obtain the system UTC offset for historical times. This is a solvable problem, but the patches proposed so far did not solve it correctly. On top of this, there is the issue of datetime.fromtimestamp() not being invertible in the presence of DST shifts, so datetime.totimestamp() is ambiguous for some datetime values.
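
A minimal sketch of the non-invertibility, assuming a POSIX system where `time.tzset()` is available. The `TZ` string and the variable names are chosen for illustration (US Eastern rules, DST ending on 6 November 2011); note that the snippet changes the process-wide time zone.

```python
import calendar
import os
import time
from datetime import datetime

# Assumption: POSIX platform; this changes the time zone of the whole process.
os.environ["TZ"] = "EST5EDT,M3.2.0,M11.1.0"
time.tzset()

# Two distinct instants, one hour apart, straddling the end of DST.
t_before = calendar.timegm((2011, 11, 6, 5, 30, 0))  # 05:30 UTC, still EDT
t_after = calendar.timegm((2011, 11, 6, 6, 30, 0))   # 06:30 UTC, already EST

local_before = datetime.fromtimestamp(t_before)
local_after = datetime.fromtimestamp(t_after)

# Both instants map to the same naive wall-clock value, so no function of
# the naive datetime alone can recover which timestamp it came from.
print(local_before, local_after, local_before == local_after)
```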

    > These ideals cannot all be met simultaneously and perfectly. The correct thing to do as an engineer is to choose a practical compromise and document the decision.
    >
    > The compromise that almost everyone chooses (because it is useful, convenient, has microsecond precision at least until the year 2100, and millisecond precision is frequently sufficient) is to use a floating-point number with an epoch of 1970-01-01. Floating-point seconds can be easily subtracted, added, serialized, and deserialized, and are a primitive data type in nearly every language and database.

    Those who need to do arithmetic on time values more often deal with durations rather than points in time. An arbitrary epoch around the current time is often more appropriate for timeseries analytics than the Unix epoch.

    > They are unmatched in ease of use.

    Compared to what? I find integers much more suitable for representing points in time than floats. Yes, in some languages you have to deal with 32-bit int overflow issues if you want to be able to deal with durations of over 100 years expressed in microseconds, but these days 64-bit integers are almost universally available.

    > So everyone wastes time searching for the answer and figuring out how to write:
    >
    >     import calendar
    >     calendar.timegm(dt.utctimetuple()) + dt.microsecond * 1e-6

    And this is the wrong answer. Someone else using (dt - EPOCH).total_seconds() may get a slightly different result. Some may argue that given that it is not obvious what expression to use, we need to provide a function. However, we already provided timedelta.total_seconds() that hides the floating point details. In my opinion, even adding total_seconds() was a mistake and x / timedelta(seconds=1) is just as short and more explicit than x.total_seconds().
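
To make the comparison concrete, here is a sketch that puts the three spellings mentioned in this exchange side by side for a naive `dt` assumed to hold UTC. The function names are hypothetical. The loop counts how often the `calendar.timegm` spelling differs from `total_seconds()` in the last bits; no particular count is claimed here.

```python
import calendar
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def via_timegm(dt):
    # Integer seconds plus a separately rounded fractional part.
    return calendar.timegm(dt.utctimetuple()) + dt.microsecond * 1e-6


def via_total_seconds(dt):
    return (dt - EPOCH).total_seconds()


def via_division(dt):
    return (dt - EPOCH) / timedelta(seconds=1)


base = datetime(2011, 4, 5, 12, 0, 0)
mismatches = 0
for us in range(0, 1_000_000, 997):
    dt = base.replace(microsecond=us)
    if via_timegm(dt) != via_total_seconds(dt):
        mismatches += 1

print("samples where timegm differs from total_seconds:", mismatches)
```

Any differences are well below a microsecond, but they are enough to break naive equality checks between values produced by different recipes.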

    I think the best we can do is to expand datetime.utcfromtimestamp() documentation to explain that it is equivalent to

    def utcfromtimestamp(s):
        return EPOCH + timedelta(seconds=s)

    and either leave it as an exercise to the reader to solve utcfromtimestamp(s) = dt for s or spell out

    def utctotimestamp(dt):
        return (dt - EPOCH) / timedelta(seconds=1)
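
The two definitions above can be combined into a small round-trip check. This is a sketch with arbitrary sample values, not a statement about every possible input.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def utcfromtimestamp(s):
    return EPOCH + timedelta(seconds=s)


def utctotimestamp(dt):
    return (dt - EPOCH) / timedelta(seconds=1)


# Timestamp -> datetime -> timestamp
s = 1302000000.5
print(utctotimestamp(utcfromtimestamp(s)) == s)

# Datetime -> timestamp -> datetime, with microseconds
dt = datetime(2011, 4, 5, 12, 34, 56, 789012)
print(utcfromtimestamp(utctotimestamp(dt)) == dt)
```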

    9647ba2a-5717-4481-b336-914e78a93294 commented 13 years ago

    > no one has come up with a satisfactory solution

    Plenty have proposed a satisfactory solution. No one has come up with a solution that is satisfactory to *you*, because you have overconstrained the problem. The reason we still have no utctotimestamp() after all these years is that you, and you alone as far as I know, refuse to accept a method that inverts utcfromtimestamp() with microsecond precision over its working range. Such a method is a perfectly reasonable and acceptable solution and would add a lot of value to Python as a language.

    I suspect you don't realize just how much pain you have unintentionally caused the world of Python users by singlehandedly blocking progress on this issue. I've seen them: students, friends, coworkers -- even very smart and capable people are stymied by it. No one thinks of looking in the calendar module. Maybe if you watched some of them struggle with this, you would understand.

    > leave it as an exercise to the reader to solve

    To take this perspective is to miss the point of Python.

    fbcfe123-60ba-4865-8420-26b909d92373 commented 13 years ago

    I couldn't agree more with ping's position on this. It is against the spirit of what Python has set out to be, and the blocking needs to stop.

    Any chance we could get a .epoch() function into python 2.7 as well?

    abalkin commented 13 years ago

    On Mon, Apr 4, 2011 at 5:42 PM, Jay Taylor <report@bugs.python.org> wrote: ..

    > I couldn't agree more with ping's position on this.

    Adding votes to a tracker issue without a working patch will not move it any further. There are several committers besides me in the nosy list including the original author of the datetime module. If it was such a universally desired feature as Ka-Ping makes it sound, it would be committed long before I became the maintainer of the datetime module.

    > It is against the spirit of what Python has set out to be, and the blocking needs to stop.

    I don't think any committer has the power to *block* a patch. I certainly don't. If Ka-Ping wants to add a feature over my objections, it is well within his power to do so. (Note that I objected to timedelta.total_seconds(), but it was added nevertheless.) It would be best, however, to bring this to python-dev or python-ideas first.

    > Any chance we could get a .epoch() function into python 2.7 as well?

    No.

    malemburg commented 13 years ago

    Just to add another data point to this discussion:

    mxDateTime, which in large parts inspired the Python datetime module, has had a .ticks() method (for local time) and a .gmticks() method (for UTC) for more than a decade now and so far, I haven't seen a single complaint about any of the issues raised in this discussion.

    The methods naturally return the Unix ticks value as float, since that's what the time module uses as basis and the whole purpose of those methods is to make interaction with the time module easy and straight-forward. Likewise, the epoch is also the same as the time module's one.

    Victor's patch could easily be updated to return floats as well, to make it compatible with the time module.

    There's only one catch that Victor's patch doesn't include: mktime() doesn't always work with DST set to anything but -1. mxDateTime checks the API at module load time and then determines whether it can be used with a DST setting or not (see the mxDateTime code for details). Not sure whether today's mktime() implementations still have any issues with this, but it's better to double-check than to get wrong results.

    http://www.egenix.com/products/python/mxBase/mxDateTime/

    vstinner commented 13 years ago

    Marc, could you maybe write a new patch taking care of the DST and maybe also the timezone? It looks like you have a long experience in timestamps :-)

    malemburg commented 13 years ago

    STINNER Victor wrote:

    > STINNER Victor <victor.stinner@haypocalc.com> added the comment:
    >
    > Marc, could you maybe write a new patch taking care of the DST and maybe also the timezone? It looks like you have a long experience in timestamps :-)

    Sorry, but no. I'm not really a fan of the datetime module and try to stay away from it whenever I can :-)

    Note that dealing with DST in timezones other than the local time zone is bound to go wrong without direct access to the tz library. The C lib doesn't provide any good way to access timezone information other than the local timezone or UTC.

    When dealing with date/time values, it is usually best to stay with UTC and only transform those values into local times in user interfaces on the front-end client.

    Consequently, I'd suggest allowing only UTC and local timezone conversions for the method in the datetime module.

    abalkin commented 13 years ago

    On Tue, Apr 5, 2011 at 4:33 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote: ..

    > mxDateTime, which in large parts inspired the Python datetime module, has had a .ticks() method (for local time) and a .gmticks() method (for UTC) for more than a decade now

    Yes, mxDateTime's gmticks()/ticks() pair of functions present a much more mature design than anything that has been proposed here. It is telling, however, that no one has mentioned mxDateTime's gmticks() on this issue in four years. On a duplicate bpo-1673409, Marc did bring it up, but as far as I can tell, no one responded. See msg75411.

    Google code search,

    http://www.google.com/codesearch?hl=en&sa=N&q=gmticks+lang:python

    returns only 13 hits for "gmticks". In several instances, the resulting float is immediately converted to int, in other instances "gmticks" is mentioned in comments and the code works around its bugs.

    I would not use Google Code search as an ultimate arbiter on the popularity of a feature, so I would really like to hear from the proponents about real life uses of gmticks() or any other examples where a similar method "has been reimplemented over and over again."

    > so far, I haven't seen a single complaint about any of the issues raised in this discussion.

    Well, search for gmticks does not return too many hits outside of mxDateTime code and manuals, but I had no trouble finding this:

    """ okay, all the MySQLdb dataobject tick-based time handling methods are broken in various ways -- reconstruct GMT ticks from time module's mktime... """ http://viewvc.tigris.org/ds/viewMessage.do?dsForumId=4251&dsMessageId=656863

    Follow the link for some more colorful language describing developer's experience with the feature.

    Note that it is likely that the bug MySQLdb developer complained about was fixed in mxDateTime at some point, <http://www.egenix.com/www2002/python/mxDateTime-History.html>, but this shows that implementing gmticks() correctly is not as trivial as those who never tried might think.

    > The methods naturally return the Unix ticks value as float, since that's what the time module uses as basis

    Which in turn is a mistake IMO. Note that POSIX does not use float timestamps for a reason.

    > and the whole purpose of those methods is to make interaction with the time module easy and straight-forward.

    This is not the goal that I would support. I would rather see code that uses datetime module not require time module methods at all.

    > Victor's patch could easily be updated to return floats as well, to make it compatible with the time module.

    Victor reported implementing two methods, one to return a float and another to return a tuple. See msg124259. I am not sure I've seen that code.

    > There's only one catch that Victor's patch doesn't include ...

    No, it is not about "only one catch". Victor's patch is simply wrong. For an aware datetime instance it extracts DST flag from tzinfo, but ignores the offset.
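
The sketch below illustrates the failure mode described here. It is not Victor's patch, which is not reproduced in this thread: it only shows how far off the result is when an aware datetime's UTC offset is ignored. The fixed +03:00 zone and the variable names are chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
EPOCH_NAIVE = datetime(1970, 1, 1)

# 15:00 at UTC+03:00 is 12:00 UTC.
dt = datetime(2011, 4, 5, 15, 0, tzinfo=timezone(timedelta(hours=3)))

# Aware subtraction takes the offset into account.
correct = (dt - EPOCH_UTC) / timedelta(seconds=1)

# Dropping tzinfo treats the wall-clock fields as if they were UTC.
ignoring_offset = (dt.replace(tzinfo=None) - EPOCH_NAIVE) / timedelta(seconds=1)

print(ignoring_offset - correct)  # the error equals the UTC offset in seconds
```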

    abalkin commented 13 years ago

    MAL> Since most of the datetime module was inspired by mxDateTime,
    MAL> I wonder why [ticks()/gmticks()] were left out. (msg75411)

    """ The datetime module intended to be an island of relative sanity. Because the range of dates "timestamps" can represent varies across platforms (and even "the epoch" varies), datetime doesn't even try to produce timestamps directly -- datetime is more of an alternative to "seconds from the epoch" schemes. Because datetime objects have greater range and precision than timestamps, conversion is problem-free in only one direction. It's not a coincidence that that's the only direction datetime supplies ;-) """ - Tim Peters

    http://bytes.com/topic/python/answers/522572-datetime-timestamp

    I will also add that fromtimestamp() is not invertible in the presence of DST. That's why mxDateTime.ticks() takes a DST flag, making it effectively a multi-valued function. Note that naive users, who just want to pass datetime values to an OS function expecting a float, most likely will not have a means of properly obtaining the DST flag.
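
The standard library's `time.mktime()` has the same shape: the last field of the time tuple is a DST flag, and for an ambiguous wall-clock time the answer depends on it. A sketch, assuming a POSIX system with `time.tzset()` and a C library whose `mktime()` honors the flag (glibc does; other platforms may behave differently). The `TZ` string and names are illustrative, and the snippet changes the process-wide time zone.

```python
import os
import time

# Assumption: POSIX platform; this changes the time zone of the whole process.
os.environ["TZ"] = "EST5EDT,M3.2.0,M11.1.0"
time.tzset()

# 01:30 on 6 November 2011 occurs twice in this zone.
# Fields: year, month, day, hour, minute, second, weekday, yearday.
fields = (2011, 11, 6, 1, 30, 0, 6, 310)

as_dst = time.mktime(fields + (1,))  # interpret as daylight time (EDT)
as_std = time.mktime(fields + (0,))  # interpret as standard time (EST)

print(as_std - as_dst)  # the two readings are one hour apart
```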

    malemburg commented 13 years ago

    Alexander Belopolsky wrote:

    > Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
    >
    > On Tue, Apr 5, 2011 at 4:33 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote: ..
    >> mxDateTime, which in large parts inspired the Python datetime module,
    >> has had a .ticks() method (for local time) and a .gmticks() method
    >> (for UTC) for more than a decade now
    >
    > Yes, mxDateTime's gmticks()/ticks() pair of functions present a much more mature design than anything that has been proposed here. It is telling, however, that no one has mentioned mxDateTime's gmticks() on this issue in four years. On a duplicate bpo-1673409, Marc did bring it up, but as far as I can tell, no one responded. See msg75411.
    >
    > Google code search,
    >
    > http://www.google.com/codesearch?hl=en&sa=N&q=gmticks+lang:python
    >
    > returns only 13 hits for "gmticks". In several instances, the resulting float is immediately converted to int, in other instances "gmticks" is mentioned in comments and the code works around its bugs.
    >
    > I would not use Google Code search as an ultimate arbiter on the popularity of a feature, so I would really like to hear from the proponents about real life uses of gmticks() or any other examples where a similar method "has been reimplemented over and over again."

    mxDateTime needs those two methods, since it doesn't natively use timezones. The .ticks() method is used for local time values, .gmticks() for UTC ones; that's why there are two methods.

    The .gmticks() method is always used when storing UTC values in mxDateTime instances, which actually is the preferred way of storing data in databases. Google Code doesn't really count much, since it only scans a limited number of OSS code bases. Most of our users are commercial users who use the tools in-house.

    Note that implementing .gmticks() is fairly easy on POSIX-conformant systems. On most others, timegm() can be used. If that doesn't exist, things get tricky, but that case should be rare nowadays.

    >> so far, I haven't seen a single complaint about any of the issues raised in this discussion.
    >
    > Well, search for gmticks does not return too many hits outside of mxDateTime code and manuals, but I had no trouble finding this:
    >
    > """ okay, all the MySQLdb dataobject tick-based time handling methods are broken in various ways -- reconstruct GMT ticks from time module's mktime... """ http://viewvc.tigris.org/ds/viewMessage.do?dsForumId=4251&dsMessageId=656863
    >
    > Follow the link for some more colorful language describing developer's experience with the feature.
    >
    > Note that it is likely that the bug MySQLdb developer complained about was fixed in mxDateTime at some point, <http://www.egenix.com/www2002/python/mxDateTime-History.html>, but this shows that implementing gmticks() correctly is not as trivial as those who never tried might think.

    Note that he was referring to the .ticks() method, not the .gmticks() method. The patch doesn't say which version of mxDateTime he was using. The bug mentioned in the changelog was fixed in 1998. It is possible, however, that the mktime() on his system was broken - which is why I added a test for it in mxDateTime.

    >> The methods naturally return the Unix ticks value as float,
    >> since that's what the time module uses as basis
    >
    > Which in turn is a mistake IMO. Note that POSIX does not use float timestamps for a reason.

    The time module is our reference in this case and this tries hard to add fractions of a second to the value :-)

    Note that sub-second accuracy relies on a number of factors, the storage format most certainly is the least important aspect ;-)

    On many systems, you only get 1/100s accuracy, on others, the timer ticks in fixed increments, giving you even weirder sub-second values (e.g. time appears to stay constant between time.time() calls).

    OTOH, there's a new set of APIs for nano-second accuracy available now, which the datetime module objects cannot represent at all due to the integer-based storage format.

    BTW: The integer format was chosen in order to keep the memory footprint of the objects low.

    >> and the whole purpose
    >> of those methods is to make interaction with the time module easy
    >> and straight-forward.
    >
    > This is not the goal that I would support. I would rather see code that uses datetime module not require time module methods at all.

    No chance :-) In practice, the time module gets used a lot for date/time storage or to quickly measure time deltas. Some people also prefer time module ticks due to their lower memory footprint, esp. when it comes to storing thousands of time values in time series.

    >> Victor's patch could easily be updated to return floats as well,
    >> to make it compatible with the time module.
    >
    > Victor reported implementing two methods, one to return a float and another to return a tuple. See msg124259. I am not sure I've seen that code.

    I had a look at the last patch on this ticket.

    >> There's only one catch that Victor's patch doesn't include ...
    >
    > No, it is not about "only one catch". Victor's patch is simply wrong. For an aware datetime instance it extracts DST flag from tzinfo, but ignores the offset.

    True, so make that two issues ;-)

    malemburg commented 13 years ago

    Alexander Belopolsky wrote:

    > Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
    >
    > MAL> Since most of the datetime module was inspired by mxDateTime,
    > MAL> I wonder why [ticks()/gmticks()] were left out. (msg75411)
    >
    > """ The datetime module intended to be an island of relative sanity. Because the range of dates "timestamps" can represent varies across platforms (and even "the epoch" varies), datetime doesn't even try to produce timestamps directly -- datetime is more of an alternative to "seconds from the epoch" schemes. Because datetime objects have greater range and precision than timestamps, conversion is problem-free in only one direction. It's not a coincidence that that's the only direction datetime supplies ;-) """ - Tim Peters
    >
    > http://bytes.com/topic/python/answers/522572-datetime-timestamp
    >
    > I will also add that fromtimestamp() is not invertible in the presence of DST. That's why mxDateTime.ticks() takes a DST flag making it effectively a multi-valued function. Note that naive users, who just want to pass datetime values to an OS function expecting a float, most likely will not have means of properly obtaining DST flag.

    IMHO, the whole concept of DST is broken, but that's not our fault :-)

    Ditching the concept just because it is known to fail for one hour out of 8760 you have in a typical year doesn't really warrant breaking the "practicality beats purity" guideline.

    Otherwise, we'd have to ditch the date support in the datetime module too: after all, Feb 29 only exists every 4 years (well, most of the time) - and that's one day out of 1461 in those 4 years, so an even worse ratio :-)

    And I'm not even starting to talk about ditching the concept of Unix ticks to begin with, as a result of having leap seconds causing POSIX ticks values not matching (real) UTC ticks.

    In reality, all these things hardly ever matter and if they do, users will either know that they have to make a conscious decision, simply don't care or decide not to care.

    BTW: A "timestamp" usually refers to the combination of date and time. The time.time() return value is "seconds since the Epoch". I usually call those values "ticks" (not sure whether it's a standard term or not, but always writing "seconds since Epoch" wasn't an option either ;-)).

    Date/time is fun, isn't it ?

    abalkin commented 13 years ago

    Let me state my position on this issue once again. Converting datetime values to float is easy. If your dt is a naive instance representing UTC time:

       timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1)

    If your dt is an aware instance:

       timestamp = (dt - datetime(1970, 1, 1, tzinfo=timezone.utc)) / timedelta(seconds=1)

    These recipes are easy to adjust for your application's needs. One application may want millisecond or microsecond ticks, another might want to carry subsecond precision in a separate integer, a third may want to avoid timestamps before 1970 or after 2038 or ignore microseconds altogether. No matter what a hypothetical datetime.epoch() provides, most applications will need to add a line or two to their code to serve their needs. Applications that use dt.epoch() naively, without thinking about what dt represents (say, local or UTC), will be buggy.
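
A few of the adjustments mentioned above, sketched for a naive `dt` assumed to hold UTC. The variable names and the sample value are illustrative. Everything stays in integer arithmetic, so no float rounding is involved.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
dt = datetime(2011, 4, 5, 12, 34, 56, 789012)
since = dt - EPOCH

# Integer tick counts: timedelta // timedelta yields an int.
micro_ticks = since // timedelta(microseconds=1)
milli_ticks = since // timedelta(milliseconds=1)

# Whole seconds with the subsecond part carried in a separate integer.
whole_seconds, remainder = divmod(since, timedelta(seconds=1))
micros = remainder.microseconds

# Ignoring microseconds altogether.
seconds_only = since // timedelta(seconds=1)

print(micro_ticks, milli_ticks, whole_seconds, micros, seconds_only)
```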

    The only related feature that I think is missing from datetime module is the ability to obtain local time as an aware datetime instance and to convert a naive datetime instance assumed to represent local time to an aware one.

    This is the subject of bpo-9527, but there is a resistance to adding that feature.

    abalkin commented 13 years ago

    On Tue, Apr 5, 2011 at 1:45 PM, Marc-Andre Lemburg <report@bugs.python.org> wrote: ..

    > BTW: A "timestamp" usually refers to the combination of date and time. The time.time() return value is "seconds since the Epoch". I usually call those values "ticks" (not sure whether it's a standard term or not, but always writing "seconds since Epoch" wasn't an option either ;-)).

    In Unix context, the term "timestamp" is usually associated with the various time values that OS stores with the files. I think this use is due to the analogy with physical "received on" timestamps used on paper documents. Since it is well-known that Unix filesystems store time values as seconds since Epoch, it is common to refer to these values as "Unix timestamps".

    See, for example:

    http://pubs.opengroup.org/onlinepubs/9699919799/utilities/touch.html

    42a80271-bbe2-4d37-b097-2bfedf49ce53 commented 13 years ago

    On 04/05/2011 18:22, Alexander Belopolsky wrote:

    > """ The datetime module intended to be an island of relative sanity. ....... """ - Tim Peters

    Refusing to cooperate with the rest of the world is not sane by my books.

    On 04/05/2011 21:06, Alexander Belopolsky wrote:

    > Converting datetime values to float is easy. If your dt is a naive instance representing UTC time:
    >
    >     timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1)
    >
    > If your dt is an aware instance:
    >
    >     timestamp = (dt - datetime(1970, 1, 1, tzinfo=timezone.utc)) / timedelta(seconds=1)

    Please add these lines to the datetime module's documentation. In some central, well lit place. I believe that if nothing else, the whole discussion should have proved to you that there are many people looking for them.

    OTOH a sinceepoch(epoch=datetime(1970,1,1)) method of the datetime class should be equally easy. It would be especially useful if a few of the more frequently used epochs were provided as constants.
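
As a rough illustration of this suggestion, written as a free function rather than a method: `since_epoch`, `UNIX_EPOCH` and `NTP_EPOCH` are hypothetical names and not part of the datetime module. The sketch assumes naive datetimes that all represent UTC.

```python
from datetime import datetime, timedelta

# Hypothetical constants; neither exists in the standard library.
UNIX_EPOCH = datetime(1970, 1, 1)
NTP_EPOCH = datetime(1900, 1, 1)


def since_epoch(dt, epoch=UNIX_EPOCH):
    """Seconds between `epoch` and `dt`, both naive and assumed to be UTC."""
    return (dt - epoch) / timedelta(seconds=1)


dt = datetime(2011, 4, 7, 10, 20)
print(since_epoch(dt))
print(since_epoch(dt, NTP_EPOCH))
```

The caller still has to know what `dt` represents. The function does nothing to guard against passing local time.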

    abalkin commented 13 years ago

    On Thu, Apr 7, 2011 at 6:20 AM, Velko Ivanov <report@bugs.python.org> wrote: ..

    >> Converting datetime values to float is easy. If your dt is a naive instance representing UTC time:
    >>
    >>     timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1)
    >>
    >> If your dt is an aware instance:
    >>
    >>     timestamp = (dt - datetime(1970, 1, 1, tzinfo=timezone.utc)) / timedelta(seconds=1)
    >
    > Please add these lines to the datetime module's documentation. In some central, well lit place. I believe that if nothing else, the whole discussion should have proved to you that there are many people looking for them.

    This is precisely what I suggested at the end of msg132697 above. See attached patch (issue2736-doc.diff) for a proposed documentation enhancement.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 13 years ago

    New changeset b55eac85e39c by Alexander Belopolsky in branch 'default': Issue bpo-2736: Documented how to compute seconds since epoch. http://hg.python.org/cpython/rev/b55eac85e39c

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 12 years ago

    New changeset 6671c5039e15 by Alexander Belopolsky in branch 'default': Issue bpo-2736: Added datetime.timestamp() method. http://hg.python.org/cpython/rev/6671c5039e15
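
For reference, a short sketch of how the method added by this changeset relates to the recipes discussed in the thread, as `datetime.timestamp()` behaves in Python 3.3 and later. The sample values are arbitrary. A naive instance is interpreted as local time by `timestamp()`, so a naive value holding UTC has to be marked as UTC first.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Aware instance: timestamp() agrees with the documented recipe.
aware = datetime(2011, 4, 5, 12, 34, 56, 789012, tzinfo=timezone.utc)
recipe = (aware - EPOCH_UTC) / timedelta(seconds=1)
print(aware.timestamp() == recipe)

# Naive instance holding UTC: attach tzinfo before calling timestamp(),
# because a bare naive datetime would be treated as local time.
naive_utc = datetime(2011, 4, 5, 12, 34, 56, 789012)
ts = naive_utc.replace(tzinfo=timezone.utc).timestamp()
print(ts == recipe)
```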