python / cpython

The Python programming language
https://www.python.org
Other
63.14k stars 30.23k forks source link

datetime.utcfromtimestamp rounds results incorrectly #67705

Closed fd319893-c29a-4319-934f-b31206c2403f closed 9 years ago

fd319893-c29a-4319-934f-b31206c2403f commented 9 years ago
BPO 23517
Nosy @tim-one, @mdickinson, @abalkin, @vstinner, @larryhastings, @bitdancer, @serhiy-storchaka
Files
  • round_half_even_py34.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/abalkin' closed_at = created_at = labels = ['type-bug', 'library'] title = 'datetime.utcfromtimestamp rounds results incorrectly' updated_at = user = 'https://bugs.python.org/tbarbugli' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'belopolsky' closed = True closed_date = closer = 'vstinner' components = ['Library (Lib)'] creation = creator = 'tbarbugli' dependencies = [] files = ['40414'] hgrepos = [] issue_num = 23517 keywords = ['patch', '3.3regression'] message_count = 100.0 messages = ['236552', '236553', '236577', '236580', '236581', '236585', '236587', '236589', '236597', '236607', '236608', '236609', '236610', '246049', '246073', '246097', '246104', '246121', '246284', '246306', '246307', '246334', '248942', '249282', '249300', '249301', '249303', '249304', '249307', '249308', '249309', '249519', '249520', '249521', '249525', '249528', '249530', '249532', '249539', '249569', '249611', '249728', '249744', '249771', '249777', '249778', '249779', '249786', '249787', '249788', '249791', '249796', '249798', '249825', '249827', '249834', '249835', '249836', '249841', '249842', '249844', '249845', '249847', '249848', '249849', '249850', '249851', '249853', '249856', '249875', '249879', '249885', '249886', '249898', '249909', '249910', '249925', '249926', '250043', '250047', '250048', '250049', '250066', '250075', '250083', '250101', '250103', '250104', '250107', '250108', '250113', '250259', '250265', '250268', '250269', '250270', '250304', '250405', '250973', '250974'] nosy_count = 13.0 nosy_names = ['tim.peters', 'mark.dickinson', 'belopolsky', 'vstinner', 'larry', 'r.david.murray', 'aconrad', 'BreamoreBoy', 'vivanov', 'python-dev', 'serhiy.storchaka', 'tbarbugli', 'trcarden'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'commit review' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue23517' versions = ['Python 3.4', 'Python 3.5', 'Python 3.6'] ```

    abalkin commented 9 years ago

    It looks like this patch violates fromtimestamp(s) == EPOCH + timedelta(seconds=s) invariant:

    Python 3.6.0a0 (default:73911e6c97c8, Sep  4 2015, 13:14:12)
    >>> E = datetime(1970,1,1,tzinfo=timezone.utc)
    >>> s = -1/2**7
    >>> datetime.fromtimestamp(s, timezone.utc) == E + timedelta(seconds=s)
    False
    tim-one commented 9 years ago

    FYI, that invariant failed for me just now under the released 3.4.3 too:

    Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from datetime import *
    >>> E = datetime(1970,1,1,tzinfo=timezone.utc)
    >>> s = -1/2**7
    >>> datetime.fromtimestamp(s, timezone.utc)
    datetime.datetime(1969, 12, 31, 23, 59, 59, 992187, tzinfo=datetime.timezone.utc)
    >>> E + timedelta(seconds=s)
    datetime.datetime(1969, 12, 31, 23, 59, 59, 992188, tzinfo=datetime.timezone.utc)

    It's an exactly-tied rounding case for the exactly-representable-in-binary s = -0.0078125.

    abalkin commented 9 years ago

    Can someone check 3.3? I believe that was the release where we tried to get various rounding issues right.

    83d2e70e-e599-4a04-b820-3814bbdb9bef commented 9 years ago
    Python 3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from datetime import *
    >>> E = datetime(1970,1,1,tzinfo=timezone.utc)
    >>> s = -1/2**7
    >>> datetime.fromtimestamp(s, timezone.utc)
    datetime.datetime(1969, 12, 31, 23, 59, 59, 992187, tzinfo=datetime.timezone.utc)
    >>> E + timedelta(seconds=s)
    datetime.datetime(1969, 12, 31, 23, 59, 59, 992187, tzinfo=datetime.timezone.utc)

    FWIW Windows 10.

    vstinner commented 9 years ago

    Oh, it looks like my implementation of ROUND_HALF_UP is wrong. Negative numbers are not correctly rounded :-/ I'm working on a fix.

    abalkin commented 9 years ago

    Victor,

    I did not quite understand why you've chosen ROUND_HALF_UP over ROUND_HALF_EVEN, but as long as fromtimestamp() uses the same rounding as timedelta() - I am happy.

    tim-one commented 9 years ago

    Alex, if you like, I'll take the blame for the rounding method - I did express a preference for it here:

    http://bugs.python.org/issue23517#msg249309

    When I looked at the code earlier, the round-half-up implementation looked good to me (floor(x+0.5) if x >= 0 else ceil(x-0.5)).

    vstinner commented 9 years ago

    I did not quite understand why you've chosen ROUND_HALF_UP over ROUND_HALF_EVEN, but as long as fromtimestamp() uses the same rounding as timedelta() - I am happy.

    Not only Tim prefers this rounding mode, Python 2.7 also uses the same mode, and the original poster basically said that Python 3 doesn't behave like Python 2. So... the most obvious rounding mode is the same than Python 2, ROUND_HALF_UP.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset 3c29d05c0710 by Victor Stinner in branch 'default': Issue bpo-23517: Fix implementation of the ROUND_HALF_UP rounding mode in https://hg.python.org/cpython/rev/3c29d05c0710

    abalkin commented 9 years ago

    It would be nice to hear from Mark Dickinson on this. In Python 3, we took a much more systematic approach to rounding than a rather haphazard Python 2. For example, the rounding mode for timedelta(0, float_seconds) is not specified in Python 2, but it is in Python 3:

    https://docs.python.org/2/library/datetime.html#datetime.timedelta https://docs.python.org/3/library/datetime.html#datetime.timedelta

    vstinner commented 9 years ago

    Is the "review" link up-to date?

    Oh, I wrote a patch serie, but Rietveld doesn't know the dependencies between my patches...

    So I attached combined_py34.patch which combines the 3 patches into a single one, hard to reviewer, but it should work with Rietveld.

    tim-one commented 9 years ago

    Yes, it would be good to hear from Mark. When I first saw this report, I checked to see whether he was on the nosy list. He is, but is apparently busy elsewhere.

    My opinions haven't changed: nearest/even is unnatural for rounding times ("sometimes up, sometimes down" grates against the nature of monotonically non-decreasing timestamps), but it doesn't make a lick of real difference. If the inconsequential choice was documented, then sure, that it _was_ documented makes it consequential, so that's the one that must be used.

    vstinner commented 9 years ago

    My opinions haven't changed: nearest/even is unnatural for rounding times ("sometimes up, sometimes down" grates against the nature of monotonically non-decreasing timestamps),

    By the way, why does Python use ROUND_HALF_EVEN for round()? It also feels unnatural to me!

    >>> round(0.5)
    0
    >>> round(1.5)
    2
    tim-one commented 9 years ago

    Victor, there are good "theoretical" reasons for using half/even rounding in _general use. But this bug report isn't the place to get into them. Here it just should be enough to note that the IEEE 754 floating point standard _requires half/even to be the default rounding mode. Slowly but surely, all the world's non-business computer applications are moving toward that. Python is just one of 'em.

    abalkin commented 9 years ago

    By the way, why does Python use ROUND_HALF_EVEN for round()?

    ROUND_HALF_EVEN does not introduce statistical bias in your floating point data. With this choice a randomly chosen decimal has an equal chance of being rounded up or down. I think one can prove the same for a binary float being converted to a decimal with a fixed number of places after dot. The builtin round() is a unique beast and I would have to ask Mark to know what its statistical properties are, but in general round-half-to-even is justified by the desire of not introducing a statistical bias and I don't think the natural asymmetry of the time line matters in this argument.

    abalkin commented 9 years ago

    Here is what Wikipedia has to say on the subject:

    """ Round half to even .. This method treats positive and negative values symmetrically, and is therefore free of sign bias. More importantly, for reasonable distributions of y values, the expected (average) value of the rounded numbers is the same as that of the original numbers. However, this rule will introduce a towards-zero bias when y − 0.5 is even, and a towards-infinity bias for when it is odd. """ https://en.wikipedia.org/wiki/Rounding#Round_half_to_even

    abalkin commented 9 years ago

    .. and here is an unbeatable argument: "this variant of the round-to-nearest method is also called unbiased rounding, convergent rounding, statistician's rounding, **Dutch rounding**, Gaussian rounding, odd–even rounding, or bankers' rounding."

    abalkin commented 9 years ago

    .. and "[Round half to even] is the default rounding mode used in IEEE 754 computing functions and operators."

    vstinner commented 9 years ago

    Ok, thanks for the explanation :-)

    tim-one commented 9 years ago

    Goodness. It's the properties of "randomly chosen decimals" that have nothing to do with timestamps ;-) timestamps aren't random, so "statistical bias" is roughly meaningless in this context. I gave a specific example before of how nearest/even destroys obvious regularities in a _sequence_ of timestamps, where half-up preserves as much of the input regularity as possible. That's worth more than a million abstract "head arguments" on Wikipedia.

    But it doesn't make a lick of real difference either way. We're rounding to microseconds, and there are only 64 "fractional parts" where the methods could _possibly deliver different results: those of the form i/128 for i in range(1, 128, 2). All and only those are exactly representable in base 2, and require exactly 7 decimal digits "after the decimal point" to express in decimal, _and end with "5" in decimal. Half end with "25" while the other half with "75". So Alex's 1/128 is one of the only 32 possible fractional parts where it makes a difference. We systematically force all these cases to even, and dare think that's _not "biased"? Half-up would leave half the results even and half odd, exactly the same as the _input odd/even distribution of the 6th digit. And preserve the input strict alternation between even and odd in the 6th digit. nearest/even destroys all of that.

    Except that, I agree, there's no arguing with "Dutch rounding" ;-)

    abalkin commented 9 years ago

    timestamps aren't random, so "statistical bias" is roughly meaningless in this context.

    I agree. I don't think I made any arguments about timestamps specifically other than a consistency with timedeltas. In the later case, I think all the usual arguments for half-to-even tiebreaker apply.

    Let's just leave it at "When in doubt - use Dutch rounding."

    tim-one commented 9 years ago

    Bah. It doesn't matter who's consuming the rounding of a binary float to decimal microseconds: there are only 32 possible fractional parts where nearest/even and half-up deliver different results. half-up preserves properties of these specific inputs that nearest/even destroys. These inputs themselves have no bias - they're utterly uniformly spaced.

    Not only does nearest/even _introduce_ bias on these inputs by destroying these properties, it doesn't even preserve the spacing between them. Half-up leaves them all 5 microseconds apart, while nearest/even creates a bizarre "sometimes 4 microseconds apart, sometimes 6" output spacing out of thin air.

    So it's not a question of "when in doubt" to me, it's a question of "live up to what the docs already say". Although, again, it doesn't make a lick of real difference. That's why we'll never stop arguing about it ;-)

    tim-one commented 9 years ago

    Half-up leaves them all 5 microseconds apart,

    When only looking at the decimal digit in the 6th place after rounding. Which is all I did look at ;-)

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset d755f75019c8 by Victor Stinner in branch 'default': Issue bpo-23517: Skip a datetime test on Windows https://hg.python.org/cpython/rev/d755f75019c8

    abalkin commented 9 years ago

    It doesn't matter who's consuming the rounding of a binary float to decimal microseconds

    That is true, but what does matter is who is producing the incoming floats. Consider an extreme case of a timer that ticks twice a microsecond and gives you results in a timespec struct style sec, nsec pair. If you convert those to timedeltas with timedelta(0, sec, nsec/1000), you will see half of nsec/1000 floats ending in .5 and half ending with .0. If you have a large sample of (sec, nsec) measurements, the mean of corresponding timedeltas will be 0.25µs larger with the half-up rounding than the actual mean of your samples.

    Is round-half-to-even a panacea? Of course not. In an extreme case if all your measurements are 1.5µs - it will not help at all, but in a more realistic sample you are likely to have an approximately equal number of even and odd µs and the Dutch rounding will affect the mean much less than round-half-up.

    As you said, for most uses none of this matters, but we are not discussing a change to timedelta(0, s) rounding here. All I want from this issue is to keep the promise that we make in the docs:

    On the POSIX compliant platforms, utcfromtimestamp(timestamp) is equivalent to the following expression:

    datetime(1970, 1, 1) + timedelta(seconds=timestamp)

    https://docs.python.org/3/library/datetime.html#datetime.datetime.utcfromtimestamp

    Tim, while we have this entertaining theoretical dispute, I am afraid we are leaving Victor confused about what he has to do in his patch. I think we agree that fromtimestamp() should use Dutch rounding. We only disagree why.

    If this is the case, please close this thread with a clear advise to use round-half-to-even and we can continue discussing the rationale privately or in a different forum.

    tim-one commented 9 years ago

    Victor, sorry if I muddied the waters here: Alex & I do agree nearest/even must be used. It's documented for timedelta already, and the seconds-from-the-epoch invariant Alex showed is at least as important to preserve as round-tripping.

    Alex, agreed about the overall mean in the example, but expect that continuing to exchange contrived examples isn't really a major life goal for either of us ;-) Thanks for playing along.

    vstinner commented 9 years ago

    Alex & I do agree nearest/even must be used.

    Ok, so Python 3.6 should be updated to use ROUND_HALF_EVEN, and then updated patches should be backported to Python 3.4 and 3.5. Right?

    Right now, Python 3.6 uses ROUND_HALF_UP.

    abalkin commented 9 years ago

    You may find it easier to start with a patch for 3.5 which is the only time sensitive task. I'll be happy to review your code in whatever form you find it easier to submit, but I believe in hg it is easier to forward port than to backport.

    larryhastings commented 9 years ago

    So is this considered broken enough that I need to accept a fix for 3.5.0? And has a consensus been reached about exactly what that fix would be?

    tim-one commented 9 years ago

    Universal consensus on ROUND_HALF_EVEN, yes.

    I would like to see it repaired for 3.5.0, but that's just me saying so without offering to do a single damned thing to make it happen ;-)

    larryhastings commented 9 years ago

    Well, I'm already holding up rc3 on one other issue, might as well fix this too. Can somebody make me a pull request?

    larryhastings commented 9 years ago

    I suspect we're not fixing this in 3.4, so I'm removing 3.4 from the version list.

    larryhastings commented 9 years ago

    Okay, this is literally the only thing rc3 is waiting on now.

    vstinner commented 9 years ago

    I understand that I have to implement a new rounding mode. The code will be new, I'm not condifent enough to push it immedialty into python 3.5. IMHO a buggy rounding mode is worse than keeping the current rounding mode. The rounding mode changed in python 3.3. There is no urgency to fix it.

    I will change python 3.6, then 3.4 and 3.5.1. Le 7 sept. 2015 07:33, "Larry Hastings" \report@bugs.python.org\ a écrit :

    Larry Hastings added the comment:

    Okay, this is literally the only thing rc3 is waiting on now.

    ----------


    Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue23517\


    larryhastings commented 9 years ago

    Is it appropriate to make this change as a "bug fix", in already-released versions of Python? Would you be happy or sad if you updated your Python from 3.x.y to 3.x.y+1 and the rounding method used when converting floats to datetime stamps changed?

    larryhastings commented 9 years ago

    Well, for now I assume it really truly genuinely isn't going in 3.5.0. I suppose we can debate about 3.4.x and 3.5.1 later, once we have a fix everybody is happy with.

    abalkin commented 9 years ago

    Larry> Well, for now I assume it really truly genuinely isn't going in 3.5.0.

    This is an unfortunate outcome.

    Larry> I suppose we can debate about 3.4.x and 3.5.1 later

    It is even more unfortunate that the question of whether this regression is a bug or not is up for debate again.

    Victor> The code will be new, I'm not condifent enough to push it immedialty into python 3.5.

    I don't understand why any new code is needed to fix a regression. All you need to do is to revert a few chunks of 1e9cc1a03365 where the regression was introduced.

    larryhastings commented 9 years ago

    If the guy doing the work says "don't merge it in 3.5.0", and we're at the final release candidate before 3.5.0 final ships, and we don't even have a patch that everyone likes yet... it does seem like it's not going to happen for 3.5.0. Unfortunate perhaps but that's the situation we're in.

    vstinner commented 9 years ago

    All you need to do is to revert a few chunks of 1e9cc1a03365 where the regression was introduced.

    Hum, please don't revert this change. I spent a lot of days to write pytime.c/.h.

    My patches add more unit tests to datetime, to test the exact rounding mode.

    Well, for now I assume it really truly genuinely isn't going in 3.5.0. I suppose we can debate about 3.4.x and 3.5.1 later, once we have a fix everybody is happy with.

    The rounding mode of microseconds only impact a very few people. The rounding mode changed in 3.3, again in 3.4 and again in 3.5. This issue is the first bug report. So I think it's fine to modify 3.4.x and 3.5.x.

    abalkin commented 9 years ago

    Victor> please don't revert this change.

    I did not suggest reverting the entire commit. The change that affects fromdatetime() is just

    in datetime.py. It is probably more involved in _datetimemodule.c, but cannot be that bad. You can still keep pytime.c/.h.

    mdickinson commented 9 years ago

    FWIW, count me as +1 on roundTiesToEven, mostly just for consistency. It's easier to remember that pretty much everything in Python 3 does round-ties-to-even (round function, string formatting, float literal evaluations, int-to-float conversion, Fraction-to-float conversion, ...) than to have to remember that there's a handful of corner cases where we do roundTiesToAway instead.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset d0bb896f9b14 by Victor Stinner in branch 'default': Revert change 0eb8c182131e: https://hg.python.org/cpython/rev/d0bb896f9b14

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset 171d5590ebc3 by Victor Stinner in branch 'default': Issue bpo-23517: fromtimestamp() and utcfromtimestamp() methods of https://hg.python.org/cpython/rev/171d5590ebc3

    larryhastings commented 9 years ago

    I wish people wouldn't remove old patches. There's no harm in leaving them, and it may be a useful historical record.

    vstinner commented 9 years ago

    I modified Python 3.6 to use ROUND_HALF_EVEN rounding mode in datetime.timedelta, datetime.datetime.fromtimestamp(), datetime.datetime.utcfromtimestamp().

    round_half_even_py34.patch: Backport this change to Python 3.4. I tried to write a minimal patch only changing datetime, not pytime.

    vstinner commented 9 years ago

    I wish people wouldn't remove old patches. There's no harm in leaving them, and it may be a useful historical record.

    It became hard to read this issue, it has a long history. My old patches for Python 3.4 were for ROUND_HALF_UP, but it was then decided to use ROUND_HALF_EVEN. Technically, patches are not "removed" but "unlinked" from the issue: you can get from the history at the bottom of this page.

    larryhastings commented 9 years ago

    Given Victor's reluctance to get this in to 3.5.0, this can't even be marked as a "deferred blocker" anymore. Demoting to normal priority.

    vstinner commented 9 years ago

    Alexander: can you please review attached round_half_even_py34.patch? It's the minimum patch to fix datetime/timedelta rounding for Python 3.4 (and 3.5).

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 9 years ago

    New changeset ee1cf1b188d2 by Victor Stinner in branch '3.4': Issue bpo-23517: Fix rounding in fromtimestamp() and utcfromtimestamp() methods https://hg.python.org/cpython/rev/ee1cf1b188d2

    vstinner commented 9 years ago

    Ok, I fixed Python 3.4 and 3.5 too: fromtimestamp() and utcfromtimestamp() now uses also ROUND_HALF_EVEN rounding mode, as timedelta constructor. I explained in Misc/NEWS that the change was made to respect the property:

    (datetime(1970,1,1) + timedelta(seconds=t)) == datetime.utcfromtimestamp(t)

    Thanks Alexander Belopolsky & Tim Peters for your explanation on rounding. It's not a simple problem :-)

    Thanks Tommaso Barbugli for the bug report: it's now fixed on all (maintained) Python 3 versions: 3.4, 3.5 and 3.6. The fix will be part of Python 3.5.1.