Closed fd319893-c29a-4319-934f-b31206c2403f closed 9 years ago
It looks like this patch violates fromtimestamp(s) == EPOCH + timedelta(seconds=s) invariant:
Python 3.6.0a0 (default:73911e6c97c8, Sep 4 2015, 13:14:12)
>>> E = datetime(1970,1,1,tzinfo=timezone.utc)
>>> s = -1/2**7
>>> datetime.fromtimestamp(s, timezone.utc) == E + timedelta(seconds=s)
False
FYI, that invariant failed for me just now under the released 3.4.3 too:
Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:44:40) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import *
>>> E = datetime(1970,1,1,tzinfo=timezone.utc)
>>> s = -1/2**7
>>> datetime.fromtimestamp(s, timezone.utc)
datetime.datetime(1969, 12, 31, 23, 59, 59, 992187, tzinfo=datetime.timezone.utc)
>>> E + timedelta(seconds=s)
datetime.datetime(1969, 12, 31, 23, 59, 59, 992188, tzinfo=datetime.timezone.utc)
It's an exactly-tied rounding case for the exactly-representable-in-binary s = -0.0078125.
Can someone check 3.3? I believe that was the release where we tried to get various rounding issues right.
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import *
>>> E = datetime(1970,1,1,tzinfo=timezone.utc)
>>> s = -1/2**7
>>> datetime.fromtimestamp(s, timezone.utc)
datetime.datetime(1969, 12, 31, 23, 59, 59, 992187, tzinfo=datetime.timezone.utc)
>>> E + timedelta(seconds=s)
datetime.datetime(1969, 12, 31, 23, 59, 59, 992187, tzinfo=datetime.timezone.utc)
FWIW Windows 10.
Oh, it looks like my implementation of ROUND_HALF_UP is wrong. Negative numbers are not correctly rounded :-/ I'm working on a fix.
Victor,
I did not quite understand why you've chosen ROUND_HALF_UP over ROUND_HALF_EVEN, but as long as fromtimestamp() uses the same rounding as timedelta() - I am happy.
Alex, if you like, I'll take the blame for the rounding method - I did express a preference for it here:
http://bugs.python.org/issue23517#msg249309
When I looked at the code earlier, the round-half-up implementation looked good to me (floor(x+0.5) if x >= 0 else ceil(x-0.5)).
I did not quite understand why you've chosen ROUND_HALF_UP over ROUND_HALF_EVEN, but as long as fromtimestamp() uses the same rounding as timedelta() - I am happy.
Not only Tim prefers this rounding mode, Python 2.7 also uses the same mode, and the original poster basically said that Python 3 doesn't behave like Python 2. So... the most obvious rounding mode is the same than Python 2, ROUND_HALF_UP.
New changeset 3c29d05c0710 by Victor Stinner in branch 'default': Issue bpo-23517: Fix implementation of the ROUND_HALF_UP rounding mode in https://hg.python.org/cpython/rev/3c29d05c0710
It would be nice to hear from Mark Dickinson on this. In Python 3, we took a much more systematic approach to rounding than a rather haphazard Python 2. For example, the rounding mode for timedelta(0, float_seconds) is not specified in Python 2, but it is in Python 3:
https://docs.python.org/2/library/datetime.html#datetime.timedelta https://docs.python.org/3/library/datetime.html#datetime.timedelta
Is the "review" link up-to date?
Oh, I wrote a patch serie, but Rietveld doesn't know the dependencies between my patches...
So I attached combined_py34.patch which combines the 3 patches into a single one, hard to reviewer, but it should work with Rietveld.
Yes, it would be good to hear from Mark. When I first saw this report, I checked to see whether he was on the nosy list. He is, but is apparently busy elsewhere.
My opinions haven't changed: nearest/even is unnatural for rounding times ("sometimes up, sometimes down" grates against the nature of monotonically non-decreasing timestamps), but it doesn't make a lick of real difference. If the inconsequential choice was documented, then sure, that it _was_ documented makes it consequential, so that's the one that must be used.
My opinions haven't changed: nearest/even is unnatural for rounding times ("sometimes up, sometimes down" grates against the nature of monotonically non-decreasing timestamps),
By the way, why does Python use ROUND_HALF_EVEN for round()? It also feels unnatural to me!
>>> round(0.5)
0
>>> round(1.5)
2
Victor, there are good "theoretical" reasons for using half/even rounding in _general use. But this bug report isn't the place to get into them. Here it just should be enough to note that the IEEE 754 floating point standard _requires half/even to be the default rounding mode. Slowly but surely, all the world's non-business computer applications are moving toward that. Python is just one of 'em.
By the way, why does Python use ROUND_HALF_EVEN for round()?
ROUND_HALF_EVEN does not introduce statistical bias in your floating point data. With this choice a randomly chosen decimal has an equal chance of being rounded up or down. I think one can prove the same for a binary float being converted to a decimal with a fixed number of places after dot. The builtin round() is a unique beast and I would have to ask Mark to know what its statistical properties are, but in general round-half-to-even is justified by the desire of not introducing a statistical bias and I don't think the natural asymmetry of the time line matters in this argument.
Here is what Wikipedia has to say on the subject:
""" Round half to even .. This method treats positive and negative values symmetrically, and is therefore free of sign bias. More importantly, for reasonable distributions of y values, the expected (average) value of the rounded numbers is the same as that of the original numbers. However, this rule will introduce a towards-zero bias when y − 0.5 is even, and a towards-infinity bias for when it is odd. """ https://en.wikipedia.org/wiki/Rounding#Round_half_to_even
.. and here is an unbeatable argument: "this variant of the round-to-nearest method is also called unbiased rounding, convergent rounding, statistician's rounding, **Dutch rounding**, Gaussian rounding, odd–even rounding, or bankers' rounding."
.. and "[Round half to even] is the default rounding mode used in IEEE 754 computing functions and operators."
Ok, thanks for the explanation :-)
Goodness. It's the properties of "randomly chosen decimals" that have nothing to do with timestamps ;-) timestamps aren't random, so "statistical bias" is roughly meaningless in this context. I gave a specific example before of how nearest/even destroys obvious regularities in a _sequence_ of timestamps, where half-up preserves as much of the input regularity as possible. That's worth more than a million abstract "head arguments" on Wikipedia.
But it doesn't make a lick of real difference either way. We're rounding to microseconds, and there are only 64 "fractional parts" where the methods could _possibly deliver different results: those of the form i/128 for i in range(1, 128, 2). All and only those are exactly representable in base 2, and require exactly 7 decimal digits "after the decimal point" to express in decimal, _and end with "5" in decimal. Half end with "25" while the other half with "75". So Alex's 1/128 is one of the only 32 possible fractional parts where it makes a difference. We systematically force all these cases to even, and dare think that's _not "biased"? Half-up would leave half the results even and half odd, exactly the same as the _input odd/even distribution of the 6th digit. And preserve the input strict alternation between even and odd in the 6th digit. nearest/even destroys all of that.
Except that, I agree, there's no arguing with "Dutch rounding" ;-)
timestamps aren't random, so "statistical bias" is roughly meaningless in this context.
I agree. I don't think I made any arguments about timestamps specifically other than a consistency with timedeltas. In the later case, I think all the usual arguments for half-to-even tiebreaker apply.
Let's just leave it at "When in doubt - use Dutch rounding."
Bah. It doesn't matter who's consuming the rounding of a binary float to decimal microseconds: there are only 32 possible fractional parts where nearest/even and half-up deliver different results. half-up preserves properties of these specific inputs that nearest/even destroys. These inputs themselves have no bias - they're utterly uniformly spaced.
Not only does nearest/even _introduce_ bias on these inputs by destroying these properties, it doesn't even preserve the spacing between them. Half-up leaves them all 5 microseconds apart, while nearest/even creates a bizarre "sometimes 4 microseconds apart, sometimes 6" output spacing out of thin air.
So it's not a question of "when in doubt" to me, it's a question of "live up to what the docs already say". Although, again, it doesn't make a lick of real difference. That's why we'll never stop arguing about it ;-)
Half-up leaves them all 5 microseconds apart,
When only looking at the decimal digit in the 6th place after rounding. Which is all I did look at ;-)
New changeset d755f75019c8 by Victor Stinner in branch 'default': Issue bpo-23517: Skip a datetime test on Windows https://hg.python.org/cpython/rev/d755f75019c8
It doesn't matter who's consuming the rounding of a binary float to decimal microseconds
That is true, but what does matter is who is producing the incoming floats. Consider an extreme case of a timer that ticks twice a microsecond and gives you results in a timespec struct style sec, nsec pair. If you convert those to timedeltas with timedelta(0, sec, nsec/1000), you will see half of nsec/1000 floats ending in .5 and half ending with .0. If you have a large sample of (sec, nsec) measurements, the mean of corresponding timedeltas will be 0.25µs larger with the half-up rounding than the actual mean of your samples.
Is round-half-to-even a panacea? Of course not. In an extreme case if all your measurements are 1.5µs - it will not help at all, but in a more realistic sample you are likely to have an approximately equal number of even and odd µs and the Dutch rounding will affect the mean much less than round-half-up.
As you said, for most uses none of this matters, but we are not discussing a change to timedelta(0, s) rounding here. All I want from this issue is to keep the promise that we make in the docs:
On the POSIX compliant platforms, utcfromtimestamp(timestamp) is equivalent to the following expression:
datetime(1970, 1, 1) + timedelta(seconds=timestamp)
https://docs.python.org/3/library/datetime.html#datetime.datetime.utcfromtimestamp
Tim, while we have this entertaining theoretical dispute, I am afraid we are leaving Victor confused about what he has to do in his patch. I think we agree that fromtimestamp() should use Dutch rounding. We only disagree why.
If this is the case, please close this thread with a clear advise to use round-half-to-even and we can continue discussing the rationale privately or in a different forum.
Victor, sorry if I muddied the waters here: Alex & I do agree nearest/even must be used. It's documented for timedelta already, and the seconds-from-the-epoch invariant Alex showed is at least as important to preserve as round-tripping.
Alex, agreed about the overall mean in the example, but expect that continuing to exchange contrived examples isn't really a major life goal for either of us ;-) Thanks for playing along.
Alex & I do agree nearest/even must be used.
Ok, so Python 3.6 should be updated to use ROUND_HALF_EVEN, and then updated patches should be backported to Python 3.4 and 3.5. Right?
Right now, Python 3.6 uses ROUND_HALF_UP.
You may find it easier to start with a patch for 3.5 which is the only time sensitive task. I'll be happy to review your code in whatever form you find it easier to submit, but I believe in hg it is easier to forward port than to backport.
So is this considered broken enough that I need to accept a fix for 3.5.0? And has a consensus been reached about exactly what that fix would be?
Universal consensus on ROUND_HALF_EVEN, yes.
I would like to see it repaired for 3.5.0, but that's just me saying so without offering to do a single damned thing to make it happen ;-)
Well, I'm already holding up rc3 on one other issue, might as well fix this too. Can somebody make me a pull request?
I suspect we're not fixing this in 3.4, so I'm removing 3.4 from the version list.
Okay, this is literally the only thing rc3 is waiting on now.
I understand that I have to implement a new rounding mode. The code will be new, I'm not condifent enough to push it immedialty into python 3.5. IMHO a buggy rounding mode is worse than keeping the current rounding mode. The rounding mode changed in python 3.3. There is no urgency to fix it.
I will change python 3.6, then 3.4 and 3.5.1. Le 7 sept. 2015 07:33, "Larry Hastings" \report@bugs.python.org\ a écrit :
Larry Hastings added the comment:
Okay, this is literally the only thing rc3 is waiting on now.
----------
Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue23517\
Is it appropriate to make this change as a "bug fix", in already-released versions of Python? Would you be happy or sad if you updated your Python from 3.x.y to 3.x.y+1 and the rounding method used when converting floats to datetime stamps changed?
Well, for now I assume it really truly genuinely isn't going in 3.5.0. I suppose we can debate about 3.4.x and 3.5.1 later, once we have a fix everybody is happy with.
Larry> Well, for now I assume it really truly genuinely isn't going in 3.5.0.
This is an unfortunate outcome.
Larry> I suppose we can debate about 3.4.x and 3.5.1 later
It is even more unfortunate that the question of whether this regression is a bug or not is up for debate again.
Victor> The code will be new, I'm not condifent enough to push it immedialty into python 3.5.
I don't understand why any new code is needed to fix a regression. All you need to do is to revert a few chunks of 1e9cc1a03365 where the regression was introduced.
If the guy doing the work says "don't merge it in 3.5.0", and we're at the final release candidate before 3.5.0 final ships, and we don't even have a patch that everyone likes yet... it does seem like it's not going to happen for 3.5.0. Unfortunate perhaps but that's the situation we're in.
All you need to do is to revert a few chunks of 1e9cc1a03365 where the regression was introduced.
Hum, please don't revert this change. I spent a lot of days to write pytime.c/.h.
My patches add more unit tests to datetime, to test the exact rounding mode.
Well, for now I assume it really truly genuinely isn't going in 3.5.0. I suppose we can debate about 3.4.x and 3.5.1 later, once we have a fix everybody is happy with.
The rounding mode of microseconds only impact a very few people. The rounding mode changed in 3.3, again in 3.4 and again in 3.5. This issue is the first bug report. So I think it's fine to modify 3.4.x and 3.5.x.
Victor> please don't revert this change.
I did not suggest reverting the entire commit. The change that affects fromdatetime() is just
in datetime.py. It is probably more involved in _datetimemodule.c, but cannot be that bad. You can still keep pytime.c/.h.
FWIW, count me as +1 on roundTiesToEven, mostly just for consistency. It's easier to remember that pretty much everything in Python 3 does round-ties-to-even (round function, string formatting, float literal evaluations, int-to-float conversion, Fraction-to-float conversion, ...) than to have to remember that there's a handful of corner cases where we do roundTiesToAway instead.
New changeset d0bb896f9b14 by Victor Stinner in branch 'default': Revert change 0eb8c182131e: https://hg.python.org/cpython/rev/d0bb896f9b14
New changeset 171d5590ebc3 by Victor Stinner in branch 'default': Issue bpo-23517: fromtimestamp() and utcfromtimestamp() methods of https://hg.python.org/cpython/rev/171d5590ebc3
I wish people wouldn't remove old patches. There's no harm in leaving them, and it may be a useful historical record.
I modified Python 3.6 to use ROUND_HALF_EVEN rounding mode in datetime.timedelta, datetime.datetime.fromtimestamp(), datetime.datetime.utcfromtimestamp().
round_half_even_py34.patch: Backport this change to Python 3.4. I tried to write a minimal patch only changing datetime, not pytime.
I wish people wouldn't remove old patches. There's no harm in leaving them, and it may be a useful historical record.
It became hard to read this issue, it has a long history. My old patches for Python 3.4 were for ROUND_HALF_UP, but it was then decided to use ROUND_HALF_EVEN. Technically, patches are not "removed" but "unlinked" from the issue: you can get from the history at the bottom of this page.
Given Victor's reluctance to get this in to 3.5.0, this can't even be marked as a "deferred blocker" anymore. Demoting to normal priority.
Alexander: can you please review attached round_half_even_py34.patch? It's the minimum patch to fix datetime/timedelta rounding for Python 3.4 (and 3.5).
New changeset ee1cf1b188d2 by Victor Stinner in branch '3.4': Issue bpo-23517: Fix rounding in fromtimestamp() and utcfromtimestamp() methods https://hg.python.org/cpython/rev/ee1cf1b188d2
Ok, I fixed Python 3.4 and 3.5 too: fromtimestamp() and utcfromtimestamp() now uses also ROUND_HALF_EVEN rounding mode, as timedelta constructor. I explained in Misc/NEWS that the change was made to respect the property:
(datetime(1970,1,1) + timedelta(seconds=t)) == datetime.utcfromtimestamp(t)
Thanks Alexander Belopolsky & Tim Peters for your explanation on rounding. It's not a simple problem :-)
Thanks Tommaso Barbugli for the bug report: it's now fixed on all (maintained) Python 3 versions: 3.4, 3.5 and 3.6. The fix will be part of Python 3.5.1.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/abalkin' closed_at =
created_at =
labels = ['type-bug', 'library']
title = 'datetime.utcfromtimestamp rounds results incorrectly'
updated_at =
user = 'https://bugs.python.org/tbarbugli'
```
bugs.python.org fields:
```python
activity =
actor = 'vstinner'
assignee = 'belopolsky'
closed = True
closed_date =
closer = 'vstinner'
components = ['Library (Lib)']
creation =
creator = 'tbarbugli'
dependencies = []
files = ['40414']
hgrepos = []
issue_num = 23517
keywords = ['patch', '3.3regression']
message_count = 100.0
messages = ['236552', '236553', '236577', '236580', '236581', '236585', '236587', '236589', '236597', '236607', '236608', '236609', '236610', '246049', '246073', '246097', '246104', '246121', '246284', '246306', '246307', '246334', '248942', '249282', '249300', '249301', '249303', '249304', '249307', '249308', '249309', '249519', '249520', '249521', '249525', '249528', '249530', '249532', '249539', '249569', '249611', '249728', '249744', '249771', '249777', '249778', '249779', '249786', '249787', '249788', '249791', '249796', '249798', '249825', '249827', '249834', '249835', '249836', '249841', '249842', '249844', '249845', '249847', '249848', '249849', '249850', '249851', '249853', '249856', '249875', '249879', '249885', '249886', '249898', '249909', '249910', '249925', '249926', '250043', '250047', '250048', '250049', '250066', '250075', '250083', '250101', '250103', '250104', '250107', '250108', '250113', '250259', '250265', '250268', '250269', '250270', '250304', '250405', '250973', '250974']
nosy_count = 13.0
nosy_names = ['tim.peters', 'mark.dickinson', 'belopolsky', 'vstinner', 'larry', 'r.david.murray', 'aconrad', 'BreamoreBoy', 'vivanov', 'python-dev', 'serhiy.storchaka', 'tbarbugli', 'trcarden']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'commit review'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23517'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']
```