```python
import sys
print(sys.version)
for i in range(10):
    test = i + 0.5
    print(test, round(test, 0))
```
```
3.6.3 |Anaconda custom (64-bit)| (default, Nov 8 2017, 15:10:56) [MSC v.1900 64 bit (AMD64)]
0.5 0.0
1.5 2.0
2.5 2.0
3.5 4.0
4.5 4.0
5.5 6.0
6.5 6.0
7.5 8.0
8.5 8.0
9.5 10.0
-------------------------
2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)]
(0.5, 1.0)
(1.5, 2.0)
(2.5, 3.0)
(3.5, 4.0)
(4.5, 5.0)
(5.5, 6.0)
(6.5, 7.0)
(7.5, 8.0)
(8.5, 9.0)
(9.5, 10.0)
```
```
3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)]
0.5 0.0
1.5 2.0
2.5 2.0
3.5 4.0
4.5 4.0
5.5 6.0
6.5 6.0
7.5 8.0
8.5 8.0
9.5 10.0
```
It works as documented.
""" For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus ndigits; if two multiples are equally close, rounding is done toward the even choice (so, for example, both round(0.5) and round(-0.5) are 0, and round(1.5) is 2). """ https://docs.python.org/3/library/functions.html#round
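The documented behaviour is easy to confirm in a Python 3 interpreter. The following is a minimal check using only the built-in `round()` on floats; it restates the documentation's examples and rebuilds the 0.5 to 9.5 table from the original report.

```python
# Python 3: round() on built-in floats breaks exact ties toward the even neighbour.
assert round(0.5) == 0
assert round(-0.5) == 0
assert round(1.5) == 2
assert round(2.5) == 2  # the case from the original report

# Without ndigits the result is an int; with ndigits it stays a float.
assert isinstance(round(2.5), int)
assert isinstance(round(2.5, 0), float)

# Rebuild the report's table: x.5 for x in 0..9, rounded with ndigits=0.
table = [(i + 0.5, round(i + 0.5, 0)) for i in range(10)]
for value, rounded in table:
    print(value, rounded)
```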
This was so surprising to me that I had to check some other languages that I had handy. It turns out that not one of JavaScript, Ruby, Perl, C++, Java, Go, or Rust agrees with Python. In fact, they all agree with one another that 2.5 should round to 3. Examples below.
I understand from https://github.com/cosmologicon/pywat/pull/40#discussion_r219962259 that "to always round up... can theoretically skew the data" but it's not clear why that's a good enough reason to differ from the "round" function in all these other languages (as opposed to e.g. offering this alternative behavior in some additional "round_unskewed" function).
I assume the rationale for having Python 3's "round" differ from that of so many other languages was written down when this decision was made, but I searched and couldn't find it. Could anyone link to it in a comment here?
And would it be worth including rationale and a larger callout in the https://docs.python.org/3/library/functions.html#round docs? The documentation of this behavior is a bit buried among other things, and the rationale for it is missing entirely.
```
$ node -e 'console.log(Math.round(2.5))'
3
$ ruby -e 'puts (2.5).round()'
3
$ perl -e 'use Math::Round; print round(2.5)'
3
$ cat test_round.cpp
#include <stdio.h>
#include <math.h>
int main(void) {
    printf("%f\n", round(2.5));
}
$ g++ test_round.cpp && ./a.out
3.000000
$ cat TestRound.java
class TestRound {
    public static void main(String[] args) {
        System.out.println(Math.round(2.5));
    }
}
$ javac TestRound.java && java TestRound
3
$ cat test_round.go
package main
import "fmt"
import "math"
func main() {
    fmt.Println(math.Round(2.5))
}
$ go build test_round.go && ./test_round
3
$ cat test_round.rs
fn main() {
    println!("{}", (2.5_f64).round());
}
$ rustc test_round.rs && ./test_round
3
```
See the discussion on the Python-Dev mailing list: https://mail.python.org/pipermail/python-dev/2008-January/075863.html.
For C, look at the rint() function (which, in the default rounding mode, rounds halfway cases to even). It is common knowledge that rounding half-to-even is what users want in most cases, but it is a tiny bit more expensive in C. In Python the additional cost of such rounding is insignificant.
Thanks Serhiy, I read the Python-Dev thread you linked to, but that doesn't resolve the issues:
- Its topic is Python 2.6 (where this behavior does not occur) rather than Python 3 (where it does).
- A few messages into the thread Guido does address Python 3, but in fact says "I think the consensus is against round-to-even in 3.0" (see https://mail.python.org/pipermail/python-dev/2008-January/075897.html).
- There is no discussion of the fact that this behavior differs from the function named "round" in all the other programming languages I mentioned, and whether it would therefore be better exposed as an additional function (e.g. "round_to_even" or "round_unbiased", and in the math or statistics package rather than builtins). Surprisingly, Excel is the only other programming environment I saw discussed in the thread. (And round(2.5) == 3 there.)

So that all suggests there must be some other thread or issue where this change for Python 3 has been discussed, but I looked again and could not find it.
The C "rint" example you gave just seems to prove the point that this behavior should have a distinct name from "round".
Regarding:
It is a common knowledge that rounding half-to-even is what users want in most cases
I don't think that's common knowledge; seems like citation needed? Based on all the other languages where this differs (not to mention Python 2), it's not clear users would want Python 3 to be the only different one. And this is definitely a surprise for the majority of programmers, whose experience with "round" is how it works everywhere else. (This is making it into pywat after all: https://github.com/cosmologicon/pywat/pull/40)
I can submit a PR for at least updating the docs about this (as per my previous comment) if that would be welcomed.
Rounding floating point numbers is a complex topic :-) Here are my notes on that topic: https://vstinner.readthedocs.io/misc.html#rounding It took me years to fix how Python rounds timestamps :-)
@jab: Did you see this message from the thread that Serhiy pointed to? https://mail.python.org/pipermail/python-dev/2008-January/075910.html
Here's the sound bite version, but see the message for context:
[Guido]
Wow. Do you have an opinion as to whether we should adopt round-to-even at all (as a default)?
[Tim] Yes: yes :-) [...]
Note that Python is not alone here: .NET also uses round-ties-to-even as the default rounding mode. Note also that the languages you mention don't all agree with one another either: try rounding -2.5 to the nearest integer in JavaScript and see what result you get.
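To make the -2.5 point concrete, here is a small Python sketch of three common tie-breaking rules. The helper names are made up for illustration, and the rules are modelled with the standard `decimal` module. JavaScript's `Math.round` breaks ties toward positive infinity (so -2.5 becomes -2), whereas C's `round` breaks them away from zero (so -2.5 becomes -3).

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP


def _to_int(x: float, mode: str) -> int:
    # Decimal(x) converts the float exactly, so the mode only matters for true ties.
    return int(Decimal(x).to_integral_value(rounding=mode))


def ties_to_even(x: float) -> int:
    # Same rule as Python 3's built-in round() on floats.
    return _to_int(x, ROUND_HALF_EVEN)


def ties_away_from_zero(x: float) -> int:
    # decimal's ROUND_HALF_UP means "ties away from zero" (the rule of C's round()).
    return _to_int(x, ROUND_HALF_UP)


def ties_toward_positive_infinity(x: float) -> int:
    # JavaScript's Math.round rule: positive ties go away from zero,
    # negative ties go toward zero (decimal's ROUND_HALF_DOWN).
    return _to_int(x, ROUND_HALF_UP if x >= 0 else ROUND_HALF_DOWN)


for x in (2.5, -2.5):
    print(x, ties_to_even(x), ties_away_from_zero(x), ties_toward_positive_infinity(x))
```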
[Joshua]
and whether it would therefore be better exposed as an additional function (e.g. "round_to_even" or "round_unbiased"
Changing the default behaviour of `round` again seems highly undesirable to me; I'd be strongly opposed to such a change. Proposals for backwards-compatible changes that make the round-ties-away-from-zero function available again under a different name (or via a keyword argument to `round`) could be worth discussing. And suggestions/PRs for documentation clarification are welcome, of course.
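A backwards-compatible alternative of the kind described above could look something like the following. This is only a sketch: `round_ties` and its `ties` keyword are hypothetical names, not an existing or proposed CPython API, and the `ties="away"` branch emulates Python 2's half-away-from-zero rule using the standard `decimal` module.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext
from typing import Optional


def round_ties(x: float, ndigits: Optional[int] = None, ties: str = "even"):
    """Hypothetical round() variant with a tie-breaking keyword (not a real API).

    ties="even": defer to the built-in round() (Python 3 behaviour).
    ties="away": round exact halfway cases away from zero (Python 2 style).
    """
    if ties == "even":
        return round(x) if ndigits is None else round(x, ndigits)
    if ties != "away":
        raise ValueError(f"unsupported tie-breaking rule: {ties!r}")
    # Quantum of 10**-ndigits, e.g. Decimal("1E-2") for ndigits=2.
    quantum = Decimal(1).scaleb(-(ndigits or 0))
    with localcontext() as ctx:
        ctx.prec = 1000  # generous precision so quantize() copes with typical inputs
        # Decimal(x) is the exact binary value of x, so the tie test is exact.
        result = Decimal(x).quantize(quantum, rounding=ROUND_HALF_UP)
    return int(result) if ndigits is None else float(result)


print(round_ties(2.5), round_ties(2.5, ties="away"))
print(round_ties(0.125, 2), round_ties(0.125, 2, ties="away"))
```

Keeping `ties="even"` as the default leaves existing code untouched, which is the backwards-compatibility requirement stated above.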
Adding Tim Peters to the nosy list, so that he can tell me that I'm misrepresenting him ...
Thanks, Mark. Yes, I saw where Tim said round-half-even should be the default, but I didn't see any proposal to expose it as e.g. math.round_half_even() instead, nor a more complete look at what other languages do. That, along with the subject being 2.6 and not 3, made me think this change in Python 3 must have been discussed more fully elsewhere. Was it not?
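And I agree -- nowhere have I been proposing changing "round" again. My proposals have been:

1. Update the round() docs to make the documentation of this behavior less buried,
2. include a (brief) justification (possibly even just a link to http://wiki.c2.com/?BankersRounding or some more-authoritative document), and
3. link to where else this change in Python 3 was discussed more, if anywhere, or else confirm this change was made based on no additional analysis that we can find written down.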
It'd also be interesting to hear if this is something we wish we'd done differently now, but that shouldn't distract from 1, 2, and 3.
I spent a few minutes with git blame/checkout/show and so far have found https://bugs.python.org/issue1869 (via https://github.com/python/cpython/commit/e6a076d). Still reading -- looks like there were a number of different changes made to round() at the same time for various reasons -- so maybe changing from round_half_up to round_half_even was necessary for the other improvements, and it couldn't have been exposed as a separate function? Or maybe that was just never proposed?
FWIW, I hadn't noticed this change in py3, but it would never have occurred to me that it's controversial... I thought everyone who'd studied the issue agreed that round-half-to-even was the best default :-). Numpy has always done round-to-even, and it's the default rounding mode mandated by IEEE754. (This doesn't mean that they mandate that round() itself follow this rule, but the implicit rounding that all floating point operations do has to follow this rule.) See also: https://en.wikipedia.org/wiki/Rounding#Round_half_to_even
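The IEEE 754 rule for the implicit rounding in ordinary float arithmetic can be observed directly. A minimal sketch, assuming IEEE 754 binary64 floats with the default rounding mode (which is what CPython floats are on mainstream platforms):

```python
# Binary64 floats have a 53-bit significand, so above 2**53 adjacent floats
# are 2 apart and some exact sums land exactly halfway between two floats.
big = 2.0 ** 53

# 2**53 + 1 is a tie between 2**53 and 2**53 + 2: the even significand wins.
print(big + 1.0 == big)
# 2**53 + 3 is a tie between 2**53 + 2 and 2**53 + 4: again the even one wins.
print(big + 3.0 == big + 4.0)
```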
[Joshua]
- Update the round() docs to make the documentation of this behavior less buried,
Sounds reasonable to me; I'm definitely open to documentation improvements. Though it doesn't seem all that buried to me: the round-ties-to-even behaviour is described in the third sentence in the first place I'd look for round documentation (https://docs.python.org/3/library/functions.html#round). It would be misleading to move the information earlier, because the use of round-ties-to-even is specific to the builtin types: user-defined types can do whatever they like via the `__round__` magic method.
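As an illustration of the `__round__` hook, here is a hypothetical float subclass (the name `HalfAwayFloat` is made up) that overrides single-argument rounding to break ties away from zero, while plain floats keep the built-in ties-to-even behaviour:

```python
from decimal import Decimal, ROUND_HALF_UP


class HalfAwayFloat(float):
    """Hypothetical float subclass whose round() breaks ties away from zero."""

    def __round__(self, ndigits=None):
        if ndigits is not None:
            # Only the single-argument form is customised in this sketch.
            return round(float(self), ndigits)
        # Decimal(float) is exact, so only genuine ties are pushed away from zero.
        return int(Decimal(float(self)).to_integral_value(rounding=ROUND_HALF_UP))


print(round(2.5), round(HalfAwayFloat(2.5)))
```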
- include a (brief) justification (possibly even just a link to http://wiki.c2.com/?BankersRounding or some more-authoritative document), and
Sure, a link to a source on banker's rounding could work.
- link to where else this change in Python 3 was discussed more, if anywhere, or else confirm this change was made based on no additional analysis that we can find written down.
I'm not aware of much discussion beyond the thread that Serhiy already pointed to. There's a little bit more (but not much) on rounding on the py3k mailing list (try a Google search for "site:mail.python.org/pipermail/python-3000 rounding").
It'd also be interesting to hear if this is something we wish we'd done differently now, but that shouldn't distract from 1, 2, and 3.
I can't speak for anyone else, but it's certainly not something I think should have been done differently, with one caveat: the silent and subtle change in behaviour from Python 2 to Python 3 was a bit unpleasant, and a possible source of late-discovered (or undiscovered) bugs.
so maybe changing from round_half_up to round_half_even was necessary for the other improvements [...]
No. The change was independent of other fixes and changes. There _is_ quite a history of round changes: fixes for the single-argument round function in odd corner cases (earlier versions of Python used the simple add-half-and-chop algorithm, which gives the wrong answer for 0.49999999999999994 and for 4503599627370497.0 thanks to FPU-level rounding in the add-half step); making two-argument round correctly rounded in all cases in Python 2.7 and 3.1 via the same dtoa.c machinery used for str<->float conversions; changing the return type of single-argument round in Python 3; making round generic via the `__round__` magic method, etc. But none of these required the change in rounding mode.
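The add-half-and-chop failures are easy to reproduce. In this sketch, `add_half_and_chop` is a hypothetical stand-in for the old algorithm (non-negative inputs only), not the actual historical C code:

```python
import math


def add_half_and_chop(x: float) -> int:
    # Naive rounding for non-negative x: add 0.5, then take the floor.
    # The addition is itself rounded by the FPU, which is where it goes wrong.
    return math.floor(x + 0.5)


# Largest double below 0.5: the exact sum 1 - 2**-54 is a tie that rounds to 1.0.
x = 0.49999999999999994
print(add_half_and_chop(x), round(x))

# 2**52 + 1: the exact sum 2**52 + 1.5 is a tie that rounds to 2**52 + 2.
y = 4503599627370497.0
print(add_half_and_chop(y), round(y))
```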
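We need to recognise that there are various different contexts where the idea of "rounding" comes into play in a general-purpose language. Some examples:

1. FPU-level rounding of the results of basic arithmetic operations
2. conversion from a decimal string to a binary float
3. formatting a float as a decimal string with a given number of digits
4. rounding a float to the nearest integer (single-argument round)
5. rounding a float to a given number of decimal places, returning a float (two-argument round)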
For 1., there's decades of numerical evidence that round-ties-to-even is what you want to do, and that's why IEEE 754 makes it the default rounding mode, and why it's the rounding mode you're likely to be using for numeric work out of the box in any mainstream language. [For one demonstration of where the unbiasedness of round-ties-to-even can matter, see https://stackoverflow.com/a/45245802/270986. Apologies for linking to my own answer here, but it was easily accessible. I'm sure there are many better demonstrations out there.]
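A deliberately contrived illustration of the bias point: reuse the 0.5 to 9.5 values from the original report, every one of which is a tie, and compare sums. Ties-away is modelled here with the standard `decimal` module's `ROUND_HALF_UP` (which, despite its name, rounds ties away from zero).

```python
from decimal import Decimal, ROUND_HALF_UP

values = [i + 0.5 for i in range(10)]  # 0.5, 1.5, ..., 9.5: every value is a tie

exact_total = sum(values)
even_total = sum(round(v) for v in values)  # Python 3: ties to even
away_total = sum(
    int(Decimal(v).to_integral_value(rounding=ROUND_HALF_UP))  # ties away from zero
    for v in values
)

print(exact_total, even_total, away_total)  # 50.0 50 55
```

Ties-to-even rounds half of the ties down and half up, so the errors cancel here; ties-away pushes every positive tie upward and the total drifts.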
Case 2 is really a special case of 1: though it is not (usually) FPU-supported, you can think of conversion from decimal string to binary floating-point as another primitive floating-point operation, and it's one that's covered by IEEE 754; round-ties-to-even (or at least, some precision- or algorithm-limited _approximation_ to round-ties-to-even) is again a common default across languages and operating systems.
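Python's own string-to-float conversion shows the same tie-breaking rule. A minimal check, assuming binary64 floats: the two decimal strings below are integers that sit exactly halfway between adjacent representable floats.

```python
# 2**53 + 1 lies halfway between 2**53 and 2**53 + 2; the even significand wins.
a = float("9007199254740993")
# 2**53 + 3 lies halfway between 2**53 + 2 and 2**53 + 4; again the even one wins.
b = float("9007199254740995")

print(int(a), int(b))
```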
Case 3 is also covered by IEEE 754, and I believe that "most" languages use round-ties-to-even here, too. C's fprintf (for example) specifies that e-style, f-style, and g-style formatting should be "correctly rounded" (C99 7.19.6.1p13), where "correctly rounded" means "[...] nearest in value, subject to the current rounding mode [...]" (C99 3.9); in practice, that's usually round-ties-to-even. Java's DecimalFormat uses round-ties-to-even by default (source: https://docs.oracle.com/javase/7/docs/api/java/text/DecimalFormat.html). I haven't checked other languages, but I expect that many of them do something similar.
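Python's float formatting behaves the same way. In this short check, 0.125, 0.375, 2.5 and 3.5 are all exactly representable in binary, so each formatting call hits a genuine tie:

```python
# (value, format spec) pairs where the exact value is a tie at that precision.
ties = [(0.125, ".2f"), (0.375, ".2f"), (2.5, ".0f"), (3.5, ".0f")]

formatted = [format(x, spec) for x, spec in ties]
print(formatted)  # ['0.12', '0.38', '2', '4']
```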
Cases 4 and 5 are mostly what we're arguing about in this issue. It's much less clear to me that the numerical benefits are significant at this level (compared to FPU-level last-bit-rounding, where those benefits are really unarguable). But note that these cases are really just floatified versions of case 3. Indeed, Python 3's current two-argument round algorithm is based directly on the string conversion code used for string formatting. And the use of round-ties-to-even for case 3 is already well established (and was already established long before Python 3.)
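As a rough illustration of the agreement between two-argument round and string formatting (this checks only a handful of sample values, it is not a proof that the two always agree):

```python
samples = [0.125, 0.375, 1.005, 2.5, 2.675]

for x in samples:
    via_round = round(x, 2)
    via_format = float(format(x, ".2f"))
    print(x, via_round, via_format)

agree = all(round(x, 2) == float(format(x, ".2f")) for x in samples)
print(agree)
```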
What happens for these 5 cases in Java? It _looks_ to me as though the first three cases use round-ties-to-even, the fourth uses round-ties-to-away for positive ties (strictly, `Math.round` breaks ties toward positive infinity), and the last isn't directly supported by the language. (But it's been a long time since I dabbled in Java.)
Like I said, I'm not totally convinced about the numerical benefits of round-ties-to-even for user-level round-to-n-decimal-places operations as opposed to FPU-level rounding (though I'm open to persuasion). That's partly because round-to-two-decimal-places (for example) is actually quite a peculiar operation to be doing on a binary float in the first place, and in practice ties don't really appear or affect the behaviour that often. (It might *look* as though you have a value "2.675" in your dataframe, but on a typical machine that value is actually being stored as "2.67499999999999982236431605997495353221893310546875", so it doesn't matter one whit whether you're using round-ties-to-even or round-ties-to-away: under correct rounding, both are going to give you the surprising result of 2.67 when you round to two decimal places).
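The 2.675 example can be reproduced with the standard `decimal` module, whose `Decimal(float)` constructor exposes the exact stored value. The check below also suggests that switching to ties-away (decimal's `ROUND_HALF_UP`) would not change this particular result:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

x = 2.675
exact = Decimal(x)  # exact value of the stored binary float
print(exact)
print(exact < Decimal("2.675"))  # the stored value is just below the decimal tie

cent = Decimal("0.01")
print(round(x, 2))
print(exact.quantize(cent, rounding=ROUND_HALF_EVEN))
print(exact.quantize(cent, rounding=ROUND_HALF_UP))
```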
What I really like about Python's choice is the consistency. In Python, since Python 3, all five cases of rounding described above use round-ties-to-even. In Python 2, float formatting used round-ties-to-even (most of the time in practice, though for Python 2.6 and earlier the exact behaviour depended on the system), while "round" used round-ties-to-away for a very closely-related operation, and there are bug reports and StackOverflow questions from users surprised by the discrepancy between float formatting and two-argument round. In Python 3, we have the pleasant situation that "round" and string formatting agree.
the silent and subtle change in behaviour from Python 2 to Python 3 was a bit unpleasant, and a possible source of late-discovered (or undiscovered) bugs.
This change was advertised in the "What’s New In Python 3.0" document.
This change was advertised in the "What’s New In Python 3.0" document.
Not originally, it wasn't. That information was added later, by me. :-) [1]
In any case, I wasn't referring to documentation. I meant "silent" in the runtime sense: there was no run-time error or warning associated with the behaviour change (in contrast to the `print` function change, for example). Few people read the "what's new" documents, and someone who's written Python 2 code using round has no particular reason to suspect a behaviour change, and no visible indication that such a change has occurred, when porting their code to Python 3. Nevertheless, their numerical results may be slightly different as a result of the porting. Hence the potential for late-discovered bugs.
[1] https://github.com/python/cpython/commit/f4112e2653f310b6e6a210c659648dbe45d3a042
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['type-bug', 'invalid']
title = 'python 3 round bug'
updated_at =
user = 'https://bugs.python.org/MJH'
```
bugs.python.org fields:
```python
activity =
actor = 'njs'
assignee = 'none'
closed = True
closed_date =
closer = 'serhiy.storchaka'
components = []
creation =
creator = 'MJH'
dependencies = []
files = []
hgrepos = []
issue_num = 32956
keywords = []
message_count = 15.0
messages = ['312931', '312933', '312935', '326311', '326325', '326441', '326444', '326447', '326449', '326453', '326460', '326523', '326581', '326593', '326613']
nosy_count = 8.0
nosy_names = ['tim.peters', 'mark.dickinson', 'vstinner', 'jyasskin', 'ezio.melotti', 'jab', 'serhiy.storchaka', 'MJH']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue32956'
versions = ['Python 3.4', 'Python 3.5', 'Python 3.6']
```