python / cpython

The Python programming language
https://www.python.org

'%.2f' % 2.545 doesn't round correctly #49368

Closed 16b6a0fb-4196-4671-a457-3945d50b4ccf closed 15 years ago

16b6a0fb-4196-4671-a457-3945d50b4ccf commented 15 years ago
BPO 5118
Nosy @mdickinson

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = 'https://github.com/mdickinson'
closed_at =
created_at =
labels = ['type-bug', 'invalid']
title = "'%.2f' % 2.545 doesn't round correctly"
updated_at =
user = 'https://bugs.python.org/Ultrasick'
```

bugs.python.org fields:
```python
activity =
actor = 'mark.dickinson'
assignee = 'mark.dickinson'
closed = True
closed_date =
closer = 'mark.dickinson'
components = []
creation =
creator = 'Ultrasick'
dependencies = []
files = []
hgrepos = []
issue_num = 5118
keywords = []
message_count = 12.0
messages = ['80868', '80869', '80870', '80871', '80873', '80874', '80875', '80876', '158716', '158717', '159853', '159868']
nosy_count = 3.0
nosy_names = ['mark.dickinson', 'Ultrasick', 'Zeev.Rotshtein']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue5118'
versions = ['Python 2.7']
```

16b6a0fb-4196-4671-a457-3945d50b4ccf commented 15 years ago

print '%.2f' % 2.544 // returns 2.54
print '%.2f' % 2.545 // returns 2.54 but should return 2.55
print '%.2f' % 2.546 // returns 2.55

mdickinson commented 15 years ago

This is not a bug; it's a consequence of the finite accuracy of floating-point arithmetic. If you look at the actual value that's stored for '2.545', you'll see that it's actually slightly less than 2.545, so rounding it down is the correct thing to do.

>>> 2.545
2.5449999999999999

16b6a0fb-4196-4671-a457-3945d50b4ccf commented 15 years ago

print round(2.545, 2) // returns 2.55

mdickinson commented 15 years ago

> print round(2.545, 2) // returns 2.55

Aha! Yes, that one *is* a bug (see issue bpo-1869), though it's not one that I regard as terribly serious, and not one that can be easily solved in all cases.

Here's why I don't see it as particularly serious: you're rounding a value that's just on the boundary: 2.545+tiny_error should round up, while 2.545-tiny_error should round down. But tiny (or not-so-tiny) errors are an almost unavoidable part of working with binary floating-point arithmetic. Additionally, whether the binary approximation stored for 2.545 is less than or greater than the true value depends on many things (format of a C double, system C library function used for string-to-double conversion, etc.), so in a sense either 2.55 *or* 2.54 can be defended as a valid result, and a good numeric programmer won't write code that depends on getting one or the other.

Having said that, if you're interested in providing a patch for issue bpo-1869 I'd certainly take a look.

If you care about *exact* representations of numbers with a finite number of places after the decimal point, you may be interested in Python's 'decimal' module.
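A minimal sketch of that suggestion, assuming the value is built from a string so that no binary approximation is involved. The choice of `ROUND_HALF_UP` is an assumption about which tie-breaking rule is wanted; it is not stated in the comment above.

```python
from decimal import Decimal, ROUND_HALF_UP

# Build the value from a string so it is exactly 2.545,
# not the nearest binary floating-point number.
exact = Decimal("2.545")

# Round to two places, with ties going away from zero.
rounded = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)  # 2.55
```

Passing the float `2.545` to `Decimal` instead of the string would carry the binary approximation along, so the string form matters here.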

16b6a0fb-4196-4671-a457-3945d50b4ccf commented 15 years ago

I am sorry but I am not a C programmer. I cannot provide any patches.

As far as I understand, this issue and issue bpo-1869 have a common problem, but this issue wouldn't be solved if issue bpo-1869 were solved. "print '%.2f' % 2.545" doesn't seem to use the built-in round() function. Otherwise the result would already be 2.55, as the result of round(2.545, 2) is.

So you might want to reopen the bug. But either way I don't consider this bug as really serious either.

mdickinson commented 15 years ago

> So you might want to reopen the bug. But either way I don't consider this bug as really serious either.

I don't understand. As far as I can see '%.2f' % 2.545 is returning the correct result: there is no bug here, so no need to reopen. '%.2f' should *not* return 2.55; it should return 2.54, which is exactly what it does. round(2.545, 2) should also return 2.54, but returns 2.55 instead; bpo-1869 is already open for this.

You're correct that the float formatting doesn't use round: it does whatever the platform C library's sprintf does.

16b6a0fb-4196-4671-a457-3945d50b4ccf commented 15 years ago

Well, that's not how I learned that rounding works. I think this is the more common way:

0.4 -> 0
0.5 -> 1
0.6 -> 1

I hope you don't try to spread the misbehaviour of Python's way of rounding

print '%.2f' % 2.545 // returns 2.54

to the built-in round() function, so that round() would also return 2.54.

The result of rounding 2.545 is 2.55 no matter how Python temporarily stores "2.545" and independent of how Python does the rounding. The result is 2.55 and not 2.54. If Python doesn't deliver "2.55" as the result of its rounding algorithm then it's doing it wrong. And if Python does stuff wrong then it has a bug.

in my opinion

mdickinson commented 15 years ago

> result is 2.55 and not 2.54. If python doesn't deliver "2.55" as the result of it's rounding algorithm then it's doing it wrong. And if

Sorry, but that's just not true. I suggest that you (a) read the section on floating-point[1] in the Python tutorial, and/or (b) ask about this on comp.lang.python if you feel inclined---there are plenty of people there who would be glad to explain what's going on here.

[1] http://docs.python.org/tutorial/floatingpoint.html

811f5800-4959-4d05-8529-cc193f1fc7b6 commented 12 years ago

Well this IS a bug. There is a certain globally accepted manner in which rounding works, and Python does something else.

P.S.: A bug is when something doesn't do what it's supposed to do the way it's supposed to do it. This definition does not depend on "internal representation" or any such things.

mdickinson commented 12 years ago

> Well this IS a bug.

I assume that you're referring to behaviour like this:

Python 2.7.2 (default, Jan 13 2012, 17:11:09) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 2.545
>>> round(x, 2)
2.54

To explain again, what happens here is:

(1) After the assignment 'x = 2.545', what's stored for x is not the precise decimal value 2.545, but a binary approximation to it. That binary approximation just happens to be very slightly less than 2.545.

(2) Now when rounding, the usual rules are applied (values less than half get rounded down), to give 2.54.

Which part(s) of the above do you think should be changed? Should the 'round' function incorrectly round some numbers up even though they fall below the halfway case?
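Step (1) can be checked directly. This is a small sketch, assuming Python floats are IEEE 754 doubles (as on essentially all current platforms); the variable names are illustrative only.

```python
from decimal import Decimal

x = 2.545

# Decimal(float) converts the stored binary value exactly, with no rounding,
# so this prints every digit of what is really held in x.
stored = Decimal(x)
print(stored)

# 2.545 is the midpoint between 2.54 and 2.55. The stored value lies
# strictly below that midpoint, so it is closer to 2.54.
below_halfway = stored < Decimal("2.545")
print(below_halfway)  # True
```

Because the stored value is not a halfway case at all, no tie-breaking rule comes into play when it is rounded to two places.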

16b6a0fb-4196-4671-a457-3945d50b4ccf commented 12 years ago

Ok, let's sum that up:

There is a rounding problem. That's because Python uses floating point registers of the x86-CPU-architecture to store point values. This method is inaccurate and causes such problems. So Python inherits this bug from this value storing method.

Even though the origin of this bug is in the method which is being used, Python has inherited this bug and can't round correctly.

If we would say that Python does not support point values but only floating point values of the x86-CPU-architecture (so not even floating point values in general) then we could admit that round(2.545, 2) has a bug because it "incorrectly" shows 2.55 as the result. But that wouldn't help us any further.

One possible solution would be to use a different method to store point values. For example, 2 integers could be used to store a point value losslessly. One integer stores whatever is left of the point and the other integer stores whatever is right of the point. Meaning:

25.0: -> integer #1: 0,000,000,025 -> integer #2: 0,000,000,000

25.99997: -> integer #1: 0,000,000,025 -> integer #2: 0,999,970,000

25.00001: -> integer #1: 0,000,000,025 -> integer #2: 0,000,010,000

As you can see, this method is lossless, as long as you don't try to store more than 32 significant bits in a register which is 32 bits in size. To be more accurate: you can't even use all 32 bits because the most significant digit can only be between 0 and 4 (4,294,967,295 barrier).

Using this value storing method would mean quite some efforts for the developers. But then Python would be able to round correctly. So that's why I call it a "possible solution".
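The two-integer idea can be sketched in a few lines. Everything named here (`FixedPoint`, `parse`, `round_half_up`, `FRAC_DIGITS`) is hypothetical and only illustrates the proposal; it handles non-negative values only, and since Python integers are arbitrary-precision the 32-bit limit mentioned above does not apply to this sketch.

```python
from typing import NamedTuple

FRAC_DIGITS = 9            # digits kept right of the point, as in the examples
SCALE = 10 ** FRAC_DIGITS


class FixedPoint(NamedTuple):
    whole: int  # "integer #1": digits left of the point
    frac: int   # "integer #2": digits right of the point, scaled by SCALE


def parse(text: str) -> FixedPoint:
    """Parse a non-negative decimal string such as '25.99997' without floats."""
    whole_part, _, frac_part = text.partition(".")
    if len(frac_part) > FRAC_DIGITS:
        raise ValueError("too many fractional digits to store losslessly")
    return FixedPoint(int(whole_part), int(frac_part.ljust(FRAC_DIGITS, "0")))


def round_half_up(value: FixedPoint, places: int) -> FixedPoint:
    """Round to `places` fractional digits (0..FRAC_DIGITS), ties going up."""
    step = 10 ** (FRAC_DIGITS - places)
    total = value.whole * SCALE + value.frac
    total = (total + step // 2) // step * step
    return FixedPoint(*divmod(total, SCALE))


print(parse("25.99997"))                   # whole=25, frac=999970000
print(round_half_up(parse("2.545"), 2))    # whole=2, frac=550000000
```

The key point is that the value is read from its decimal text, never from a float, so 2.545 really is a halfway case and rounds up under this rule.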

I am not the one who is going to make the decision whether a different value-storing method is going to be implemented, regardless of what this value-storing method may look like. But I am one of those who suffered from the method which is currently implemented.

@Mark: And I am also one of those who lost a lot of interest in helping out in the further development of Python. It was because of your haughtiness. You tried to show how perfect Python is and that there would be no bugs. But your last comment was a little more productive, even though you still haven't shown much interest in finding a solution to the problem.

@Zeev: I already gave up. But you had more endurance. Thanks :-)

Gary

mdickinson commented 12 years ago

> That's because Python uses floating point registers of the x86-CPU-architecture to store point values. This method is inaccurate and causes such problems.

Yes, exactly; this is the root cause.

And as you suggest, Python *could* use a different numeric storage format that doesn't suffer from loss of information when initializing a number from a decimal string. There's an obvious candidate for that storage format, and that's the decimal.Decimal type.

There are some issues, though:

(1) decimal.Decimal operations are implemented in software (in pure Python for versions <= 3.2, and now in C in Python 3.3) and so are orders of magnitude slower than hardware-supported floats. That's one of the reasons that almost every mainstream programming language uses the binary-represented hardware floats as the main way of representing non-integral numbers. The need for those fast floats isn't going to go away in a hurry. The obvious solution here would be for Python to support both binary floats and decimal floats, and perhaps to make numeric literals default to being decimal.Decimal instances.

(2) Getting to the point where the Decimal type could be used for numeric literals will be a *long* road, full of backwards compatibility concerns, PEPs, and long and probably contentious python-dev discussions. Python's just taken the first step along that road by reimplementing the decimal module in C for Python 3.3; this improves the speed significantly (though floats are still significantly more efficient in both time and space, and likely will be for a long time), and also makes it easier to start using decimal more widely from within the core of Python.

Reaching that point of having the Decimal type more fully integrated into Python is something that I know a good few of the Python developers are interested in (including me). But it's not going to be an easy or quick change.

> @Mark: And I am also one of thouse who lost a lot of interrest in helping out in the futher development of Python. It was because your haughtiness.

I see how my earlier messages came across badly. I apologise for that, and I hope you won't let the poorly chosen words of just one Python developer out of many put you off future involvement in Python.

jowagner commented 2 years ago

Just for the record:

> print round(2.545, 2) // returns 2.55

This does not seem to be true (any more?):

$ python2
Python 2.7.18 (default, Mar 04 2021, 23:25:57) [GCC] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> round(2.545, 2)
2.54
>>> 
$ python3
Python 3.6.15 (default, Sep 23 2021, 15:41:43) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> round(2.545, 2)
2.54

Edit: https://docs.python.org/3/library/functions.html#round defines a rounding behaviour under which Python rounds down here because, of the two choices 254 / 100 and 255 / 100, the first one has an even numerator.

mdickinson commented 2 years ago

> This does not seem to be true (any more?):

Indeed, current CPython does correct rounding. 2.545 gets rounded down not because it's a halfway case (it's not), but because the actual value stored (which is 2.5449999999999999289457264239899814128875732421875) is a smidgen closer to 2.54 than to 2.55.
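The difference between a near-halfway value and a true halfway case can be seen in a short sketch, assuming CPython 3 and IEEE 754 doubles. The values 0.125 and 0.375 are chosen only because they are exactly representable in binary.

```python
x = 2.545

# round() and float formatting both work from the exact stored value,
# which is slightly below 2.545, so both give 2.54.
print(round(x, 2))   # 2.54
print("%.2f" % x)    # 2.54

# 0.125 and 0.375 are stored exactly, so they are genuine halfway cases.
# Python 3's round() resolves such ties towards the even last digit.
print(round(0.125, 2))  # 0.12
print(round(0.375, 2))  # 0.38
```

So the round-half-to-even rule in the documentation only matters for inputs like 0.125; for 2.545 the result is decided by the stored value alone.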