unitsofmeasurement / indriya

JSR 385 - Reference Implementation

Floating point precision not preserved when converting floats #329

Closed daniel-shuy closed 3 years ago

daniel-shuy commented 3 years ago

Indriya retains floating point precision when converting doubles, but not floats. eg.

System.out.println(Quantities.getQuantity(3.524, Units.GRAM).to(Units.KILOGRAM).getValue());

prints 0.003524, but

System.out.println(Quantities.getQuantity(3.524f, Units.GRAM).to(Units.KILOGRAM).getValue());

prints 0.0035239999294281006.

andi-huber commented 3 years ago

Hi @daniel-shuy - why is that an issue?

daniel-shuy commented 3 years ago

Because the precision is not preserved. It should return 0.003524, the same way it behaves with doubles.

DefaultNumberSystem uses BigDecimal internally to preserve floating point precision when converting doubles. It attempts to do the same for float, by converting to double. Unfortunately Float#doubleValue() does not preserve floating point precision.

I've created a PR (#331) to fix this.

keilw commented 3 years ago

@daniel-shuy according to https://jcp.org/en/participation/members/S you don't seem to be a JCP member. Would you be willing to join? Otherwise we cannot merge this PR.

andi-huber commented 3 years ago

@daniel-shuy - reading through your PR, I can see you are converting a float (32-bit IEEE 754) into its rounded decimal representation (that's the toString conversion) and then parsing it with a BigDecimal, which represents floating point numbers in decimal notation. With this step you are already losing precision. I believe you are chasing a red herring here. I cannot see any flaws in our current NumberSystem implementation.

If you think there is a bug, please demonstrate the issue with a test case.

andi-huber commented 3 years ago

I know that's a bit of a puzzler, but e.g. 0.1f or even 0.1d in Java does not represent the exact number 0.1 (one tenth in decimal representation)!
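
A quick way to see this (a snippet of my own, not related to the PR; it assumes java.math.BigDecimal is imported and uses new BigDecimal(double), which exposes the exact value behind the binary representation without any rounding):

System.out.println(new BigDecimal(0.1d));          // 0.1000000000000000055511151231257827021181583404541015625
System.out.println(new BigDecimal((double) 0.1f)); // 0.100000001490116119384765625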

daniel-shuy commented 3 years ago

@andi-huber well, to be more exact, we actually don't want to gain precision (e.g. we want 3.524 / 1000 to return 0.003524, not 0.0035239999294281006); let me correct my PR description.

You can test the behavior with this:

var number = Float.valueOf(3.524f);
System.out.println(BigDecimal.valueOf(number.doubleValue()));  // 3.5239999294281006
System.out.println(new BigDecimal(number.toString()));  // 3.524

But good point, I'll add some test cases.

keilw commented 3 years ago

+1, it would be nice to back a changed behavior (but also the existing one) with more JUnit tests. While the test coverage of the API project is around 80% (could still be better, but even at corporate clients that is usually decent), Indriya has only reached 60% so far, so it could be better wherever people are willing to help.

andi-huber commented 3 years ago

My humble advice: please take a short excursion into floating point number representation according to IEEE 754 before writing any supposed fix, which we are not going to merge.

keilw commented 3 years ago

@andi-huber I also added you as a reviewer for the PR; if there are functional problems, please discuss them there. I think it may be better to call it a Draft PR for now, WDYT? Changed it to draft so it's not accidentally merged. Thanks also @desruisseaux for the additional input. If either of you feel the PR was unnecessary, feel free to close it (at least @desruisseaux should be able to close it; I'm not sure about Associate members, but for the most part I guess every contributor has the same right to close issues or PRs).

desruisseaux commented 3 years ago

Converting an IEEE 754 number with toString(), then parsing as a BigDecimal, is the strategy applied by the JDK in the BigDecimal.valueOf(double) method. That method is currently implemented as below:

    public static BigDecimal valueOf(double val) {
        return new BigDecimal(Double.toString(val));
    }

So replacing BigDecimal.valueOf(number.doubleValue()) by new BigDecimal(number.toString()) would give the same results and take into account the various types of Number (long, float, double, etc.). Note in particular that the current Indriya implementation loses precision for the long type too (conversion from long to double is not always lossless), while the toString() representation avoids that.

Currently it is not even necessary to do a (number instanceof Double) check. Unless the BigDecimal.valueOf(double) implementation changes in the future, unconditionally using the toString() approach will work just as well.
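
For illustration only, such an unconditional conversion could look roughly like this (a sketch with a hypothetical helper name, not the actual DefaultNumberSystem code; assumes java.math.BigDecimal and java.math.BigInteger):

static BigDecimal toBigDecimal(Number number) {
    if (number instanceof BigDecimal) {
        return (BigDecimal) number;                 // already in decimal form
    }
    if (number instanceof BigInteger) {
        return new BigDecimal((BigInteger) number); // exact conversion
    }
    // Long, Integer, Double, Float, ... all produce their shortest decimal form;
    // NaN and infinities would still need to be rejected before this point.
    return new BigDecimal(number.toString());
}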

daniel-shuy commented 3 years ago

So replacing BigDecimal.valueOf(number.doubleValue()) by new BigDecimal(number.toString()) would give the same results

@desruisseaux unfortunately, when you pass a Float to BigDecimal.valueOf(double), the Float is unboxed and cast to a double, which has the same problem as Float#doubleValue():

public double doubleValue() {
    return (double)value;
}

Try running this code and you'll see what I mean:

var number = Float.valueOf(3.524f);
System.out.println(BigDecimal.valueOf(number));  // 3.5239999294281006
System.out.println(new BigDecimal(number.toString()));  // 3.524

desruisseaux commented 3 years ago

@daniel-shuy yes I know, that sentence was specifically about the Double case (it was not clear in my comment). I was trying to say that if the code uses the toString() representation for the Double type, there is no difference from the current BigDecimal.valueOf(double) implementation. Consequently there is no need for the if (number instanceof Double) special case.

daniel-shuy commented 3 years ago

@desruisseaux ah sorry I misunderstood, that's a good point, that will greatly simplify the code, thanks!

desruisseaux commented 3 years ago

No problem. As a side note (but not something needed for this particular case), below is code for converting a float to a double as if using the toString() base 10 representation, but more efficient:

https://github.com/apache/sis/blob/1.0/core/sis-utility/src/main/java/org/apache/sis/math/DecimalFunctions.java#L142
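
Just to show the idea, the naive (and much slower) equivalent of that approach is roughly:

// go through the base 10 string form; the linked SIS code achieves the same result without allocating strings
static double floatToDouble(float value) {
    return Double.parseDouble(Float.toString(value));
}

System.out.println(floatToDouble(3.524f)); // 3.524 instead of 3.5239999294281006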

keilw commented 3 years ago

I made both @desruisseaux and @andi-huber reviewers for the PR, please check it out and comment or suggest changes (or close it if you really think it wasn't appropriate). From a process point of view we should have proof that @daniel-shuy has requested to become an Associate JCP Member, or his name should already show up in https://jcp.org/en/participation/members/S.

daniel-shuy commented 3 years ago

Thanks @keilw. I'll try to apply to become a JCP Member as soon as possible.

daniel-shuy commented 3 years ago

A better example:

System.out.println(Quantities.getQuantity(4.1f, Units.METRE).to(MetricPrefix.CENTI(Units.METRE)).getValue())

prints 409.999990463256800 instead of 410

andi-huber commented 3 years ago

I guess the key question then is: as we eventually have to convert a float (binary representation) to a decimal representation, which method is best? As it seems, we are discussing 2 methods:

  1. 'widen' the float to double, then for the double find the nearest decimal representation with an upper bound A of available decimal places (current implementation)
  2. for the float find the nearest decimal representation with an upper bound B of available decimal places (suggested change)

If I understand correctly, A is 19 decimal digits, whereas B is only 9 digits. However, when 'widening' a float to a double, all the binary places we gain are set to zero, which makes the difference between method (1) and (2) less dramatic. But still, it's not clear to me why we would want to prefer method (2).
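
To make the difference concrete with the 4.1f example from above (a quick, untested snippet of mine):

float value = 4.1f;
// method (1): widen to double first, then take the nearest decimal representation of the double
System.out.println(new BigDecimal(Double.toString((double) value))); // 4.099999904632568 (or similar, depending on the JDK)
// method (2): take the nearest decimal representation of the float directly
System.out.println(new BigDecimal(Float.toString(value)));           // 4.1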

desruisseaux commented 3 years ago

When widening from float to double, the extra bits in the IEEE 754 representation are set to 0. But this is an arbitrary choice; those bits could be set to anything, the reality is that we ignore their values. Consequently, if Float.toString() formats about 9 digits and Double.toString() formats about 19 digits, then the last 10 digits are just noise (for a number widened from float to double). I don't think it matters that those 10 digits bring us closer to the IEEE 754 representation with the last bits arbitrarily set to zero; they still do not contain any real information.
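
One way to visualize that (a quick snippet, not from Indriya): compare the raw bit pattern of the widened float with that of the double literal:

System.out.println(Long.toBinaryString(Double.doubleToLongBits((double) 4.1f))); // mantissa ends in 29 zero bits (52 - 23)
System.out.println(Long.toBinaryString(Double.doubleToLongBits(4.1d)));          // no such run of trailing zeros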

Float.toString() and Double.toString() are designed to format the minimal number of digits necessary to uniquely identify the value. So the advantage of parsing the string representation of those numbers is that BigDecimal gets only significant digits, without noise. The inconvenience of BigDecimal.valueOf(double), where the double is a widened value, is that it creates a false sense of precision.

I agree that formatting a number as a string and parsing it back looks like an ugly hack. But actually a more mathematically elegant solution would be very complicated.

daniel-shuy commented 3 years ago

@andi-huber I may have misunderstood the original purpose for which BigDecimal is used for double arithmetic in DefaultNumberSystem. I assumed it was to preserve precision between conversions, but it seems it was only intended to prevent precision loss, and the benefit of preventing precision gain between conversions was an unintended side effect.

If that is the case, then it is indeed not a bug, but an enhancement. While the change is not strictly necessary (I can simply format the value output with DecimalFormat), I would suggest applying it because the current behavior is a little unintuitive.
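
For reference, the DecimalFormat workaround looks something like this (the pattern and number of fraction digits are my arbitrary choice; assumes the usual Quantities/Units imports plus javax.measure.MetricPrefix and java.text.DecimalFormat):

Number value = Quantities.getQuantity(4.1f, Units.METRE)
        .to(MetricPrefix.CENTI(Units.METRE))
        .getValue();
System.out.println(new DecimalFormat("0.####").format(value)); // prints 410 instead of 409.999990463256800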

Because of the current behavior, I initially assumed that precision is kept between unit conversions, e.g. that converting 1.56 m to cm would return 156 cm and I would not have to format the value output. Imagine my surprise when my colleague told me that the unit conversions were returning "imprecise" values for him! Only after much trial and error did we realize that it was occurring only for float, not double, which really confused us.

If you all do decide not to merge the PR, I hope that at least the documentation can be updated to reflect that (maybe Quantity#to(Unit)?), to help others that may stumble on this issue in the future.

andi-huber commented 3 years ago

Hi @desruisseaux - I agree that setting the additional bits to 0 when widening is an arbitrary choice, unless the float number actually is an exact representation of the corresponding number! In that case the widening is not just noise, but the correct extension. Granted, we can discuss how relevant that is in practice, but there is still, in my opinion, a case for keeping it that way.
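
For example (a quick snippet of mine): if 0.1f is meant to stand for the exact binary value it holds, the widened double preserves that value, while the toString() route collapses it back to 0.1:

float f = 0.1f;
System.out.println(new BigDecimal((double) f));        // 0.100000001490116119384765625 (the exact bits, preserved)
System.out.println(new BigDecimal(Float.toString(f))); // 0.1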

keilw commented 3 years ago

Both double and float are equally imprecise, according to folks including Brian Goetz: http://www.javapractices.com/topic/TopicAction.do?Id=213. What about performance? Is there a huge penalty from the string operations and parsing? And would it only be restricted to explicitly passing a Float or a 4.1f primitive value?

desruisseaux commented 3 years ago

@andi-huber : yes, you are right. But this is information that only the user knows.

@keilw : saying that both float and double are imprecise needs more context. They are precise in base 2; the imprecision comes from our desire to get numbers formatted in base 10, which is a purely human cultural choice.

About performance, yes there is a penalty. But BigDecimal.valueOf(double) is already paying that penalty, so the change proposed by Daniel would have no impact on this aspect.

keilw commented 3 years ago

If it's mainly for values passed as float, then it might be OK. Performance purists may not want to use Quantity anyway and would rather use UnitConverter (which as of now also just supports double or Number). The difference between double and float in memory consumption is not as relevant now as it may have been on Java ME Embedded.

andi-huber commented 3 years ago

I believe this is not a discussion about memory footprint or performance. When a calculation needs widening of types to BigDecimal, we do this regardless of the initial value (float or double) that was passed into the Quantity factory method. Purists of any kind will have to resort to other means anyway.

I'd rather say the discussion is about which of the 2 methods (from above) we want to implement. I do see the benefits of the suggested change (method (2)), e.g. less confusion for the user and a smaller decimal number representation, as intended by the user anyway, but I'm also concerned about breaking a few corner cases, that's all.