rust-lang / rust

f32 formatter produces strange results #63171

Open emilio opened 5 years ago

emilio commented 5 years ago

Test-case:

fn main() {
    let f = 2147483648.0f32;
    println!("{} | {}", f, f as f64)
}

I expected both to produce the same number, since that number is a perfectly representable f32 value.

This bit me because I was investigating an issue where values close to integer limits were treated as negative.

Seeing 2147483600 (in range for an i32) rather than the actual value 2147483648 (out of range for an i32) made it all really confusing.

Anyhow, I'll fix Firefox to do the proper bounds check, but it would've been nice to avoid the extra confusion time :)
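
For reference, a minimal sketch of the kind of range check alluded to here (variable names are illustrative; the printed digits reflect the Display output described above). The subtle part is that 2^31 is exactly representable in f32 while i32::MAX is not, so the upper bound has to be a strict comparison against 2^31:

fn main() {
    let f = 2147483648.0f32;
    // 2^31 is exactly representable in f32; i32::MAX (2^31 - 1) is not,
    // so the in-range test compares strictly against 2^31.
    let fits_in_i32 = f >= i32::MIN as f32 && f < 2_147_483_648.0f32;
    println!("{} fits in i32: {}", f, fits_in_i32); // "2147483600 fits in i32: false"
}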

nagisa commented 5 years ago

The bytes put into the binary are correct, so it is indeed an issue in formatting routines.

ExpHP commented 5 years ago

This is because the floating point formatter prefers rounder numbers as long as it roundtrips back to the same float. For large floating point numbers, a more accurate string representation can be obtained by adding a precision (even a precision of 0 forces the value to be precise up to the decimal point).

fn main() {
    println!("{}", 1e100);
    println!("{:.0}", 1e100);
}
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104

I seem to recall that, when this feature was added to Rust, the author of the PR described it as an "optimization." (as in, simply printing zeros is much faster than drilling down into the decimal representation of the float; though at the time I wasn't sure why anyone would want to optimize the printing of such large numbers!) (Nope, I can't find anything like that. I think I recalled wrong.)

emilio commented 5 years ago

I think that's very confusing behavior, fwiw.

ExpHP commented 5 years ago

It's basically applying the same logic to digits before the decimal as it does to those after. The issue is that Rust actually shows all the digits before the decimal point, so the rounded-away digits appear as zeros instead of being omitted.

I am wishing more and more that this could just get formatted to 2.1474836e9.
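
For what it's worth, the exponential formatter already produces roughly that form; a quick sketch (the expected output assumes {:e} uses the same shortest-digit logic as Display):

fn main() {
    let f = 2147483648.0f32;
    // {:e} prints the shortest digit sequence with an explicit exponent,
    // so the rounding stays visible instead of being padded out with zeros.
    println!("{:e}", f); // 2.1474836e9
}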

scottmcm commented 5 years ago

a more accurate string representation can be obtained

nitpick: a more precise string representation, not a more accurate one.

workingjubilee commented 3 years ago

@rustbot modify labels: +A-floating-point, +A-fmt

KodrAus commented 3 years ago

We discussed this in the recent Libs meeting and would be open to trying to fix this up. It is confusing behavior to treat digits before the decimal the same as those after.

workingjubilee commented 3 years ago

@KodrAus:

It is confusing behavior to treat digits before the decimal the same as those after.

Respectfully, I do not agree that whether the digits are before or after the decimal is meaningful: I think the only thing that is meaningful is accurately displaying the number of significant digits. Once the decimal formatter reaches the number of significant digits the source float can be said to meaningfully represent, all others should be zero, as printing anything else gives the false impression that those digits represent a value different from 0. Never mind optimization: this is standard mathematical, scientific, and statistical convention.

Thus, I sympathize with ExpHP. The way to represent this, and incidentally also a valid formatting convention for C and/or Rust, is the XeY style of scientific notation. This is the way it is handled in many everyday calculators, whether they use IEEE754 floats or not.

And IEEE754-2019 actually specifies that a floating point implementation may have a limit on the number of digits that can be correctly rounded when handling such binary-decimal conversions, and that limit (called "H") should be fairly high by default. Specifically, for what we call f32, it shall be at least 12. So here, the standard would have completely specified the output, and raising our H would incidentally solve this specific problem. But after that limit is reached, the standard specifies the implementation should just print 0s.

H should be unbounded, according to IEEE754, but it also specifies that what H functionally represents ("significant digits plus a bit of padding") is a parameter to the formatting operation which can be specified in a language-dependent way. So I believe a Rust programmer should always be able to somehow request that H be considered ∞ and that, by default, when outputting floating point we should use the minimum limit of H, because past that, as scottmcm noted, we only gain precision, not accuracy.

Diggsey commented 3 years ago

Once the decimal formatter reaches the number of significant digits the source float can be said to meaningfully represent, all others should be zero, as printing anything else gives the false impression that those digits represent a value different from 0

This is incorrect. Firstly, inaccuracies in floating point values are caused by conversions and/or computations on those values. A value on its own is completely precise. The value 2147483648 is exactly representable in a 32-bit float, and it is a different number from 2147483600, which is not representable in a 32-bit float.
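
One way to see this is that the printed digits merely round-trip to the same float; a small sketch (parsing the Display output back and comparing bit patterns):

fn main() {
    let exact: f32 = 2147483648.0;                    // 2^31, exactly representable
    let printed: f32 = "2147483600".parse().unwrap(); // the digits Display shows
    // Both decimal strings round to the same f32; 2147483600 itself
    // has no f32 representation, it only round-trips to 2^31.
    assert_eq!(exact.to_bits(), printed.to_bits());
}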

Never mind optimization: this is standard mathematical, scientific, and statistical convention.

It is standard when you are doing computation on uncertain values. If I add two values of 3 SF each, of course it makes no sense to show the result to more than 3 SF, because in this case I am not trying to represent a single number, I'm representing a (simplified) distribution.

The f32 type is a much lower-level primitive: the language has no idea how a particular f32 value was arrived at, or whether that process introduced any loss of precision.

The way to represent this, and incidentally also a valid formatting convention for C and/or Rust, is the XeY style of scientific notation

I think this is fine once numbers get larger than is reasonable to display the usual way, but when they are displayed in decimal, they should not be rounded in this way.

workingjubilee commented 3 years ago

This is incorrect. Firstly, inaccuracies in floating point values are caused by conversions and/or computations on those values. A value on its own is completely precise. The value 2147483648 is exactly representable in a 32-bit float, and it is a different number from 2147483600, which is not representable in a 32-bit float.

Your example is a 10 digit decimal, and thus needs to be printed fully according to the standard, as I said. I am very interested if you have a 13 digit example.

workingjubilee commented 3 years ago

The point of what I said is that, by the time a value may be printed imprecisely, the distance between adjacent representable values (the points that correspond to integer solutions of the sign, exponent, and mantissa equation) has become considerably greater than the value that may be omitted. Permitting this for values within the range of an equivalent integer seems unwise. Happily, the IEEE754 standard does not allow it: it absolutely bars printing an f32 to 12 or fewer decimal digits with anything but a completely precisely calculated result. But beyond that, printing the last decimal digit thus is actually functionally moot at those scales. Again, it only matters at the minimum of 13 decimal digits for an f32, not the example you cited.

The assertion that floating point numbers are precise until computed with is awkward, because all floating point numbers are subject to the rounding function, including when parsed from decimals in source. That IS a computation that is subject to rounding rules. Thus all the decimal sequences that would round to a given float value, including the one it happens to correspond with "exactly", are indistinguishable from one another after conversion, which is the point. If we wished we could offer facilities in Rust to discern such slightly rounded values from exact ones, and indeed that is extensively recommended by IEEE, but that would be injecting an additional data point.

I presume if someone directly specifies a floating point value from its bit format using from_bits, then they know what they are doing. That is the only instance I can think of where a floating point number is not computed. Notably, however, you cannot fit 13 decimal digits inside a u32 anyways.

Diggsey commented 3 years ago

I am very interested if you have a 13 digit example.

Sure, any power of 2 within the exponent range is exactly representable in an f32, e.g. 1_099_511_627_776.0.

the integer solution to the sign, exponent, and mantissa equation, has become considerably greater than the value that may be omitted

The problem is that setting it to zero is not the same as omitting the value. Omitting the value would indicate that the remaining digits are unspecified. Setting them to zero just produces the wrong number.

But beyond that, printing the last decimal digit thus is actually functionally moot at those scales.

I disagree: printing the wrong decimal digit is just wrong, and we don't have the context to argue that it's a moot point, since that depends on the specific program.

all floating point numbers are subject to the rounding function, including when parsed from decimals in source

Converting from decimal is a calculation that can introduce error, but not all numbers are parsed from decimals, and those that are may be completely within the range where all whole numbers can be represented.

If we wished we could offer such facilities in Rust to discern such slightly rounded values apart from precise ones and indeed that is extensively recommended by IEEE, but that is injecting an additional data point.

I'm not quite sure what you're saying here: do you mean add a function to determine in advance if a value would be rounded to a power of ten when formatted? I don't think that really addresses the issue of the value being formatted actually being wrong.

Just to give some concrete examples:
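
For instance, with the power of two mentioned earlier (a sketch; the shortest-form output shown is what the current Display logic would be expected to produce):

fn main() {
    let x: f32 = 1_099_511_627_776.0; // 2^40, exactly representable in f32
    println!("{}", x);    // shortest round-trip form: 1099511600000
    println!("{:.0}", x); // exact value: 1099511627776
}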

WiSaGaN commented 2 years ago

Is there any progress on this? I just encountered this, which seems to be an f64 version of the issue:

fn main() {
    let f = 1655640002809605600.0f64;
    println!("{}", f);
    println!("{:.0}", f);
    println!("{:.20}", f);
}

produces

1655640002809605600
1655640002809605632
1655640002809605632.00000000000000000000

This is very confusing behavior imo because I used Display to debug where I already expected a floating point precision issue, but got the wrong impression that f can be represented precisely. This could be avoided if we just made it plain that f cannot be represented precisely by displaying the actual number.

gliderkite commented 1 year ago

FWIW I also encountered this while debugging the use of a 3rd party game library that deals with f32; the behavior is quite confusing indeed (rustc 1.65.0).

let x: f32 = 0.25 - 5000000.0;
println!("{:.2}", x); // -5000000.00

Casting or binding to a new f64 variable doesn't help either:

println!("{:.2}", x as f64); // -5000000.00
wwylele commented 1 year ago

@gliderkite Yours is different from this issue: f32 cannot store -5000000.0 and -4999999.75 as distinct values, so the subtraction already rounds to -5000000.0 before anything is formatted.

fn main() {
    let x: f32 = 0.25 - 5000000.0;
    let y: f32 = -5000000.0;
    assert_eq!(x, y); // will pass
}

(this applies to all languages that use the same IEEE float 32)

Your idea of using f64 is good, but since this is not a formatting problem, you shouldn't apply it in the formatting code; use f64 from the beginning:

fn main() {
    let x: f64 = 0.25 - 5000000.0;
    let y: f64 = -5000000.0;
    assert_eq!(x, y); // this will panic
}

EricPostpischil commented 1 year ago

I ratify Diggsey’s comments. As specified by IEEE 754, each floating-point datum that is not a NaN or infinity represents one number exactly. Thus, printing more non-zero digits gains accuracy, not just precision: The exact number represented by a floating-point datum is uniquely the one represented in decimal by showing all significant digits (all digits from the first non-zero digit to the last non-zero digit).

IEEE 754 says that floating-point arithmetic approximates real arithmetic (IEEE 754-2019 3.2, first sentence). When any operation is performed, except where stated otherwise in the standard, it is as if the result were computed exactly and then rounded to a value representable in the destination format (according to rounding rules described in the standard). So floating-point arithmetic approximates real-number arithmetic, but each result obtained is a specific number. This is crucial to the floating-point model:

Converting numbers to decimal (in a string of human-readable characters) is an operation that should be computed as described above, and the ideal result is to produce the exact result representable in the destination format. Allowing an implementation limit on how many significant digits can be computed correctly was largely a nod to feasibility, not an inherently desirable feature.

In light of this model, we can consider several forms of conversion:

For a conversion of binary floating-point to an output format of “decimal numeral,” the input number is always exactly representable in the output format, so that should be the result.

For a conversion of binary floating-point to an output format of “decimal numeral of n digits,” the output should be the n-digit decimal obtained by rounding the input number using the selected rounding rule. (This includes output formats of “decimal numeral with up to n digits,” where the final number of digits is determined by removing trailing insignificant zeros, as these result in the same output numbers as “decimal numeral of n digits,” just with a different representation.)

Another conversion operation is to convert a binary floating-point number to a decimal numeral with just enough digits to uniquely identify it among the numbers representable in the input format. This is not well described as a conversion to a specific output format due to the complexity of the output set.

When we choose to display numbers in a form convenient for humans to interpret, the format choices are best left to the application. The core floating-point work ought to be done as accurately as the format(s) permit, and choices about abbreviating numbers (and introducing more rounding error) should be left to the application program to decide for its purposes.

Many of the frustrations people have with floating-point arithmetic can be attributed at least in part to displays of decimal values that differ from the actually represented value. Displaying “0.1” instead of “0.100000001490116119384765625” misleads the reader about what is happening in the program.
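
In Rust the full expansion is available on request; a brief sketch (27 fractional digits is exactly the length of the decimal expansion of the f32 nearest to 0.1):

fn main() {
    let x = 0.1f32;
    println!("{}", x);     // 0.1 (shortest form that round-trips)
    println!("{:.27}", x); // 0.100000001490116119384765625 (the exact stored value)
}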

Note that inaccuracies in floating-point arithmetic cannot be a reason for underlying software (such as the programming language or a general string conversion/display library) to limit how many digits are displayed. For example, sometimes people reason that a floating-point format is only “accurate” to 15 digits, so only 15 digits should be displayed. This is not a correct criterion because the software converting to decimal has no information about how accurate its input operand is or is not. It may be a number that is exact, because it was received exactly and no operations were performed on it or because it is the result of specialized calculations. Alternatively, the number might be the result of a long sequence of operations involving many roundings. The error induced by rounding in each operation can compound or cancel, and the final result of a sequence of floating-point operations can be exactly correct or can be wrong by an unbounded amount, in which case even the first digit might be incorrect. If a conversion routine were to limit the digits it produced to only those it knew to be correct, it could never produce any digits, since it knows nothing about how many digits are correct.

So the fact that floating-point arithmetic introduces inaccuracies provides no basis for determining how many digits ought to be displayed. This burden lies with the application programmer; numerical analysis of the potential error requires knowledge of the algorithms and data. To this end, the underlying software can provide features for rounding numbers to a requested number of digits, but it should not decide to limit the digits itself.
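
In Rust, that choice already rests with the caller through the precision specifier; a short sketch of an application choosing its own digit budget (the outputs are what the exact-digit formatter would be expected to print):

fn main() {
    let f = 2147483648.0f32;
    // The application, not the formatter, decides how many digits are meaningful:
    println!("{:.3e}", f); // 2.147e9     (four significant digits)
    println!("{:.6e}", f); // 2.147484e9  (seven significant digits)
}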