swiftlang / swift

The Swift Programming Language
https://swift.org

[SR-7124] Double-rounding in floating-point literal conversion #49672

Open stephentyrone opened 6 years ago

stephentyrone commented 6 years ago
Previous ID SR-7124
Radar None
Original Reporter @stephentyrone
Type Bug
Additional Detail from JIRA

| | |
|------------------|-----------------|
| Votes | 2 |
| Component/s | Compiler, Standard Library |
| Labels | Bug |
| Assignee | None |
| Priority | Medium |

md5: 54e8ad8dff55322c12e5604790553880

relates to:

Issue Description:

Floating-point literals are currently converted first to `_MaxBuiltinFloatType`, and then to the actual type. This rounds twice, which introduces the possibility of double-rounding errors. Here's a concrete example of this in action on x86, where we round first to `Float80`:

1> let x = 1.000000000000000111076512571139929264063539449125528335571

x: Double = 1

`1` is incorrect; the "correct" (single-rounding) double result is `1.0000000000000002` (the literal is just barely larger than the halfway point between the two values, but the initial rounding to `Float80` rounds exactly to the halfway point, so the subsequent rounding rounds down to `1`).
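
The same double rounding can be reproduced outside the literal machinery by parsing the decimal string through `Float80` first. This is just a sketch: it assumes an x86 target (where `Float80` exists) and that the `Double`/`Float80` string initializers themselves round correctly:

#if arch(x86_64) || arch(i386)
let digits = "1.000000000000000111076512571139929264063539449125528335571"
let once  = Double(digits)!            // single rounding: 1.0000000000000002
let twice = Double(Float80(digits)!)   // round to Float80, then to Double: 1.0
print(once == twice)                   // false; the second path loses the last bit
#endif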

How do we fix this? The usual approach is to make the first rounding happen in a special rounding mode, "round to odd", which rounds any value that is not exactly representable to the adjacent representable number whose least significant bit is set. This solves the problem for any destination type that is more than one bit narrower than the intermediate type, but would give the wrong result if the two are the same size (in that case we should just round to nearest).
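
As a rough sketch of that rule on a bare integer significand (illustrative names and widths, not compiler code): truncate, and if any discarded bit was nonzero, force the lowest kept bit on. The result is then odd whenever information was lost, so it can never land exactly on a halfway point of the narrower destination, and the second rounding cannot be pushed the wrong way (as long as at least two extra bits are kept).

func roundToOdd(_ significand: UInt64, keeping keptBits: Int) -> UInt64 {
  let dropped = 64 - keptBits
  let truncated = significand >> dropped
  let discardedSomething = (significand << keptBits) != 0  // any dropped bit set?
  return discardedSomething ? (truncated | 1) : truncated
}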

We could solve this for concrete built-in types (by far the more important thing to fix) by rounding first to an intermediate type with, say, a 32-bit exponent and a 128-bit significand under the round-to-odd rule (this would also let us implement `Float128` in the future without further changes). This would not solve the problem in general for arbitrary user-defined binary floating-point types, since we don't know how large they may be; in the longer term we should really be providing them with something like a hexadecimal digit sequence.
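
Purely as an illustration of the shape being suggested (none of these names exist today; the field widths follow the comment above):

struct _WideFloatLiteralValue {
  var sign: FloatingPointSign
  var exponent: Int32          // 32-bit exponent field
  var significandHigh: UInt64  // upper half of the 128-bit significand
  var significandLow: UInt64   // lower half; the last bit is "sticky" under round-to-odd
}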

stephentyrone commented 6 years ago

Note that for efficiency in generic contexts, we might want an arbitrary-width floating-point literal type to stash pre-rounded `Float` and `Double` values.

xwu commented 6 years ago

I was browsing through Python's documentation for its decimal type. It's got this neat DecimalTuple type (sign, exponent, significand) that representations can use. I wonder if it'd be feasible to ape something like that in a revised protocol.

protocol FloatLiteralProtocol {
  associatedtype RawExponent: UnsignedInteger, _ExpressibleByBuiltinIntegerLiteral
  associatedtype RawSignificand: UnsignedInteger, _ExpressibleByBuiltinIntegerLiteral
  static var radix: Int { get }
  var sign: FloatingPointSign { get set }
  var bias: RawExponent { get set }
  /* Can't use constants in generics,
     but the compiler can set the right bias
     based on RawSignificand.bitWidth. */
  var exponentBitPattern: RawExponent { get set }
  var significandBitPattern: RawSignificand { get set }
}

struct BinaryFloatLiteral<T, U>: FloatLiteralProtocol
    where T: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral,
          U: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral {
  static let radix = 2
  var sign: FloatingPointSign
  var bias: T
  var exponentBitPattern: T
  var significandBitPattern: U
}

struct DecimalFloatLiteral<T, U>: FloatLiteralProtocol
    where T: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral,
          U: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral {
  static let radix = 10
  var sign: FloatingPointSign
  var bias: T
  var exponentBitPattern: T
  var significandBitPattern: U
}

protocol ExpressibleByFloatLiteral {
  associatedtype FloatLiteral: FloatLiteralProtocol
  init(floatLiteral: FloatLiteral)
}

extension Float: ExpressibleByFloatLiteral {
  init(floatLiteral: BinaryFloatLiteral<UInt, UInt32>) {
    _sanityCheck(floatLiteral.bias == 127)
    self.init(
      sign: floatLiteral.sign,
      exponentBitPattern: floatLiteral.exponentBitPattern,
      significandBitPattern: floatLiteral.significandBitPattern)
  }
}
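
As a usage sketch of the above (assuming the memberwise initializers these structs would get), this is roughly what the compiler might synthesize for `let x: Float = 1.5`; the bit patterns are just IEEE 754 single precision for 1.5:

let literal = BinaryFloatLiteral<UInt, UInt32>(
  sign: .plus,
  bias: 127,                           // Float's exponent bias
  exponentBitPattern: 127,             // biased exponent for 2^0
  significandBitPattern: 0x400000)     // fraction 0.5 in the top of the 23-bit field
let x = Float(floatLiteral: literal)   // 1.5
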
stephentyrone commented 6 years ago

I don't see a reason to use a biased exponent for the literal protocols, but broadly speaking something like that could work.

tbkka commented 6 years ago

For floating-point printing, I want to expose a `decimalFloatRepresentation` property that returns a `DecimalFloatRepresentation` struct containing the sign, integer exponent, and base-10 sequence of digits. This struct can then be used for a variety of special-purpose formatting needs. For performance, I had planned to store up to 21 digits inline (avoiding the additional heap allocation required for a general `Array` type).

The same struct could be used as an intermediate form for decimal parsing as well, though that would require the struct to support arbitrary numbers of digits.
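
The shape being described might look roughly like this (hypothetical field names; a plain `Array` stands in for the inline digit storage to keep the sketch short):

struct DecimalFloatRepresentation {
  var sign: FloatingPointSign
  var exponent: Int    // decimal exponent (one possible convention: value = 0.d1d2...dn * 10^exponent)
  var digits: [UInt8]  // base-10 digits, most significant first, each in 0...9
}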

Of course, for efficiency the compiler would have to know about the standard types. `1.2e34 as Float` should always compile to the correct constant.

stephentyrone commented 6 years ago

@tbkka One solution I've thought about is that the literal representation could actually be a struct containing the value correctly rounded to Float, Double, Float80, and the full decimal representation for types that need to do their own parsing.
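
A rough sketch of that idea, with hypothetical names (the `#if` reflects that `Float80` only exists on x86 targets):

struct _FloatLiteralValue {
  var float: Float      // the literal correctly rounded (once) to Float
  var double: Double    // ... and to Double
#if arch(x86_64) || arch(i386)
  var float80: Float80  // ... and to Float80, where it exists
#endif
  var digits: String    // full decimal text for types that do their own parsing
}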

tbkka commented 6 years ago

Another example popped up recently while testing Float printing. The following line of code currently produces different values on different architectures because of this bug:

   let a = 7.038531e-26 as Float

On x86_64 and i386, this produces the correct result because when compiling for those platforms, `_MaxBuiltinFloatType` is Float80 and this particular value happens to round correctly when parsed as Float80 and then truncated to Float.

But on ARM, this assigns a different value to `a` because `_MaxBuiltinFloatType` is Double for ARM targets and this value does not round correctly when parsed as Double and then truncated to Float.
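
The divergence can be seen on any platform by comparing the two conversion paths directly; this assumes `Float`'s and `Double`'s string initializers are correctly rounded, which is what literal conversion ought to match:

let direct    = Float("7.038531e-26")!           // one correctly rounded conversion
let viaDouble = Float(Double("7.038531e-26")!)   // Double first, then Float (the ARM literal path)
print(direct == viaDouble)                       // false: the two paths disagree for this value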

tbkka commented 6 years ago

Pruning this problem back a bit: What we want is for the compiler to parse the string `1.234` appearing in source differently depending on whether it's followed by `as Float` or `as Double` or `as Float80` or `as MySpecialType`.

I think we already have the language machinery to do this: types that conform to `ExpressibleByFloatLiteral` already have to declare their `FloatLiteralType`, so the compiler "just" needs to be taught to convert each float token directly to the `FloatLiteralType` of the type it is being used as.

In particular, `Float.FloatLiteralType = Float` would then suffice for all Float literals to be correctly parsed.

To allow greater freedom, we could later extend this to allow `FloatLiteralType = String` for types that really just want to parse the raw token at run time as it appeared in the source.

let a = 1.2345e+123456789 as MyBigExponentFloat

extension MyBigExponentFloat: ExpressibleByFloatLiteral {
   typealias FloatLiteralType = String
   init(floatLiteral: String) {
      // floatLiteral == "1.2345e+123456789"
   }
}

This also gracefully handles decimal, hexadecimal, and possibly other float literal representations. Basically, the compiler can parse standard forms for you; other cases can be handled by the custom type however it wishes. Broken-down representations such as the proposed `DecimalFloatRepresentation` can then be simple run-time helpers for these custom initializers; the compiler does not need to treat them specially.

stephentyrone commented 6 years ago

That's not sufficient, for two reasons:

a. If you look at the plumbing backing FloatLiteralType, you'll find that the compiler first converts to the internal `_MaxBuiltinFloatType`, then to whatever the `FloatLiteralType` is.

b. You need a type for the literal in a context generic over `BinaryFloatingPoint`.
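
For (b), the kind of context in question is a generic function where the literal's eventual type isn't known until the call site, for example:

func half<T: BinaryFloatingPoint>(_ x: T) -> T {
  return x * 0.5   // 0.5 must become a T, whatever T turns out to be
}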

stephentyrone commented 6 years ago

If we want to make a quick fix, the best solution is probably something like the following:

public protocol _ExpressibleByBuiltinFloatLiteral {
  init(_builtinFloatLiteral value: _MaxBuiltinFloatType, _roundingInformation: Int)
}

where the extra `_roundingInformation` records whether `value` is exact, was rounded up, or was rounded down.
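
As a sketch, the extra parameter could just as well be an enum rather than a bare `Int` (hypothetical names); the point is that a conforming type whose conversion lands exactly on one of its halfway points can use the direction of the first rounding to break the tie correctly:

enum _LiteralRoundingDirection: Int {
  case exact = 0   // `value` is the literal exactly
  case down        // `value` is below the literal (first rounding went down)
  case up          // `value` is above the literal (first rounding went up)
}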

tbkka commented 6 years ago

> ... the compiler first converts to the internal `_MaxBuiltinFloatType`, then to whatever the `FloatLiteralType` is....

I think that behavior within the compiler is the root cause of this bug. The compiler should always convert the source token text directly to the final `FloatLiteralType`. Ultimately, that means the compiler needs to hold onto the token text until it actually knows the `FloatLiteralType`.

Put slightly more bluntly: The existence of `_MaxBuiltinFloatType` is a bug.

stephentyrone commented 6 years ago

> Put slightly more bluntly: The existence of `_MaxBuiltinFloatType` is a bug.

Yeah, it's this bug.

tbkka commented 6 years ago

> You need a type for the literal in a context generic over `BinaryFloatingPoint`.

I think this might be another bug in the current protocols. Since the `FloatLiteralType` is something that gets acted on by the compiler, the real constraint is that the type must be something into which the compiler can convert a floating-point source token. There's no a priori need for it to conform to any particular numeric type hierarchy.

stephentyrone commented 6 years ago

Totally agree (which is, again, why the general literal type should be [Hexa]decimalCharacterSequence, possibly augmented with some additional information for efficiency).