stephentyrone opened this issue 6 years ago
Note that for efficiency in generic contexts, we might want an arbitrary-width floating-point literal type to stash pre-rounded `Float` and `Double` values.
I was browsing through Python's documentation for its decimal type. It's got this neat DecimalTuple type (sign, exponent, significand) that representations can use. I wonder if it'd be feasible to ape something like that in a revised protocol.
protocol FloatLiteralProtocol {
    associatedtype RawExponent: UnsignedInteger, _ExpressibleByBuiltinIntegerLiteral
    associatedtype RawSignificand: UnsignedInteger, _ExpressibleByBuiltinIntegerLiteral
    static var radix: Int { get }
    var sign: FloatingPointSign { get set }
    var bias: RawExponent { get set }
    /* Can't use constants in generics,
       but the compiler can set the right bias
       based on RawSignificand.bitWidth. */
    var exponentBitPattern: RawExponent { get set }
    var significandBitPattern: RawSignificand { get set }
}
struct BinaryFloatLiteral<T, U>: FloatLiteralProtocol
    where T: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral,
          U: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral {
    static let radix = 2
    var sign: FloatingPointSign
    var bias: T
    var exponentBitPattern: T
    var significandBitPattern: U
}
struct DecimalFloatLiteral<T, U>: FloatLiteralProtocol
    where T: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral,
          U: UnsignedInteger & _ExpressibleByBuiltinIntegerLiteral {
    static let radix = 10
    var sign: FloatingPointSign
    var bias: T
    var exponentBitPattern: T
    var significandBitPattern: U
}
protocol ExpressibleByFloatLiteral {
    associatedtype FloatLiteral: FloatLiteralProtocol
    init(floatLiteral: FloatLiteral)
}
extension Float: ExpressibleByFloatLiteral {
    init(floatLiteral: BinaryFloatLiteral<UInt, UInt32>) {
        _sanityCheck(floatLiteral.bias == 127)
        self.init(
            sign: floatLiteral.sign,
            exponentBitPattern: floatLiteral.exponentBitPattern,
            significandBitPattern: floatLiteral.significandBitPattern)
    }
}
I don't see a reason to use a biased exponent for the literal protocols, but broadly speaking something like that could work.
For floating-point printing, I want to expose a `decimalFloatRepresentation` property that returns a `DecimalFloatRepresentation` struct containing the sign, integer exponent, and base-10 sequence of digits. This struct can then be used for a variety of special-purpose formatting needs. For performance, I had planned to store up to 21 digits inline (avoiding the additional heap allocation required for a general `Array` type).
The same struct could be used as an intermediate form for decimal parsing as well, though that would require the struct to support arbitrary numbers of digits.
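A rough sketch of the shape being described, with hypothetical field names (the real thing would store the digits inline rather than in an `Array`, per the note above):

struct DecimalFloatRepresentation {
    var sign: FloatingPointSign
    var exponent: Int       // integer (base-10) exponent
    var digits: [UInt8]     // base-10 digits, most significant first, each 0...9
}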
Of course, for efficiency the compiler would have to know about the standard types. `1.2e34 as Float` should always compile to the correct constant.
@tbkka One solution I've thought about is that the literal representation could actually be a struct containing the value correctly rounded to Float, Double, Float80, and the full decimal representation for types that need to do their own parsing.
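One way to picture that idea (all names here are hypothetical): the compiler would carry the token pre-rounded to each standard type alongside the original decimal text, so the standard types pay nothing and custom types can still do their own parsing:

struct FloatLiteralCarrier {
    var float: Float            // token correctly rounded to Float
    var double: Double          // token correctly rounded to Double
#if arch(i386) || arch(x86_64)
    var float80: Float80        // token correctly rounded to Float80, where available
#endif
    var decimalText: String     // full decimal representation, for custom parsing
}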
Another example popped up recently while testing Float printing. The following line of code currently produces different values on different architectures because of this bug:
let a = 7.038531e-26 as Float
On x86_64 and i386, this produces the correct result because when compiling for those platforms, `_MaxBuiltinFloatType` is Float80 and this particular value happens to round correctly when parsed as Float80 and then truncated to Float.
But on ARM, this assigns a different value to `a` because `_MaxBuiltinFloatType` is Double for ARM targets and this value does not round correctly when parsed as Double and then truncated to Float.
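The difference is easy to reproduce at run time, assuming Swift's `Float(String)` and `Double(String)` initializers parse with correct rounding (a stand-in for the compiler's literal handling, not the same code path):

let direct    = Float("7.038531e-26")!           // one rounding: text -> Float
let viaDouble = Float(Double("7.038531e-26")!)   // two roundings: text -> Double -> Float
// Per the description above, these differ for this particular literal;
// `7.038531e-26 as Float` should always match `direct`.
print(direct.bitPattern == viaDouble.bitPattern)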
Stepping back a bit: What we want is for the compiler to parse the string `1.234` appearing in source differently depending on whether it's followed by `as Float`, `as Double`, `as Float80`, or `as MySpecialType`.
I think we already have the language machinery to do this: Types that conform to `ExpressibleByFloatLiteral` already have to declare their `FloatLiteralType`, so the compiler "just" needs to be taught to handle float tokens as follows:
1. Identify the token as a float but don't parse it
2. Identify the target type
3. Parse the token according to the associated `FloatLiteralType`
4. Invoke the initializer with the parsed result
In particular, `Float.FloatLiteralType = Float` would then suffice for all Float literals to be correctly parsed.
To allow greater freedom, we could later extend this to allow `FloatLiteralType = String` for types that really just want to parse the raw token at run time as it appeared in the source.
let a = 1.2345e+123456789 as MyBigExponentFloat

extension MyBigExponentFloat: ExpressibleByFloatLiteral {
    typealias FloatLiteralType = String
    init(floatLiteral: String) {
        // floatLiteral == "1.2345e+123456789"
    }
}
This also gracefully handles decimal, hexadecimal, and possibly other float literal representations. Basically, the compiler can parse standard forms for you, other cases can be handled by the custom type however it wishes. Broken-down representations such as the proposed `DecimalFloatRepresentation` can then be simple run-time helpers for these custom initializers; the compiler does not need to treat them specially.
That's not sufficient, for two reasons:
a. If you look at the plumbing backing `FloatLiteralType`, you'll find that the compiler first converts to the internal `_MaxBuiltinFloatType`, then to whatever the `FloatLiteralType` is.
b. You need a type for the literal in a context generic over `BinaryFloatingPoint`.
If we want to make a quick fix, the best solution is probably something like the following:
public protocol _ExpressibleByBuiltinFloatLiteral {
    init(_builtinFloatLiteral value: _MaxBuiltinFloatType, _roundingInformation: Int)
}
where the extra `_roundingInformation` parameter records whether `value` is exact, rounded up, or rounded down.
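A minimal sketch of how a conforming type could consume that extra information, using Double as a stand-in for `_MaxBuiltinFloatType` and a hypothetical enum instead of the Int flag; it assumes the first rounding was to nearest with its direction recorded, and omits NaN/infinity handling:

enum _LiteralRounding { case exact, roundedUp, roundedDown }

extension Float {
    init(_builtinFloatLiteral value: Double,
         _roundingInformation info: _LiteralRounding) {
        var result = Float(value)        // round-to-nearest-even narrowing
        let back = Double(result)
        if back != value {
            // Double rounding can only go wrong when `value` sits exactly on the
            // midpoint between two consecutive Floats; find those two neighbors.
            let lower = back < value ? result : result.nextDown
            let upper = back < value ? result.nextUp : result
            if value == (Double(lower) + Double(upper)) / 2 {
                switch info {
                case .roundedDown: result = upper   // true literal was above the tie
                case .roundedUp:   result = lower   // true literal was below the tie
                case .exact:       break            // genuine tie; ties-to-even is right
                }
            }
        }
        self = result
    }
}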
> ... the compiler first converts to the internal `_MaxBuiltinFloatType`, then to whatever the `FloatLiteralType` is....
I think that behavior within the compiler is the root cause of this bug. The compiler should always convert the source token text directly to the final `FloatLiteralType`. Ultimately, that means the compiler needs to hold onto the token text until it actually knows the `FloatLiteralType`.
Put slightly more bluntly: The existence of `_MaxBuiltinFloatType` is a bug.
> Put slightly more bluntly: The existence of `_MaxBuiltinFloatType` is a bug.
Yeah, it's this bug.
> You need a type for the literal in a context generic over `BinaryFloatingPoint`.
I think this might be another bug in the current protocols. Since the `FloatLiteralType` is something that gets acted on by the compiler, the real constraint is that the type must be something into which the compiler can convert a floating-point source token. There's no a priori need for it to conform to any particular numeric type hierarchy.
Totally agree (which is, again, why the general literal type should be [Hexa]decimalCharacterSequence, possibly augmented with some additional information for efficiency).
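Purely as an illustration of that direction (every name here is made up), such a literal type might look like:

struct FloatLiteralCharacterSequence {
    var sign: FloatingPointSign
    var radix: Int          // 10 for decimal literals, 16 for hexadecimal ones
    var digits: [UInt8]     // significand digits in the given radix, most significant first
    var exponent: Int       // exponent as written in the token
    // "Some additional information for efficiency" could be the pre-rounded
    // Float/Double values suggested earlier in the thread.
}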
Additional Detail from JIRA
| | |
|------------------|-----------------|
| Votes | 2 |
| Component/s | Compiler, Standard Library |
| Labels | Bug |
| Assignee | None |
| Priority | Medium |

md5: 54e8ad8dff55322c12e5604790553880

relates to:
Issue Description:
Floating-point literals are currently converted first to `_MaxBuiltinFloatType`, and then to the actual type. This rounds twice, which necessarily brings up the possibility of double rounding errors. Here's a concrete example of this in action on x86, where we round first to `Float80`:
1> let x = 1.000000000000000111076512571139929264063539449125528335571
x: Double = 1
`1` is incorrect; the "correct" (single-rounding) double result is `1.0000000000000002` (the literal is just barely larger than the halfway point between the two values, but the initial rounding to `Float80` rounds exactly to the halfway point, so the subsequent rounding rounds down to `1`).
How do we fix this? The usual approach is to make the first rounding happen in a special rounding mode, "round to odd", which rounds any non-exact value to the representable number whose least significant bit is set. This solves the problem for any type that's more than one bit smaller than the intermediate type, but it gives the wrong result if the two formats are the same size (in that case we should just be rounding to nearest).
We could solve this for concrete built-in types (by far the more important thing to fix) by rounding first to an intermediate type with, say, 32b exponent and 128b significand under the round-to-odd rule (this would also let us implement `Float128` in the future without further changes). This would not solve the problem in general for arbitrary user-defined binary floating-point types, since we don't know how large they may be; we should really be providing them with something like a hexadecimal digit sequence in the longer term.
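For illustration, here is the round-to-odd rule itself, written as a Double-to-Float narrowing; in the actual fix it would be applied in the first step, when converting the literal text to the wide intermediate, so the concrete types and names here are just for the sketch:

func roundToOdd(_ x: Double) -> Float {
    var f = Float(x)                       // start from round-to-nearest-even
    if Double(f) != x && (f.significandBitPattern & 1) == 0 {
        // Inexact, and the chosen neighbor has a clear low bit: switch to the
        // other neighbor bracketing x, which is the one with its low bit set.
        f = Double(f) < x ? f.nextUp : f.nextDown
    }
    return f
}

An inexact round-to-odd result never lands exactly on a rounding boundary of a format that is more than one bit narrower, and it stays on the same side of that boundary as the exact value, so a subsequent round-to-nearest conversion to the destination reproduces the single-rounding result. That is exactly the property we want from the wide intermediate described above.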