Closed rtfeldman closed 3 years ago
Related to the last point of the transition. Is `D32` really a good default? Wouldn't it likely be much slower than `F32`?

Also, would types auto upcast? Ex: If I have an `Int` that happens to be an `I32`, if it overflowed, would it just be converted to an `I64`?
> would types auto upcast? Ex: If I have an `Int` that happens to be an `I32`, if it overflowed, would it just be converted to an `I64`?
In theory we could do this, but in practice it could create some really frustrating bugs!
Say we have a record that stores an `I32` - there are only 32 bits of space there, so we can't upgrade that slot to an `I64`. If we're doing a calculation outside that record (e.g. `if x + y > z` where `x`, `y`, and `z` are all `I32`s, and `x + y` overflows) we could silently upgrade to `I64` and retry the calculation, and that would work fine.
Now let's say I take that working calculation and reorganize where the values are stored (e.g. store the result of `x + y` in an `I32` record field) - suddenly the `I64` upgrade we were relying on can't happen anymore, and the code stops working...even though all I did was reorganize where the values were being stored, which is a code cleanup I (reasonably!) expect to be harmless.
This would be a really frustrating bug to encounter and to debug, and I think it's better if we consistently crash on overflow!
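To make that failure mode concrete, here's a small sketch - in Python rather than Roc, purely for illustration - that simulates a fixed 32-bit record slot with `struct.pack`. The names `x` and `y` mirror the example above:

```python
import struct

x, y = 2_000_000_000, 2_000_000_000  # each fits comfortably in an I32

# The "silent upgrade" version: compute in a wider type and all is well.
total = x + y                  # 4_000_000_000 -- representable in an I64
assert total > 3_000_000_000   # the comparison succeeds

# Now "reorganize" the code to store x + y in a 32-bit record field.
# '<i' means a little-endian 32-bit signed int; values that don't fit
# are rejected, just like a fixed-size record slot.
try:
    struct.pack("<i", total)
    stored = True
except struct.error:
    stored = False             # the harmless-looking cleanup now fails

print(stored)  # False
```

The arithmetic itself never changed; only where the result was stored did, and that alone is what breaks.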
> Related to the last point of the transition. Is `D32` really a good default? Wouldn't it likely be much slower than `F32`?
Great question! For context, this should come up very rarely outside the repl. Almost always, if you use a plain number literal, it will end up getting resolved to a concrete numeric type because type inference unifies it with something else.
For example, if I have `foo : Bar -> F64` and I write `x = 1.0 + foo bar` then `x` and `1.0` will both be inferred as `F64`s because that's what `foo` returns. Another example: any time I write a type alias (e.g. for a record), it'll almost certainly have a concrete numeric type, because otherwise I'd have to give the alias a type variable. Any number literal that gets used with that explicitly-typed field will unify to its concrete numeric type too.
The main way this can come up outside the repl is if I have an expression that's all number literals and none of them have type annotations. For example, `if 1 + 2 > 3 then` - in that code, all I have are plain number literals that don't interact with anything that would make their types more specific. Here, we'd default to `I32`. If the code instead were `if 1.1 + 2.2 > 3.3` we'd default to `D32`.
The `D32` default should be slower but less error-prone, and I think that trade-off makes sense for an edge case like this. (For example, `1.1 + 2.2 > 3.3` incorrectly returns `True` for `F32` but correctly returns `False` for `D32`.) The chances that it will come up outside the repl are low in general, and the chances that it will be noticeable (e.g. in a hot loop) are even lower! If it is noticeable, of course, it's easy enough to fix the performance problem as long as the loss of precision is acceptable.
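This is easy to reproduce with any binary float format. For instance, in Python - whose `float` is IEEE-754 binary64 rather than `F32`, but it misbehaves the same way on this input - versus the standard-library `decimal` module:

```python
from decimal import Decimal

# Binary floating point: 1.1, 2.2, and 3.3 have no exact base-2
# representation, so the sum overshoots and the comparison is wrong.
binary_result = 1.1 + 2.2 > 3.3
print(binary_result)   # True (incorrect)

# Decimal floating point: the literals are represented exactly,
# so the comparison gives the mathematically correct answer.
decimal_result = Decimal("1.1") + Decimal("2.2") > Decimal("3.3")
print(decimal_result)  # False (correct)
```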
The reason I think the default matters is that I suspect decimals are actually the correct choice for many, many applications. Untold hours of programmer time have been spent fixing bugs caused by binary floats being the default for currency calculations, and I think it's worth trying to improve on that history by encouraging decimals as the default and then opting into binary floats for performance-critical applications like graphics.
Aside: Super long term, it's conceivable that we could someday live in a world where performance isn't a downside here.
The reason binary floats outperform decimal floats by such a large margin is that they have first-class hardware support. Most chip makers don't bother making decimal floats fast because nobody uses them in software, and a major reason nobody uses them in software is that they aren't fast. The only way this cycle will ever be broken is if people start writing software that uses decimal floats, thus giving hardware makers an incentive to create hardware that will improve performance for existing real software.
Distant future hardware considerations are the exact reason I chose IEEE-754 decimal floats here. It's an IEEE standard, and there are already (niche IBM) chips that natively support it, so it seems to have the best shot at someday getting widespread hardware support.
I figure there's a nonzero chance that if Roc is successful and uses IEEE-754 decimals, it can at least contribute to a body of evidence that hardware makers should optimize for it. Additionally, if people use it and like it in Roc, other languages might have a higher chance of adding language-level support for it, which in turn would create even more incentive for hardware makers to support it.
This is extremely long-term thinking, but extremely long-term thinking seems to be the only plausible way to end the plague that lack of fast decimals has visited on so many programmers over the years!
Will have a branch with the start of this today.
Thanks for the detailed explanation.
> In theory we could do this, but in practice it could create some really frustrating bugs!
I agree, that is why I asked the targeted question. It would have to add a ton of overhead for type checking and resizing, and would likely be a huge waste.
> any time I write a type alias (e.g. for a record), it'll almost certainly have a concrete numeric type, because otherwise I'd have to give the alias a type variable.
Didn't think about that. As long as the choice is given to the user without too much effort, I guess the `D32` base is fine. Just need to make sure users realize that's the cause if their code is super slow.
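As a rough illustration of the gap - a Python sketch comparing its binary `float` against the standard-library `decimal` module, not Roc's `D32`, so the exact ratio is only suggestive:

```python
import timeit

# Time a simple addition with a binary float vs. a decimal float.
# min() over several repeats reduces noise from other processes.
float_time = min(timeit.repeat(
    "x + x", setup="x = 1.1",
    number=500_000, repeat=3))

decimal_time = min(timeit.repeat(
    "x + x", setup="from decimal import Decimal; x = Decimal('1.1')",
    number=500_000, repeat=3))

# On typical hardware the software-implemented decimal addition
# is several times slower than the hardware binary float addition.
print(decimal_time > float_time)  # True
```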
Since this is so close now, closing in favor of #1184 and #1185! 🎉
One of the features that's been on the TODO list for a while is the ability to have multiple numeric sizes in Roc. Long-term, here's the complete goal list:
## Long-term Goals

**Integer types**

- Signed: `I128`, `I64`, `I32`, `I16`, `I8`
- Unsigned: `U128`, `U64`, `U32`, `U16`, `U8`, `Len` (`Len` is what Rust calls `usize`, and there's no `isize` equivalent in Roc)

**IEEE-754 Floating-point types**

- Binary: `F64`, `F32`
- Decimal: `D64`, `D32`
It's not super common knowledge, but the IEEE-754 specification that practically all modern hardware adheres to for its binary floating-point numbers...actually specifies a format for decimal floating-point numbers too! (Decimal as in, for example, `0.1 + 0.2 == 0.3`.) Only a few niche CPUs have instructions for it today, but Intel provides a C implementation licensed under 3-Clause BSD.

## Wrapper aliases
Currently on `trunk`, `Float` is an alias for `Num FloatingPoint`. The idea is to change it so that `F64` is an alias for `Num (FloatingPoint Binary64)`, and `Float` becomes an alias for `Num (FloatingPoint *)`. Likewise, `I64` becomes an alias for `Num (Integer Signed64)` and so on.

In this way, we now have a 3-tiered numeric hierarchy instead of two:
- `I64` is a concrete type alias for `Num (Integer Signed64)`
- `Int` is a type alias with a type variable - for `Num (Integer *)` - so a function that accepts `Int *` will accept `I64`, `I32`, `I16`, etc.
- `Num` is also a type alias with a type variable - for `Num *` (as it is today) - so a function that takes `Num *` will accept any `Int *` (including `I64`, `I32`, etc), as well as any `Float *`.

`Float *` encompasses all the numeric types that begin with `F` and `D`, and `Int *` encompasses all the numeric types that begin with `I` and `U`.

## Strategy for transitioning there
Here are the steps we can take to get there:

1. ~Rename the current `Int` to `I64` and the current `Float` to `F64`. Get that passing in its own PR and merge it into `trunk`.~
2. ~Change `F64` to be an alias for `Num (FloatingPoint Binary64)` instead of `Num FloatingPoint` like it is today. Likewise, change `I64` to be an alias for `Num (Integer Signed64)`.~
3. ~Add a new builtin `Float` alias which is an alias for `Num (FloatingPoint *)` and a new builtin `Int` alias which is an alias for `Num (Integer *)`.~
4. ~Add `F32` as well as the new hardcoded numeric types for integers (`U8`, `I16`, `Len`, and so on). All of these should be doable using existing LLVM intrinsics, and shouldn't require anything special.~
5. ~Use `Nat` anywhere a `usize` would be used in Rust (For example, it should be `List.len : List * -> Nat` and `List.set : List elem, Nat, elem -> List elem`)~
6. Add `D32` and `D64`, which requires introducing the Intel C library to builtin bitcode and calling its functions instead of LLVM intrinsics.
7. Change the default integer type from `I64` to `I32`, and the default floating-point type from `F64` to `D32` - such that in `roc repl`, entering `0.1 + 0.2` prints `0.3` 🤯
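For reference, the behavior that final step is aiming for can be sketched with Python's binary `float` versus its standard-library `decimal` module - an illustration of binary vs. decimal rounding generally, not of Roc itself:

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 are repeating fractions in base 2,
# so their sum picks up rounding error.
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# Decimal floating point represents these literals exactly,
# which is the result the proposed repl default would surface.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
print(Decimal("0.1") + Decimal("0.2"))                    # 0.3
```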