Closed JanCVanB closed 2 years ago
I don't know how I feel about the ergonomics of e.g. toI128 : Int * -> Result I128 [ TooHigh ]*. toI128 should succeed with any integer type I give it except U128; for the rest I feel most users would dislike having to explicitly handle an error that can never happen.
However, creating toI128FromU8 and all such varieties is probably a bad idea because of the combinatorial explosion and the difficulty of reading those names. Maybe this is a good candidate for "magic" casts like Rust's n as i128 or Zig's @intCast (the latter isn't that special; Zig just lets you pass around types in compile-time value positions).
Implementation wise,
Interesting design question!
One of the design principles I have for the lowest-level primitive operations is that ideally there should be a way to get Roc to emit any CPU instruction except in cases where doing so would be memory-unsafe. That's one of the reasons we have Num.addWrap, for example.
This principle is important because it increases Roc's performance ceiling - the set of programs that are possible to implement in pure Roc with sufficient performance that rewriting them in C wouldn't yield a significant performance improvement. (It might require some unusual Roc function calls to get there, but that's okay.) This in turn means that it's possible to have more things in the package ecosystem that work across all platforms, and where if someone says "I could get a faster version out of C; why don't you add native support for it in the platform via a Task?" the response can be "it's possible to get a pure Roc version to run that fast; why don't we invest time in porting that instead? Then we'll have better guarantees and tooling support for it, it'll work across all platforms, and it won't need Task!"
So I think it's important to have it be possible to do these casting conversions without a Result for performance reasons. If you're really sure it's safe to go from I64 to I32, for example, there's no way to avoid a conditional if that operation always returns Result.
That said, I think it's reasonable to discuss having a Result-based API as well, similar to Rust's try_into(), but I think we should definitely have the ones without Result at a minimum!
Brainstorm of potential variants:
- Num.toI8 : ???
- Num.toI8Safe : Result I8 [ TooLow, TooHigh ]*
- Num.toI8Clamp : I8 (clamps/rounds out-of-bounds inputs to nearest min/max) (alt. names: .toI8Clip, .toI8Round, .toI8Limit)
- Num.toI8Wrap : I8 (wraps/modulos like Num.addWrap)
I think that having all of these available could be nice. I don't know which variant .toI8 should alias, if any.
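For comparison, here is a rough Rust sketch of how the wrap, checked, and clamp variants above differ for an i64 input. The function names and the CastError type are made up for illustration and are not a proposed Roc API; the sketch only leans on Rust's existing `as`, `try_from`, and `clamp`.

```rust
use std::convert::TryFrom;

// Hypothetical error type, standing in for a Roc tag like [ OutOfBounds ]*.
#[derive(Debug, PartialEq)]
enum CastError {
    OutOfBounds,
}

/// Wrapping cast: keeps only the low 8 bits of the input.
fn to_i8_wrap(n: i64) -> i8 {
    n as i8
}

/// Checked cast: returns an error instead of changing the value.
fn to_i8_checked(n: i64) -> Result<i8, CastError> {
    i8::try_from(n).map_err(|_| CastError::OutOfBounds)
}

/// Clamping cast: out-of-range inputs snap to the nearest bound.
fn to_i8_clamp(n: i64) -> i8 {
    n.clamp(i8::MIN as i64, i8::MAX as i64) as i8
}
```

With an input of 200, the wrapping version gives -56, the checked version gives an error, and the clamping version gives 127.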
(Side note, idk if Num.clamp : Num * -> Num * would be a useful builtin by itself, since it's so easily implemented in pure Roc.)
I think there is value to having both Num.toI8 and Num.toI8Safe options.
What would be a use case for toI8Wrap?
One use case (that I selfishly want for my RNG lib) is for scaling an evenly-distributed set of larger integers (I64s, for example) to a similarly-distributed set of smaller integers (I8s, for example).
getRandomI64 : {} -> I64
convertToI8 : I64 -> I8
getRandomI8 = \_ -> {} |> getRandomI64 |> convertToI8
With only Num.toI8Safe and Num.toI8Clamp, getRandomI8 would have to either:
[-127, 128]
,getRandomI64
, or-128
& 127
very often.
On the other hand, .to{Num *}Wrap only completely solves the above use case's needs if the conversion is to a pure subset. A distribution mapping like U64 -> I8 would require custom wrapping/mapping logic, and an inflation like I8 -> U64 would require even more. Therefore, I can't think of a roc solid use case for .to{Num *}Wrap. We could always add them later if demand developed.
A distribution mapping like
U64 -> I8
I might be missing something, but wouldn't casting to I8 give us equivalent randomness characteristics to wrapping or a distribution mapping?
Suppose we have 64 bits consisting of sufficiently randomly generated ones and zeroes. Casting to 8 bits would mean saying "let's take the final 8 ones and zeroes and ignore the first 56." But if those 8 bits were - like the other 56 out of 64 bits - randomly generated, what more could we ask for in terms of randomness for those 8 bits? 🤔
I hadn't even considered having a Roc builtin for manual casting of bits/bytes, but that would absolutely solve that problem! (In fact, the leftmost 8 bits would be preferable to cast. They're more random than the right bits, because math.) I assumed that the unsafe casting functions we've discussed so far were all trying to fit the actual original value into the new container, without just chopping bits, which would never give us negative numbers when going from unsigned to signed.
If the unsafe casting was just doing bit chopping/padding, then that RNG use case could simply combine that with the existing bitshifting builtins for maximum fun.
(And now I realize the subtler point you're making, which is that wrapping numbers between bitsizes is equivalent to chopping/padding bits 🤦)
Proposal A:
- Num.to{Int a} : Int * -> {Int a} for bits-based casting
- Num.to{Int a}Safe : Int * -> Result {Int a} [ TooLow, TooHigh ] for value-based casting
Examples:
- Num.toI8 : Int * -> I8
- Num.toU128Safe : Int * -> Result U128 [ TooLow ]*
I think any clamping is possible with a separately-implemented clamp builtin like Num.clamp : Num a, Num b, Num b -> Num a
I'm not sure what to do about Floats, as I don't know how they're represented in bits.
having a Roc builtin for manual casting of bits/bytes
That's actually what I think these Num.toWhatever functions should do!
Casting is the fastest way for a CPU to convert from one integer size to another (in the ideal case it's zero instructions!) so it's definitely a primitive operation I think should be available.
the leftmost 8 bits would be preferable to cast
This can be accomplished by using Num.shlWrap or Num.shrWrap to wrap them around to the start, and then casting!
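As a small illustration of the "shift, then cast" idea, here is a Rust sketch that pulls either the lowest or the highest 8 bits out of a u64. The function names are hypothetical, and using a plain right shift by 56 is just one way to bring the top byte down; it is not meant to pin down what Num.shlWrap or Num.shrWrap do.

```rust
/// Plain cast: keeps the lowest 8 bits of the input.
fn low_byte(n: u64) -> i8 {
    n as i8
}

/// Same result via modulo, showing that wrapping equals chopping bits.
fn low_byte_via_modulo(n: u64) -> i8 {
    (n % 256) as u8 as i8
}

/// Move the highest 8 bits down to the bottom, then cast.
fn high_byte(n: u64) -> i8 {
    (n >> 56) as i8
}
```

Because the cast reinterprets the 8 bits as signed, an unsigned input can still produce negative I8 values, which is what the RNG use case wants.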
Num.to{Int a}Safe : Int * -> Result {Int a} [ TooLow, TooHigh ]*
🤔 I wonder if having the TooLow and TooHigh distinction is worth the performance cost of the extra conditional it would take to make that distinction. How often would people do something different based on whether this returned TooLow versus TooHigh, as opposed to returning something like [ OutOfBounds ]*?
I also wonder if there's a better name than Safe for these. It's accurate in terms of how Num.toI8Safe is safer than Num.toI8, but it doesn't give any clues about what it's doing differently (e.g. how it's achieving that safety). Something I appreciate about Num.addWrap is that it explains what it's actually doing. Rust uses checked for this, so for example it could be Num.toI8Checked - which tells you that it's doing extra checking, so you can guess from the name that there's a performance cost.
I was wondering those myself.
[ OutOfBounds ] (closed is better here?) is a fine first step, and if it proves awkward then we can upgrade it later.
Checked is better than Safe. I hope to think of an even-better suffix, but there may be none.
Implementation-wise, how will we check the cast without checking both bounds? I imagine there's a Rust casting error we can catch?
The CPU doesn't have a primitive for checking bounds, so we'd just have a conditional to check them both. I just realized this doesn't matter though.
Without going on a huge tangent, my thought process about this:
- it's generally faster to run if a || b { ... } else { ... } than if a { ... } else if b { ... } else { ... } because, unless the CPU's branch predictor consistently correctly guesses which branch will be taken (which would be unlikely here), the extra if will be a lot more expensive than the extra ||
- so if num < min || num > max { out_of_bounds } else { ok } should run faster than if num < min { too_low } else if num > max { too_high } else { ok }
- however, these ifs should optimize to a cmov, so there's no chance of a branch misprediction penalty, and it'll likely be approximately the same performance either way.
So never mind on the performance...although I still kinda prefer defaulting to OutOfBounds if we don't know of a use case for TooHigh and TooLow, just because it's generally easier to add flexibility later if there's sufficient demand for it - compared to removing flexibility after the fact.
[ OutOfBounds ]
(closed is better here?)
As a general rule, open is better for error propagation because the accumulation gives you the option of doing all your error handling in one big when after chaining together several operations that can fail (most likely either Task or Result).
So I'd go [ OutOfBounds ]* here, since it's in the error position of the Result.
I think we may also be able to rely on LLVM to do the heavy lifting to optimize conditionals away here. Rustc uses the if a||b {} else {} pattern and this is just a register copy and single conditional on the bit level (https://godbolt.org/z/jYq9vo7c5; to convert m:i64 to i32 with bounds checking, move the lower 32 bits of m to tmp1, check if tmp1 == m extended to 64 bits, and condition on that result).
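To make the two shapes concrete, here is a Rust sketch of the bounds check written both ways: as an explicit two-sided comparison, and as the truncate-then-compare form described above. The names and the CastError type are invented for this example, and whether the two compile to identical code depends on the optimizer, so treat the equivalence as something to verify rather than a guarantee.

```rust
// Hypothetical error type, standing in for a Roc tag like [ OutOfBounds ]*.
#[derive(Debug, PartialEq)]
enum CastError {
    OutOfBounds,
}

/// Source-level form: compare against both bounds with a single `||`.
fn to_i32_checked_compare(m: i64) -> Result<i32, CastError> {
    if m < i32::MIN as i64 || m > i32::MAX as i64 {
        Err(CastError::OutOfBounds)
    } else {
        Ok(m as i32)
    }
}

/// Bit-level form: truncate to 32 bits, sign-extend back, and compare.
fn to_i32_checked_roundtrip(m: i64) -> Result<i32, CastError> {
    let truncated = m as i32; // keep the lower 32 bits
    if truncated as i64 == m {
        Ok(truncated)
    } else {
        Err(CastError::OutOfBounds)
    }
}
```

Both functions agree on every input; the second one only needs a single comparison, which is why distinguishing TooLow from TooHigh would add work on top of it.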
What should we do about Floats?
- F64 -> I8? (Would that include an implicit flooring or is there some bitwise weirdness here?)
- Num.round/ceil/floor/trunc in between?
Edit: And what about Decs and Nats? I'm just curious if we want to change the input types to be Num * instead of Int *. I think not.
Yeah I think when going from fractional types to integers, it's clearest if the function names spell out how they'll be rounded off.
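A quick Rust sketch of why spelling out the rounding matters: the same fractional input gives different integers depending on the rounding step chosen before the cast. The function names are hypothetical. Note that Rust's float-to-int `as` saturates at the bounds and maps NaN to 0, which is a Rust behavior and not something decided for Roc in this thread.

```rust
/// Round toward negative infinity, then cast.
fn floor_to_i8(x: f64) -> i8 {
    x.floor() as i8
}

/// Round toward positive infinity, then cast.
fn ceil_to_i8(x: f64) -> i8 {
    x.ceil() as i8
}

/// Round to nearest, with ties away from zero, then cast.
fn round_to_i8(x: f64) -> i8 {
    x.round() as i8
}

/// Drop the fractional part (round toward zero), then cast.
fn trunc_to_i8(x: f64) -> i8 {
    x.trunc() as i8
}
```

For -2.5, floor and round give -3 while ceil and trunc give -2, so a single unnamed F64 -> I8 conversion would have to silently pick one of these.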
Nat should definitely work with the normal integer operations though!
These are great questions btw, thanks for thinking through all the edge cases! Talking through them all is giving me a lot more confidence in the design we're moving towards.
Proposal B:
- Num.to{Int a} : Int * -> {Int a} for bits-based casting
- Num.to{Int a}Checked : Int * -> Result {Int a} [ OutOfBounds ]* for value-based casting
Examples:
- Num.toI8 : Int * -> I8
- Num.toU128Checked : Int * -> Result U128 [ OutOfBounds ]*
This will require 20 new builtin functions (5 size × 2 signage × 2 variants).
Proposal B seems good to me! Maybe ask on Zulip to see if anyone has any thoughts to pitch in?
664 proposes the following very-helpful builtins:
However, I think their return values should be Results with Errs like [ TooLow, TooHigh ]*:
@rtfeldman Do you agree? Should the Errs be simplified to something like [ OutOfBounds ]*?
Additionally, how should these be implemented?
- Num.toStr implementation with the Str.to{Num *} implementation (This seems inefficient, but it would work).
- into()?)
@ayazhafiz Thank you for your help on my previous numeric builtin work. What do you think about this implementation question?