JanCVanB commented 2 years ago

664 proposes the following very-helpful builtins:

toI8   : Int * -> I8
toI16  : Int * -> I16
toI32  : Int * -> I32
toI64  : Int * -> I64
toI128 : Int * -> I128
toU8   : Int * -> U8
toU16  : Int * -> U16
toU32  : Int * -> U32
toU64  : Int * -> U64
toU128 : Int * -> U128
toF32  : Num * -> F32
toF64  : Num * -> F64

However, I think their return values should be Results with Errs like [ TooLow, TooHigh ]*:

toI8   : Int * -> Result I8   [ TooLow, TooHigh ]*
toI16  : Int * -> Result I16  [ TooLow, TooHigh ]*
toI32  : Int * -> Result I32  [ TooLow, TooHigh ]*
toI64  : Int * -> Result I64  [ TooLow, TooHigh ]*
toI128 : Int * -> Result I128 [ TooHigh ]*
toU8   : Int * -> Result U8   [ TooLow, TooHigh ]*
toU16  : Int * -> Result U16  [ TooLow, TooHigh ]*
toU32  : Int * -> Result U32  [ TooLow, TooHigh ]*
toU64  : Int * -> Result U64  [ TooLow, TooHigh ]*
toU128 : Int * -> Result U128 [ TooLow ]*
toF32  : Num * -> Result F32  [ TooLow, TooHigh ]*
toF64  : Num * -> Result F64  [ TooLow, TooHigh ]*

@rtfeldman Do you agree? Should the Errs be simplified to something like[ OutOfBounds ]*?

Additionally, how should these be implemented?

Compose the Num.toStr implementation with the Str.to{Num *} implementation (This seems inefficient, but it would work)
Attempt to cast between numeric types and catch any error (With .into()?)
Another approach

@ayazhafiz Thank you for your help on my previous numeric builtin work. What do you think about this implementation question?

ayazhafiz commented 2 years ago

I don't know how I feel about the ergonomics of e.g. toI128 : Int * -> Result I128 [ TooHigh ]*. toI28 should succeed with any integer type I give it except U128; for the rest I feel most users would dislike having to explicitly handle an error that can never happen.

However, creating toI128FromU8 and all such varieties is probably a bad idea because of the combinatorial explosion and the difficulty of reading those names. Maybe this is a good candidate for "magic" casts like Rust's n as i128 or zig's @intCast (the later isn't that special, zig just lets you pass around types in compile-time value positions)

Implementation wise,

Unfortunately I think to/from strings is far too inefficient; in any case you would still have to do a bounds check, which is the trickiest part of this
I don't know how casts with bounds checking are implemented elsewhere, but section 4.2 of https://www.cs.utah.edu/~regehr/papers/tosem15.pdf seems very promising. We could also take a look at the assembly output of e.g. clang or zig doing a cast with bounds checking, and replicate how they do it. Happy to engage in this with you if you'd like

rtfeldman commented 2 years ago

Interesting design question!

One of the design principles I have for the lowest-level primitive operations is that ideally there should be a way to get Roc to emit any CPU instruction except in cases where doing so would be memory-unsafe. That's one of the reasons we have Num.addWrap, for example.

This principle is important because it increases Roc's performance ceiling - the set of programs that are possible to implement in pure Roc with sufficient performance that rewriting them in C wouldn't yield a significant performance improvement. (It might require some unusual Roc function calls to get there, but that's okay.) This in turn means that it's possible to have more things in the package ecosystem that work across all platforms, and where if someone says "I could get a faster version out of C; why don't you add native support for it in the platform via a Task?" the response can be "it's possible to get a pure Roc version to run that fast; why don't we invest time in porting that instead? Then we'll have better guarantees and tooling support for it, it'll work across all platforms, and it won't need Task!"

So I think it's important to have it be possible to do these casting conversions without a Result for performance reasons. If you're really sure it's safe to go from I64 to I32, for example, there's no way to avoid a conditional if that operation always returns Result.

That said, I think it's reasonable to discuss having a Result-based API as well, similar to Rust's try_into(), but I think we should definitely have the ones without Result at a minimum!

JanCVanB commented 2 years ago

Brainstorm of potential variants:

Num.toI8 : ???
Num.toI8Safe : Result I8 [ TooLow, TooHigh ]*
Num.toI8Clamp : I8 (clamps/rounds out-of-bounds inputs to nearest min/max) (alt. names: .toI8Clip, .toI8Round, .toI8Limit)
Num.toI8Wrap : I8 (wraps/modulos like Num.addWrap)

I think that having all of these available could be nice. I don't know which variant .toI8 should alias, if any.

(Side note, idk if Num.clamp : Num * -> Num * would be a useful builtin by itself, since it's so easily implemented in pure Roc.)

ayazhafiz commented 2 years ago

I think there is value to having both Num.toI8 and Num.toI8Safe options.

What would be a use case for toI8Wrap?

JanCVanB commented 2 years ago

One use case (that I selfishly want for my RNG lib) is for scaling an evenly-distributed set of larger integers (I64s, for example) to a similarly-distributed set of smaller integers (I8s, for example).

getRandomI64 : {} -> I64
convertToI8 : I64 -> I8
getRandomI8 = \_ -> {} |> getRandomI64 |> convertToI8

With only Num.toI8Safe and Num.toI8Clamp, getRandomI8 would have to either:

😕 manually implement a wrapping mapping for numbers outside of [-127, 128],
☹️ throw away most of the numbers (most of the randomness) provided by getRandomI64, or
😆 return -128 & 127 very often.

JanCVanB commented 2 years ago

On the other hand, .to{Num *}Wrap only completely solves the above use case's needs if the conversion is to a pure subset. A distribution mapping like U64 -> I8 would require custom wrapping/mapping logic, and an inflation like I8 -> U64 would require even more. Therefore, I can't think of a roc solid use case for .to{Num *}Wrap. We could always add them later if demand developed.

rtfeldman commented 2 years ago

A distribution mapping like U64 -> I8

I might be missing something, but wouldn't casting to I8 give us equivalent randomness characteristics to wrapping or a distribution mapping?

Suppose we have 64 bits consisting of sufficiently randomly generated ones and zeroes. Casting to 8 bits would mean saying "let's take the final 8 ones and zeroes and ignore the first 56." But if those 8 bits were - like the other 56 out of 64 bits - randomly generated, what more could we ask for in terms of randomness for those 8 bits? 🤔

JanCVanB commented 2 years ago

I hadn't event considered having a Roc builtin for manual casting of bits/bytes, but that would absolutely solve that problem! (In fact, the leftmost 8 bits would be preferable to cast. They're more random than the right bits, because math.) I assumed that the unsafe casting functions we've discussed so far were all trying to fit the actual original value into the new container, without just chopping bits, which would never give us negative numbers when going from unsigned to signed.

JanCVanB commented 2 years ago

If the unsafe casting was just doing bit chopping/padding, then that RNG use case could simply combine that with the existing bitshifting builtins for maximum fun.

JanCVanB commented 2 years ago

(And now I realize the subtler point you're making, which is that wrapping numbers between bitsizes is equivalent to chopping/padding bits 🤦)

JanCVanB commented 2 years ago

Proposal A:

Num.to{Int a} : Int * -> {Int a} for bits-based casting
Num.to{Int a}Safe : Int * -> Result {Int a} [ TooLow, TooHigh ] for value-based casting

Examples:

Num.toI8 : Int * -> I8
Num.toU128Safe : Int * -> Result U128 [ TooLow ]*

I think any clamping is possible with a separately-implemented clamp builtin like Num.clamp : Num a, Num b, Num b -> Num a

I'm not sure what to do about Floats, as I don't know how they're represented in bits.

rtfeldman commented 2 years ago

having a Roc builtin for manual casting of bits/bytes

That's actually what I think these Num.toWhatever functions should do! 😃

Casting is the fastest way for a CPU to convert from one integer size to another (in the ideal case it's zero instructions!) so it's definitely a primitive operation I think should be available.

the leftmost 8 bits would be preferable to cast

This can be accomplished by using Num.shlWrap or Num.shrWrap to wrap them around to the start, and then casting!

rtfeldman commented 2 years ago

Num.to{Int a}Safe : Int * -> Result {Int a} [ TooLow, TooHigh ]*

🤔 I wonder if having the TooLow and TooHigh distinction is worth the performance cost of the extra conditional it would take to make that distinction. How often would people do something different based on whether this returned TooLow versus TooHigh, as opposed to returning something like [ OutOfBounds ]*?

I also wonder if there's a better name than Safe for these. It's accurate in terms of how Num.toI8Safe is safer than Num.toI8, but it doesn't give any clues about what it's doing differently (e.g. how it's achieving that safety). Something I appreciate about Num.addWrap is that it explains what it's actually doing. Rust uses checked for this, so for example it could be Num.toI8Checked - which tells you that it's doing extra checking, so you can guess from the name that there's a performance cost.

JanCVanB commented 2 years ago

I was wondering those myself.

[ OutOfBounds ] (closed is better here?) is a fine first step, and if it proves awkward then we can upgrade it later.

Checked is better than Safe. I hope to think of an even-better suffix, but there may be none.

JanCVanB commented 2 years ago

Implementation-wise, how will we check the cast without checking both bounds? I imagine there's a Rust casting error we can catch?

rtfeldman commented 2 years ago

The CPU doesn't have a primitive for checking bounds, so we'd just have a conditional to check them both. I just realized this doesn't matter though.

Without going on a huge tangent, my thought process about this:

It's generally faster to do if a || b { ... } else { ... } than if a { ... } else if b { ... } else { ... } because, unless the CPU's branch predictor consistently correctly guesses which branch will be taken (which would be unlikely here), the extra if will be a lot more expensive than the extra ||
So although we still have to check both bounds, if num < min || num > max { out_of_bounds } else { ok } should run faster than if num < min { too_low } else if num > max { too_high } else { ok }
However, this happens to be a case where the ifs should optimize to a cmov, so there's no chance of a branch misprediction penalty, and it'll likely be approximately the same performance either way.

So never mind on the performance...although I still kinda prefer defaulting to OutOfBounds if we don't know of a use case for TooHigh and TooLow, just because it's generally easier to add flexibility later if there's sufficient demand for it - compared to removing flexibility after the fact.

rtfeldman commented 2 years ago

[ OutOfBounds ] (closed is better here?)

As a general rule, open is better for error propagation because the accumulation gives you the option of doing all your error handling in one big when after chaining together several operations that can fail (most likely either Task or Result).

So I'd go [ OutOfBounds ]* here, since it's in the error position of the Result.

ayazhafiz commented 2 years ago

I think we may also be able to rely on LLVM to do the heavy lifting to optimize conditionals away here. Rustc uses the if a||b {} else {} pattern and this is just a register copy and single conditional on the bit level (https://godbolt.org/z/jYq9vo7c5; to convert m:i64 to i32 with bounds checking, move the lower 32 bits of m to tmp1, check if tmp1 == m extended to 64 bits, and condition on that result).

JanCVanB commented 2 years ago

What should we do about Floats?

Allow F64 -> I8? (Would that include an implicit flooring or is there some bitwise weirdness here?)
Force users to Num.round/ceil/floor/trunc in between?

Edit: And what about Decs and Nats? I'm just curious if we want to change the input types to be Num * instead of Int *. I think not.

rtfeldman commented 2 years ago

Yeah I think when going from fractional types to integers, it's clearest if the function names spell out how they'll be rounded off.

Nat should definitely work with the normal integer operations though!

rtfeldman commented 2 years ago

These are great questions btw, thanks for thinking through all the edge cases! Talking through them all is giving me a lot more confidence in the design we're moving towards.

JanCVanB commented 2 years ago

Proposal B:

Num.to{Int a} : Int * -> {Int a} for bits-based casting
Num.to{Int a}Checked : Int * -> Result {Int a} [ OutOfBounds ]* for value-based casting

Examples:

Num.toI8 : Int * -> I8
Num.toU128Checked : Int * -> Result U128 [ OutOfBounds ]*

This will require 20 new builtin functions (5 size 2 signage 2 variants).

rtfeldman commented 2 years ago

Proposal B seems good to me! Maybe ask on Zulip to see if anyone has any thoughts to pitch in?

roc-lang / roc

`Num.to{Num *}` builtins (API/implementation design) #2411

664 proposes the following very-helpful builtins: