dolio opened this issue 1 year ago
Update on this: we did a more detailed design review and came up with a better solution, Idea 3 below. Benchmarks confirm it is just as fast as using the native `Nat` and `Int` functions.

Including the full discussion here for posterity.
We'd like to decide how to represent `Nat` and `Int` in the Unison runtime, what the spec should be for these types going forward, and, if we're changing anything about the spec, how to roll it out with minimal impact to users.
This issue came up because the current JIT implementation doesn't have a good way of representing unboxed integers and natural numbers at full 64-bit width. Instead, both pointers and unboxed ints / nats occupy the same slots at runtime. Since pointers are word-aligned, on a 64-bit architecture this leaves three low bits that are always 0 and can store additional data to distinguish pointers from unboxed numbers (see tagged pointers). Racket's integer type uses this to get fast unboxed arithmetic, with values promoted to arbitrary precision when they exceed 61 bits.
This implementation choice is not uncommon; many runtimes use pointer tagging as it's simple to implement and has good performance. (The V8 JS runtime does, and even the Haskell spec for `Int` actually only requires "at least 30 bits of precision", specifically to allow for this sort of thing. GHC ended up going in a different direction.)
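To make the tagging idea concrete, here is a small Haskell sketch of one possible 3-bit tagging scheme. The tag values and layout are made up for illustration; they are not Racket's or the Unison JIT's actual representation.

```haskell
-- Illustrative only: a made-up 3-bit tagging scheme for 64-bit words,
-- not Racket's or the Unison JIT's actual layout.
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Int (Int64)
import Data.Word (Word64)

-- Word-aligned pointers end in 000, so a nonzero low tag marks a non-pointer.
fixnumTag :: Word64
fixnumTag = 1

-- Pack a small integer into a tagged word; only ~61 bits of payload survive.
packFixnum :: Int64 -> Word64
packFixnum n = (fromIntegral n `shiftL` 3) .|. fixnumTag

isFixnum :: Word64 -> Bool
isFixnum w = (w .&. 7) == fixnumTag

-- Arithmetic shift on Int64 restores the sign for negative payloads.
unpackFixnum :: Word64 -> Int64
unpackFixnum w = fromIntegral w `shiftR` 3
```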
Notably, this tagging approach works out nicely for higher-order code, even without fancy optimizations, which makes it a nice choice for functional languages where higher-order code is common. For instance, if you pass an `f : Nat -> Nat` to `List.map`, there's no boxing and unboxing happening on the calls to `f` while looping over the list. In contrast, in Haskell or Scala, you'd be reliant on `f` getting inlined and a worker-wrapper optimization to get this into a form where the arithmetic could happen unboxed. Unboxed arithmetic is faster and doesn't do needless memory allocation.
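For reference, here is a simplified, hand-written sketch of the worker/wrapper idea in Haskell. Names like `fWorker` and `fWrapper` are illustrative, not what GHC actually generates, but they show where the boxing lives when a function is passed to `map` without being inlined or specialized.

```haskell
{-# LANGUAGE MagicHash #-}
-- Simplified, hand-written illustration of the worker/wrapper split.
import GHC.Exts (Int (I#), Int#, (+#))

-- Source-level function: operates on boxed Int.
f :: Int -> Int
f x = x + 1

-- A worker that does the arithmetic on the unboxed representation...
fWorker :: Int# -> Int#
fWorker x# = x# +# 1#

-- ...and a wrapper that unboxes the argument and re-boxes the result.
-- When `f` is passed to `map` without being inlined/specialized, calls go
-- through this boxed interface, allocating a boxed Int per element.
fWrapper :: Int -> Int
fWrapper (I# x#) = I# (fWorker x#)
```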
All else being equal, these are some good properties, in no particular order; there are tradeoffs among several of these desiderata. Among them: a single `Int` / `Nat` type, as we have now, which performs well, rather than multiple types floating around that people use inconsistently.

Again, these desiderata are "all else being equal". There may not be an approach that fully satisfies all of the above.
Idea 1: 64-bit `Nat` and `Int`, separate `Natural` and `Integer` types

The idea here: `Nat` and `Int` continue to be full 64-bit width and have the exact same semantics as today. The JIT will have to use boxed arithmetic to provide these types. Dan has done a microbenchmark and reported this will be about 4-5x slower than using the unboxed primitives (this was with a version using an `fxvector` with two fixnums in it; a version using bignums for everything plus masking / mod was about 8x slower).
Idea 1a: alongside this change, introduce `Natural` and `Integer` builtin types, backed in the JIT by the fancy tagged pointers that support fast unboxed arithmetic. On the JIT, these will be more performant and don't require worrying about overflow. In the Haskell runtime, these can be backed by the usual boxed arbitrary-precision numbers.
Rollout plan:

- People will likely want to migrate their code to `Natural` or `Integer`, since performance will be better on the JIT. But unless it were done in a coordinated way (in "dependency order", which feels very stop-the-world), it seems like it could be a huge mess.

Desiderata summary:
Idea 2: change the spec of `Nat` and `Int`

We change the spec of `Nat` and `Int` as follows: they are "at least 64 bits". If an operation produces a number which does not fit in 64 bits, the result is unspecified (implementation dependent). Correct code that works with all implementations will use an explicit `mod` or masking as needed.
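To illustrate what "explicit masking" would mean for user code under this spec, here's a hedged sketch in Haskell, with `Integer` standing in for `Nat`; `mulWrap64` is a made-up name, not a Unison builtin.

```haskell
import Data.Bits ((.&.))

-- Hypothetical helper (not a Unison builtin): a portable 64-bit wrapping
-- multiply that masks explicitly, as the "at least 64 bits" spec would require.
mulWrap64 :: Integer -> Integer -> Integer
mulWrap64 a b = (a * b) .&. (2 ^ 64 - 1)
```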
Idea 2a: We later change the spec of `Nat` and `Int` to be arbitrary precision, once the JIT is a full replacement for the Haskell runtime. We can introduce other fixed-width number types at that point too (like `Nat32`, `Nat60`, etc.) which are guaranteed to be unboxed and have implicit mod / masking, though there's less need for them; they're likely useful for some bit-twiddling applications.
Desiderata summary:

- Keeps a single type (`Nat` and `Int` as before), but is worse for everything else.

Rollout plan:
There are variants of this idea. As a straw man, if we think the implicit 64-bit behavior only shows up when people use multiplication and `Nat.shiftLeft`, we could have these and only these operations continue to return 64-bit truncated results (by going through boxed implementations). Possibly kinda weird.
Idea 3: full 64-bit `Int` and `Nat` via a separate check

This is an interesting way to get full 64-bit `Int` and `Nat` on the JIT without a huge hit to performance, and hence with much less need for everyone to switch to `Natural` and `Integer`. Here's the idea (pseudocode); I'll use multiplication as an example, but it works for other operations:
```
maxNat61 = 2^61 - 1

n1 * n2 =
  r = Natural.multiply n1 n2
  if r <= maxNat61 then r
  else Natural.mod r (2^64)
```
So in the fast path, the multiplication happens unboxed, the result is less than `maxNat61`, and we return immediately. No boxing or unboxing. If the result doesn't fit in 61 bits, we call the arbitrary-precision `mod` operation on it. That results in boxing, but won't affect "most" code.
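Here's the same idea modeled in Haskell as a hedged sketch. This is not the JIT code; `Integer` stands in for Scheme's fixnum/bignum numbers, and the threshold and names come from the pseudocode above.

```haskell
import Data.Bits ((.&.))

maxNat61 :: Integer
maxNat61 = 2 ^ 61 - 1

-- 64-bit wrapping multiply built on arbitrary-precision arithmetic,
-- mirroring the pseudocode above: fast path when the result fits in a
-- fixnum-sized range, otherwise wrap modulo 2^64.
mulNat64 :: Integer -> Integer -> Integer
mulNat64 n1 n2 =
  let r = n1 * n2
   in if r <= maxNat61
        then r                    -- fast path: no wrapping needed
        else r .&. (2 ^ 64 - 1)   -- slow path: same as mod 2^64 for r >= 0
```

For non-negative values, masking with `2^64 - 1` is equivalent to `mod 2^64`, matching the pseudocode.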
Performance of this seems good from an initial microbenchmark, about the same as unboxed arithmetic. We suspect that CPU branch prediction is working for us here: the fast path is almost always taken, so it's very cheap on a modern CPU. That said, Dan is doing some further testing on actual generated Unison code, which isn't quite as clean as what he wrote by hand and may stymie the optimizer in some way that tanks the performance of this approach.
Assuming this works out though, this seems like our winner. We get the same semantics as the current `Nat` and `Int` types, fast performance, and hit all of our desiderata. We can still later introduce `Natural` and `Integer` at our leisure, but since performance is good with `Nat` and `Int`, people aren't compelled to migrate their code. Instead, `Natural` and `Integer` will be somewhat special purpose, for situations where you really do need arbitrary precision.
How are we feeling about this ticket? @dolio @pchiusano
Update on this: we did a more detailed design review and came up with a better solution, Idea 3 below. Benchmarks confirm this is just as fast as using the native `Nat` and `Int` functions, so we're going with that. TL;DR: the JIT `Nat` and `Int` types will behave just like in the interpreter and be fast.

Original discussion recorded here for posterity.
In the Haskell interpreter, `Nat` and `Int` are implemented as 64-bit machine integers. However, Scheme (at least Chez and Racket) fixnums are tagged, and have fewer than 64 bits even in a 64-bit implementation. There are also no signed vs. unsigned fixnums, so at the moment, if we used them as the backing for `Nat` and `Int`, we would have 61-bit integers and 60-bit naturals. We're guessing that this is related to using word-aligned memory, which leaves 3 bits always 0 at the end of a 64-bit pointer.

Scheme/Racket has arithmetic operations that don't force you to use fixnums, though. For instance, addition will automatically convert to an arbitrary precision number if the result is too large to be stored in a fixnum. The downside is that operations on the arbitrary precision values are considerably slower (comparable to GHC's `Integer`). So, it is possible to cover the full 64-bit space this way, but values that actually need 64 bits to represent are slow.

After some testing, it seems like this is still the best way forward, though. Surprisingly (to me), it doesn't seem like using the general operations actually has a significant cost. A Racket loop that adds up many numbers doesn't seem to run slower using `+` than it does using `unsafe-fx+` or similar. Presumably there's some optimization that allows the extra cost to only be incurred if an overflow happens, and otherwise the performance on small values is identical.

With that in mind, we decided on the following strategy for resolving this issue:
- `Int` and `Nat` are technically arbitrary precision, but you will only get full performance on ~61-bit values. Nevertheless, one needn't worry that code assuming 64 bits are available will fail.
- The spec will only guarantee that `Nat` and `Int` can represent at least 64 bits of information. This means that the Haskell interpreter doesn't have to be rewritten to use arbitrary precision operations, which would be a considerable amount of work. When the interpreter is eventually retired, we can revisit the decision about what precision is required to be available.
- We can add smaller fixed-width types like `Nat32` and `Int16`. These fit in a fixnum, and can be used by people who don't want to accidentally flip into a much slower mode of operation. It doesn't seem like `Nat/Int64` will be possible to implement efficiently in Scheme, but hopefully the smaller sizes will be good enough for the relevant use cases (like bitwise operations and such); a rough sketch of such a type follows this list.
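As a rough illustration of the kind of implicitly wrapping fixed-width type mentioned above, here's a hypothetical Haskell model; the names and representation are illustrative, not the actual Unison builtin or its eventual API.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word64)

-- Hypothetical model of an implicitly wrapping fixed-width type; the real
-- Unison Nat32 builtin may look nothing like this.
newtype Nat32 = Nat32 Word64
  deriving (Eq, Show)

-- Every operation masks back to 32 bits, so values always fit in a fixnum.
mask32 :: Word64 -> Nat32
mask32 n = Nat32 (n .&. 0xFFFFFFFF)

addNat32 :: Nat32 -> Nat32 -> Nat32
addNat32 (Nat32 a) (Nat32 b) = mask32 (a + b)

mulNat32 :: Nat32 -> Nat32 -> Nat32
mulNat32 (Nat32 a) (Nat32 b) = mask32 (a * b)
```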
Nat
andInt
only guarantee 60/61 bits, so that fixnum-only operations can be usedNat
andInt
are exactly 60/61 bits.Nat
andInt
are arbitrary precision immediately(bitwise-and (...) #xffffffffffffffff)
forces the code into the bignum regime, because the mask requires a bignum to represent. This means you never get the good performance for 61 bit and below values.CC @pchiusano @jaredly