tc39/proposal-bigint

Arbitrary precision integers in JavaScript
https://tc39.github.io/proposal-bigint

Performance of 64-bit integers implemented on top of BigInt (benchmarks + bad news) #117

Closed · sjrd closed this issue 5 years ago

sjrd commented 6 years ago

Hello,

I have some bad news regarding the performance of BigInts when used to implement 64-bit integers. tl;dr: An implementation of Longs on top of BigInts (with asIntN) in Scala.js is 60x slower than the existing one based on 32-bit integers.

Since the Readme specifically says that

however, my understanding is that, if the appropriate casting operator (e.g., BigInt.asUintN) is used everywhere, an implementation like V8 is expected to provide the same performance for BigInt as it would for an Int64 type.

and since we now have a preliminary implementation in v8, I thought I'd benchmark this. So today I hacked up the Scala.js emitter to emit Longs (64-bit integers) as a BigInt-based implementation. The compiler changes can be seen in this commit if you're really interested, but they're mostly irrelevant here. The entire test suite of Scala.js passes with these changes.

Concretely, the implementation of the various operations is straightforward. For example, a + b translates to BigInt.asIntN(64, a + b). Shifts are inconvenient due to #40: in Java/Scala, in an operation such as a << b, even when a is a Long, b is a 32-bit Int. So in general, a << b must be translated as BigInt.asIntN(64, a << BigInt(b & 63)) (although most of the time b is a constant, so BigInt(b & 63) can be precomputed and folded into a literal).
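As a sketch, here is what the emitted JavaScript looks like for a few representative cases (variable names are invented; the actual emitter output differs in its details):

```js
// Scala: a + b, where a and b are Longs (represented as BigInts):
const c = BigInt.asIntN(64, a + b);

// Scala: a << n, where n is a 32-bit Int: mixing BigInt and Number throws,
// so n must be masked and converted first (the workaround for #40):
const d = BigInt.asIntN(64, a << BigInt(n & 63));

// When n is a constant (say 5), the conversion folds into a literal:
const e = BigInt.asIntN(64, a << 5n);
```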

I have then used the original compiler, and the modified compiler, to compile a SHA-512 benchmark I wrote a while ago (source). And then I used d8 from a couple days ago (revision https://github.com/v8/v8/commit/5113f10d4d5acda6fe23b38bf77cfd1278a7fa85) to run the benchmarks.

The results are really, really bad for the BigInt-based implementation:

| Implementation | Mean (µs/op) | SEM (µs/op) |
| --- | --- | --- |
| BigInt-based | 5,630,596.85 | 27,217.38 |
| User-space | 89,810.44 | 613.73 |

The new, supposedly "native" implementation of 64-bit integers based on BigInts is a whopping 60 times slower than the user-space implementation of Scala.js which works on ECMAScript 5.1.

Sure, Scala.js has faster JavaScript 64-bit integers than other current implementations such as those of Kotlin or TeaVM, but even those are not that slow (both are about 8x slower than Scala.js on Chrome v57.0).

For reproduction, you can find the generated .js files in this gist, along with the (very simple) command line invocation to run them. These files could also be used by VM implementers as benchmarks to drive optimizations.

I know that v8's implementation is only a baseline for now, and that any optimizations, let alone those targeted at the 64-bit use case, are yet to come. But currently the results are quite damning. The performance would need to improve 10x just to be competitive with rather naive user-space implementations, and 60x to compete with the Scala.js implementation. Until that happens, these results cast doubt on the claim in the Readme that BigInts are a viable way to implement 64-bit integers in JavaScript.

I guess I don't really have a straightforward suggestion as to what we should do to "fix" this. I still hope the results and/or the benchmarking files can help.

I am also open to suggestions on a better encoding of the operations (e.g., would snapshotting BigInt.asIntN in a local var of the giant IIFE help?). I can provide new results pretty quickly, as only a couple lines need to be changed in the compiler.
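Concretely, the snapshotting idea would amount to something like this (a sketch; the IIFE stands in for the module wrapper Scala.js already emits):

```js
(function () {
  // Look up BigInt.asIntN once at module load; asIntN does not use `this`,
  // so it can be called standalone.
  const asIntN = BigInt.asIntN;

  // Hot code would then call:
  //   asIntN(64, a + b)
  // instead of:
  //   BigInt.asIntN(64, a + b)
})();
```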

jakobkummerow commented 6 years ago

Thanks for the report; this is useful.

The results are not unexpected; they are a direct consequence of the current baseline implementation, for which having lots of BigInt operations with short-lived intermediate results is the worst case. The key point is that currently, every BigInt is allocated as an object. A secondary factor is that all BigInt operations currently require a call from generated code into C++ code. For now, BigInt.asIntN(64, ...) does not enable any optimizations; it is simply another operation that must be performed and has a cost. For example, a snippet like:

BigInt.asIntN(64, BigInt(data.u[i$1]) << 61n)

currently causes three BigInt objects to be allocated, each as part of a call out of generated code. The Number-based alternative presumably inlines everything and gets by without any object allocations, just keeping raw 32-bit values in registers. A 60x performance difference is certainly plausible.
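Spelled out as separate steps, each intermediate result is one allocation and one call out of generated code (an illustrative decomposition, not actual V8 internals):

```js
const t1 = BigInt(data.u[i$1]);   // allocation 1: Number-to-BigInt conversion
const t2 = t1 << 61n;             // allocation 2: the shift result
const t3 = BigInt.asIntN(64, t2); // allocation 3: the truncated result
```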

We can optimize for this, and it is not unlikely that we will. The current plan is to collect feedback (like this issue!) about what use cases are most important to people, so that we can make sure we spend our time on things that matter.

FWIW, I would guess that most hashing algorithms are sufficiently memory-constrained that an int64-based version backed by a fully optimized BigInt implementation will be just about competitive with the int32-based version, but not faster. That's on 64-bit machines; on 32-bit devices, the latter may well always be faster.

would snapshotting BigInt.asIntN in a local var of the giant IIFE help?

For now, caching BigInt.asIntN in a local var should have a tiny, tiny (probably not even measurable) benefit; however, I would advise against it, as it probably makes any future optimization harder. Skipping the asIntN calls entirely, on the other hand, should have a measurable benefit; but again, I would advise against that, because any future optimization will probably use that hint.

Side note: if the proposal had been Int64/Uint64 instead of BigInt, we would most likely be in the same situation now: optimizable, but not actually optimized in the initially shipped baseline implementation.

sjrd commented 6 years ago

Thanks for your reply.

FWIW, I would guess that most hashing algorithms are sufficiently memory-constrained that an int64-based version backed by a fully optimized BigInt implementation will be just about competitive with the int32-based version, but not faster. That's on 64-bit machines; on 32-bit devices, the latter may well always be faster.

I am pretty sure our SHA-512 benchmark is not memory-constrained, because we have a clone of it, SHA-512 Int, which performs exactly the same operations but with all Longs brutally search-and-replaced by Ints. It does not compute a correct SHA-512 hash, of course, but it allows us to measure the overhead of 64-bit operations versus 32-bit operations. Both versions read the same amount of data from "memory" (i.e., from the input and output buffers).
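To make that concrete, here is roughly what an expression like (a + b) ^ c becomes in the Int clone (an invented example, not the actual benchmark code); the Long version instead goes through Scala.js's user-space Long representation, a pair of 32-bit halves manipulated through method calls:

```js
// SHA-512 Int: Scala Int operations map to plain 32-bit Number operations.
// `| 0` truncates the addition back to a signed 32-bit integer.
const r32 = ((a32 + b32) | 0) ^ c32;
```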

On v8, SHA-512 Int is 3x faster than SHA-512, which, I think, demonstrates that SHA-512 is not memory-constrained (though SHA-512 Int might be).

I therefore expect that an Int64 implementation backed by a fully optimized BigInt implementation should match the performance of SHA-512 Int, and hence be faster than the existing implementation of Longs in Scala.js.

For now, skipping the asIntN calls entirely should have a measurable benefit; but again I would advise against that, because any future optimization will probably use that hint.

Skipping the asIntN calls would be incorrect, so I am definitely not going to do that ;)
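To see why: Java/Scala Long arithmetic must wrap around on overflow, which is exactly what the asIntN(64, ...) cast provides. A quick illustration:

```js
const max = 9223372036854775807n; // Long.MaxValue, i.e. 2n ** 63n - 1n

BigInt.asIntN(64, max + 1n); // -9223372036854775808n: wraps to Long.MinValue

max + 1n; // 9223372036854775808n: without the cast, the BigInt just keeps growing
```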