Open larceny-trac-import opened 11 years ago
Author: pnkfelix Will provides the following response (via email):
Interpreted systems are often faster than compiled systems
on benchmarks that measure nothing but the speed of library
procedures. Larceny's bignums are among the slowest of any
system that supports bignums. On artichoke, for example,
here is how long it takes to compute (fact 10000):
Petite: .41 seconds
Scheme48: .47
Gambit: .53 seconds
MzScheme: .98 seconds
Kawa: 1.0
Larceny: 18.81 seconds
SCM: blows up
Bigloo: returns 0
Chicken: returns +inf.0
If pi uses even larger bignums than (fact 10000), then it
doesn't surprise me a bit that Larceny is 200 times as slow
as some of those other systems.
Will
So I guess this was a known issue in Larceny...
Author: pnkfelix This is not a priority for us, at least at the moment.
But hey, maybe we could get an mathematically inclined undergrad to spend some time implementing: [http://numbers.computation.free.fr/Constants/Algorithms/fft.html FFT based multiplication]
Author: will I'm reopening this ticket, because this really is a performance bug we should fix.
The (fact 10000)
benchmark shows we're 20-40 times as slow as we should be when multiplying a bignum by a fixnum.
The pi
benchmark paints a more complex picture. With the usual inputs, the pi
benchmark performs the following multiplication and division operations, which take the indicated number of milliseconds (excluding gc time) on a 1.5 GHz SPARC in Chez Scheme 6.1 and in Larceny v0.95:
Chez Larceny ratio
16048 multiplications of a fixnum by a fixnum 10 2400
1133612 multiplications of a bignum by a fixnum 4710 92000 20
2 multiplications of bignums with one >= 2^32 0 0
8 multiplications of bignums with one >= 2^64 0 5
468 multiplications of bignums with one >= 2^128 0 301
6200 multiplications of bignums with one >= 2^256 750 176000 235
6662 divisions 730 172000 235
Multiplication of a bignum by a fixnum is slow because Larceny converts the fixnum to a bignum and then performs a bignum multiplication. Multiplication of large bignums is slow (in part) because Larceny uses classical algorithms, which take time proportional to the product of the lengths of the two bignums. Division of bignums is slow because multiplication is slow and because Larceny uses the classical algorithm.
A quick-and-dirty implementation of Karatsuba's algorithm reduced the multiplication time by about 50 seconds.
All of these things can be improved during the arithmetic conversion.
Author: will Larceny's bignums had been limited to less than 224 bits. This limit was increased to 232 bits by changeset:5759, changeset:5760, and changeset:5761.
The changeover to R6RS arithmetic semantics is nearly complete, so it is time to look into WhyBignumsAreSlow.
Changeset fb7a363c5e21860887d5544d51ac900bc45e8b61 made fixnum multiplication about six times as fast as before, yielding a factor of two improvement on the pi
benchmark and a factor of about 2.5 on the (fact 10000)
benchmark.
That's from changing the millicode. I'm about to see what happens when the faster code is generated inline.
Changeset c16e8985029b0442e69a26fe687a26eae1f204ce undid the millicode changes while adding inline multiplication of fixnums to the IAssassin code generator.
On the machine I've been using for benchmarks, here is how long it takes to compute (fact 10000)
:
System | Time |
---|---|
Vicare | .01 seconds |
Gauche | .06 seconds |
Chicken (R7RS) | .12 seconds |
Racket | .45 seconds |
Chibi | .46 seconds |
Petite Chez | .65 seconds |
Larceny (now) | .75 seconds |
Larceny v0.98 | 2.6 seconds |
Recent changes through 281e64e93d10baeb0a998d76cef2a316a9abfc8b have improved IAssassin bignum performance by enough to make the pi
benchmark run three times as fast as in v0.98.
Larceny's bignum arithmetic remains slow. On the usual benchmark machine, here's how long it takes to compute (fact 100000)
:
System | Std | Time in Seconds |
---|---|---|
Vicare | R6RS | 1.1 |
Racket | R6RS | 5.7 |
Petite Chez | R6RS | 7.6 |
Gauche | R7RS | 7.8 |
Chicken | R7RS | 8.1 |
Chibi | R7RS | 9.9 |
Larceny (now) | R7RS | 20.1 |
Larceny v0.98 | R7RS | 335.5 |
Here's how long it takes to compute (fib 100000)
with a linear number of additions:
System | Std | Time in Seconds |
---|---|---|
Vicare | R6RS | .04 |
Racket | R6RS | .2 |
Petite Chez | R6RS | .2 |
Gauche | R7RS | .25 |
Chicken | R7RS | .8 |
Larceny (now) | R7RS | 2.2 |
Chibi | R7RS | 4.3 |
Larceny v0.98 | R7RS | 5.1 |
So there's still room for improvement. The Great Primop Cleanup will help some. More radical steps should be deferred until after that cleanup.
Reported by: pnkfelix on Mon Mar 5 13:41:41 2007 An email from Aziz. (We should check if these results are consistent across platforms, or if its only a problem on IasnLarceny...)