Author: pnkfelix
Just to save Will the effort, here's the `run-with-profiling` output for that test above.
```
artichoke:~ pnkfelix$ /home/pnkfelix/bin/larceny -- srfi-27-test.sch
Larceny v0.97a3 (alpha test) (Feb 25 2009 19:48:22, precise:Linux:unified)
larceny.heap, built on Wed Feb 25 19:53:28 EST 2009
> (require 'profile)
#t
> (run-with-profiling (lambda () (test 1000000)))
% topmost named procedure
29 expt
19 mrg32k3a-random-integer
12 mrg32k3a-random-m1
10 big-subtract-digits
5 %flonum->integer
5 bignum-subtract!
5 %bignum-length
5 test
5 big-compare-magnitude
5 %flonum->bignum
2 fixnum->bignum
% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
100 test
95 mrg32k3a-random-integer
64 %flonum->integer
31 %flonum->bignum
29 expt
24 bignum-subtract
24 big-subtract-digits
12 mrg32k3a-random-m1
7 big-compare-magnitude
5 bignum-subtract!
5 %bignum-length
2 fixnum->bignum
16222
>
```
From that I would guess that our slow bignums are hurting us more in this case than our slow flonums.
Author: will
I'm glad this was logged, because that benchmark shouldn't have been spending *any* time on bignum arithmetic.
Fixing that in changeset:6116, and making minor improvements to the reference implementation of SRFI 27, reduced the timing from 9 seconds to 7 seconds.
No matter what the profiler says, the benchmark now spends about 85% of its time in `inexact->exact`. I don't know how that breaks down between the computation done in Scheme and the context switches from Scheme to C to Scheme to C to Scheme, but we know the context switches are part of the reason our bignums are so slow.
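To make that concrete, here is an assumed illustration (the values are chosen to match MRG32k3a's modulus, not taken from the profile run): the SRFI 27 reference implementation computes with flonums, and its modulus is just under 2^32, so every generator result converted back to an exact integer is too large for a 32-bit fixnum.

```scheme
;; Assumed illustration only.  MRG32k3a's modulus, 4294967087, is just
;; under 2^32, so an integral flonum result of the generator cannot fit
;; in a 32-bit Larceny fixnum (roughly 30 bits) and must become a bignum:
(inexact->exact 4294967086.0)   ; => 4294967086, handled by the bignum code
(inexact->exact 1000000.0)      ; => 1000000, stays a fixnum
```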
The flonum part of the computation is under control, because changing the benchmark to call `random-real` instead of `random-integer` shows that Larceny's implementation in Scheme is only 3 times as slow as MzScheme's implementation in C.
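For reference, the two variants differ only in which SRFI 27 entry point the inner loop calls; the loop bodies below are an assumed sketch (the original test program did not survive the import), not the benchmark actually used.

```scheme
;; Assumed shape of the two benchmark variants being compared.
;; random-integer goes through the exact-integer conversion discussed
;; above; random-real stays in flonum arithmetic throughout.
(define (test-integers n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (random-integer 1000000000)))

(define (test-reals n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (random-real)))
```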
Author: will
As a temporary workaround, changeset:6119 replaces the time-consuming call to `inexact->exact` with a call to a specialized version that doesn't have the Scheme-to-C-to-Scheme-and-back-again overhead.
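The specialized routine itself lives in the changeset; the sketch below only illustrates the general idea under assumed names, and is not the code from changeset:6119 or changeset:6120: peel an integral flonum apart with flonum arithmetic so that only small, fixnum-sized pieces ever reach the generic conversion.

```scheme
;; Assumed sketch, not the changeset's code: convert a non-negative
;; integral flonum to an exact integer by peeling off 24-bit chunks with
;; flonum arithmetic, so only values below 2^24 are handed to the
;; generic inexact->exact.
(define two^24 16777216.0)

(define (integral-flonum->exact x)
  (let loop ((x x) (scale 1) (acc 0))
    (if (= x 0.0)
        acc
        (let* ((high  (* (floor (/ x two^24)) two^24))  ; x with its low 24 bits cleared
               (digit (- x high)))                      ; x mod 2^24, still below 2^24
          (loop (/ high two^24)                         ; shift right by 24 bits
                (* scale 16777216)
                (+ acc (* scale (inexact->exact digit))))))))
```

Whether a rewrite like this actually avoids the Scheme-to-C round trips depends on how the underlying conversion primitives are implemented, so treat it as a shape, not a measurement.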
The long-term solution is to fix `inexact->exact` and all other arithmetic primitives that are going through the current exception and contagion system.
Jed's benchmark is now about 9 times as fast as when he ran it, and about 25 times as fast as in v0.963. On Will's machine, MzScheme is still about 3 times as fast as Larceny on that benchmark, but Will thinks that's just the difference between implementing the random number generator in Scheme and implementing it in C.
Author: will
Oops, the workaround was implemented by changeset:6120, not changeset:6119.
Author: will
For IAssassin, this was fixed by changeset:6129 and changeset:6130. The problem persists in the Nasm version, Petit Larceny, and Common Larceny. It may be hard to fix this in Petit Larceny and Common Larceny, but we should be able to fix the Nasm version.
As of changeset:6130, SRFI 27 actually performs slightly better without the temporary workaround, so we should remove that workaround before the next release. To remind us to do that, Will is changing the milestone for this ticket to Larceny 0.97.
Author: will
The temporary workaround was removed by changeset:6152.
Petit Larceny and Common Larceny may still have performance problems, but they can wait.
_Reported by: t-guest on Wed Mar 4 18:54:22 2009_
I've noticed that Larceny's implementation of SRFI 27 is about 20x-30x slower than PLT Scheme's. Even though PLT appears to be using an FFI'ed C primitive, and Larceny is probably dealing with boxed flonums, this seems less than ideal, and I figured it couldn't hurt to report it.
My small test program, as `load`ed in R5RS mode, is as follows:

My MzScheme test replaces the preamble with this:

and is loaded with a `require` form. On the machine I'm typing on, `(test 1000000)` takes 0.34s on x86_64 MzScheme v4.1.4, 0.48s on x86_32 MzScheme v4.1.4, and 9.2s on Intel Larceny v0.961.
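Both code blocks referenced above were lost in the Trac-to-GitHub import. The sketch below is only a guess at the kind of program being described; the `require` form, the integer bound, and the loop body are assumptions, not the original code.

```scheme
;; Assumed reconstruction -- the original program did not survive the import.
;; Larceny preamble, loaded in R5RS mode; the exact require form is a guess:
(require 'srfi-27)

;; For the MzScheme variant the preamble would instead be a module-style
;; require, something like (require srfi/27).

;; Call the SRFI 27 generator n times; the bound is also a guess.
(define (test n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (random-integer 1000000000)))

;; (test 1000000) is the call the timings above refer to.
```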