Author: pnkfelix
Just to save Will the effort, here's the `run-with-profiling` output for that test above.
```
artichoke:~ pnkfelix$ /home/pnkfelix/bin/larceny -- srfi-27-test.sch
Larceny v0.97a3 (alpha test) (Feb 25 2009 19:48:22, precise:Linux:unified)
larceny.heap, built on Wed Feb 25 19:53:28 EST 2009
> (require 'profile)
#t
> (run-with-profiling (lambda () (test 1000000)))
% topmost named procedure
29 expt
19 mrg32k3a-random-integer
12 mrg32k3a-random-m1
10 big-subtract-digits
5 %flonum->integer
5 bignum-subtract!
5 %bignum-length
5 test
5 big-compare-magnitude
5 %flonum->bignum
2 fixnum->bignum
% active procedures
100 r5rs-entry-point
100 repl
100 run-with-profiling
100 test
95 mrg32k3a-random-integer
64 %flonum->integer
31 %flonum->bignum
29 expt
24 bignum-subtract
24 big-subtract-digits
12 mrg32k3a-random-m1
7 big-compare-magnitude
5 bignum-subtract!
5 %bignum-length
2 fixnum->bignum
16222
>
```
From that I would guess that our slow bignums are hurting us more in this case than our slow flonums.
Author: will
I'm glad this was logged, because that benchmark shouldn't have been spending *any* time on bignum arithmetic.
Fixing that in changeset:6116, and making minor improvements to the reference implementation of SRFI 27, reduced the timing from 9 seconds to 7 seconds.
No matter what the profiler says, the benchmark now spends about 85% of its time in `inexact->exact`. I don't know how that breaks down between the computation done in Scheme and the context switches from Scheme to C to Scheme to C to Scheme, but we know the context switches are part of the reason our bignums are so slow.
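To make that concrete, here is an assumed illustration (the values are chosen to match MRG32k3a's modulus, not taken from the profile run): the SRFI 27 reference implementation computes with flonums, and its modulus is just under 2^32, so every generator result converted back to an exact integer is too large for a 32-bit fixnum.

```scheme
;; Assumed illustration only.  MRG32k3a's modulus, 4294967087, is just
;; under 2^32, so an integral flonum result of the generator cannot fit
;; in a 32-bit Larceny fixnum (roughly 30 bits) and must become a bignum:
(inexact->exact 4294967086.0)   ; => 4294967086, handled by the bignum code
(inexact->exact 1000000.0)      ; => 1000000, stays a fixnum
```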
The flonum part of the computation is under control, because changing the benchmark to call `random-real` instead of `random-integer` shows that Larceny's implementation in Scheme is only 3 times as slow as MzScheme's implementation in C.
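For reference, the two variants differ only in which SRFI 27 entry point the inner loop calls; the loop bodies below are an assumed sketch (the original test program did not survive the import), not the benchmark actually used.

```scheme
;; Assumed shape of the two benchmark variants being compared.
;; random-integer goes through the exact-integer conversion discussed
;; above; random-real stays in flonum arithmetic throughout.
(define (test-integers n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (random-integer 1000000000)))

(define (test-reals n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (random-real)))
```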
Author: will
As a temporary workaround, changeset:6119 replaces the time-consuming call to `inexact->exact` with a call to a specialized version that doesn't have the Scheme-to-C-to-Scheme-and-back-again overhead.
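The specialized routine itself lives in the changeset; the sketch below only illustrates the general idea under assumed names, and is not the code from changeset:6119 or changeset:6120: peel an integral flonum apart with flonum arithmetic so that only small, fixnum-sized pieces ever reach the generic conversion.

```scheme
;; Assumed sketch, not the changeset's code: convert a non-negative
;; integral flonum to an exact integer by peeling off 24-bit chunks with
;; flonum arithmetic, so only values below 2^24 are handed to the
;; generic inexact->exact.
(define two^24 16777216.0)

(define (integral-flonum->exact x)
  (let loop ((x x) (scale 1) (acc 0))
    (if (= x 0.0)
        acc
        (let* ((high  (* (floor (/ x two^24)) two^24))  ; x with its low 24 bits cleared
               (digit (- x high)))                      ; x mod 2^24, still below 2^24
          (loop (/ high two^24)                         ; shift right by 24 bits
                (* scale 16777216)
                (+ acc (* scale (inexact->exact digit))))))))
```

Whether a rewrite like this actually avoids the Scheme-to-C round trips depends on how the underlying conversion primitives are implemented, so treat it as a shape, not a measurement.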
The long-term solution is to fix `inexact->exact` and all other arithmetic primitives that are going through the current exception and contagion system.
Jed's benchmark is now about 9 times as fast as when he ran it, and about 25 times as fast as in v0.963. On Will's machine, MzScheme is still about 3 times as fast as Larceny on that benchmark, but Will thinks that's just the difference between implementing the random number generator in Scheme and implementing it in C.
Author: will
Oops, the workaround was implemented by changeset:6120, not changeset:6119.
Author: will
For IAssassin, this was fixed by changeset:6129 and changeset:6130. The problem persists in the Nasm version, Petit Larceny, and Common Larceny. It may be hard to fix this in Petit Larceny and Common Larceny, but we should be able to fix the Nasm version.
As of changeset:6130, SRFI 27 actually performs slightly better without the temporary workaround, so we should remove that workaround before the next release. To remind us to do that, Will is changing the milestone for this ticket to Larceny 0.97.
Author: will
The temporary workaround was removed by changeset:6152.
Petit Larceny and Common Larceny may still have performance problems, but they can wait.
_Reported by: t-guest on Wed Mar 4 18:54:22 2009_
I've noticed that Larceny's implementation of SRFI 27 is about 20x-30x slower than PLT Scheme's. Even though PLT appears to be using an FFI'ed C primitive, and Larceny is probably dealing with boxed flonums, this seems less than ideal, and I figured it couldn't hurt to report it.
My small test program, as `load`ed in R5RS mode, is as follows:

My MzScheme test replaces the preamble with this:

and is loaded with a `require` form. On the machine I'm typing on, `(test 1000000)` takes 0.34s on x86_64 MzScheme v4.1.4, 0.48s on x86_32 MzScheme v4.1.4, and 9.2s on Intel Larceny v0.961.
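Both code blocks referenced above were lost in the Trac-to-GitHub import. The sketch below is only a guess at the kind of program being described; the `require` form, the integer bound, and the loop body are assumptions, not the original code.

```scheme
;; Assumed reconstruction -- the original program did not survive the import.
;; Larceny preamble, loaded in R5RS mode; the exact require form is a guess:
(require 'srfi-27)

;; For the MzScheme variant the preamble would instead be a module-style
;; require, something like (require srfi/27).

;; Call the SRFI 27 generator n times; the bound is also a guess.
(define (test n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (random-integer 1000000000)))

;; (test 1000000) is the call the timings above refer to.
```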