Open bon opened 8 years ago
it seems normal-plus is running w/o boxing, right?
Correct! Fixed in https://github.com/bon/inlined-generic-function/commit/76d1eb6e77ebc5433465b9afb2cdb84b6c4c3e4d
Processor cycles are now
588,650
586,253
1,889,394
550,351
phew.
I just tested your version. On my machine, the result is still in favor of the inlined version.
Evaluation took:
0.001 seconds of real time
0.004000 seconds of total run time (0.004000 user, 0.000000 system)
400.00% CPU
638,640 processor cycles
131,024 bytes consed
Evaluation took:
0.000 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
100.00% CPU
608,634 processor cycles
163,808 bytes consed
Evaluation took:
0.003 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
0.00% CPU
4,543,020 processor cycles
655,184 bytes consed
Evaluation took:
0.000 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
100.00% CPU
389,169 processor cycles
163,808 bytes consed
What explains this difference? In your results the inlined generic function performs better, but not by much. I use SBCL 1.3.8 on Roswell on:
$ uname -a
Linux guicho-x61 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo
...
model name : Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz
...
For me the cycle counts vary wildly from run to run. Sometimes the igf is a little quicker, sometimes slower. One example is shown below.
But the more interesting question is why the igf showed a 10x speedup on numbers but hardly any difference on defined classes. Of course I would be very happy to see a 10x speedup on defined classes too!
$ cat /proc/cpuinfo | ag 'model name' | head -1
model name : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
$ uname -a
Linux tie 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux
$ ros use sbcl
$ ~/.roswell/impls/x86-64/linux/sbcl/1.3.9/bin/sbcl --version
SBCL 1.3.9
$ ros run
$ rlwrap ros run
* (ql:quickload :inlined-generic-function)
...
* (load "benchmark.lisp")
...
Evaluation took:
0.000 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
100.00% CPU
424,334 processor cycles
131,024 bytes consed
Evaluation took:
0.000 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
100.00% CPU
362,358 processor cycles
163,792 bytes consed
Evaluation took:
0.001 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
0.00% CPU
2,060,160 processor cycles
655,200 bytes consed
Evaluation took:
0.000 seconds of real time
0.003333 seconds of total run time (0.003333 user, 0.000000 system)
100.00% CPU
493,287 processor cycles
163,792 bytes consed
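Each run above finishes in about a millisecond or less, so a single measurement is easily dominated by noise. A minimal sketch for averaging over many repetitions, using only standard Common Lisp; average-run-time and the thunk passed to it are hypothetical names and are not part of benchmark.lisp:

```lisp
;; Hypothetical helper, not part of the repository's benchmark.lisp.
(defun average-run-time (thunk &key (repetitions 1000))
  "Call THUNK REPETITIONS times; return the mean run time per call in seconds."
  (let ((start (get-internal-run-time)))
    (dotimes (i repetitions)
      (funcall thunk))
    ;; The result is an exact rational; wrap it in FLOAT for display.
    (/ (- (get-internal-run-time) start)
       (* repetitions internal-time-units-per-second))))

;; Example: time a trivial addition; replace the lambda with the call under test.
(average-run-time (lambda () (+ 1 2)) :repetitions 100000)
```

The resolution of get-internal-run-time is implementation-dependent, so the repetition count should be large enough that the total run time is well above one clock tick.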
The 10x speedup is not achieved because of missing type information and the cost of slot access:
1. The contents slot of box is not typed, so the (+ (contents a) b) part always calls generic-+, not the optimized machine arithmetic. You should check the disassembly.
2. contents is a normal generic function, so the slot access is slow.
Imagine the total cost is 10X for the normal GF and X for the IGF. The two factors above add overheads A and B to both, giving 10X+A+B vs X+A+B. A 10x speedup is then obviously not achievable, since A+B could be very large.
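The two overheads can be sketched in plain Common Lisp. This is only an illustration under assumptions: the box class below is a hypothetical reconstruction (the real definition is in the linked commit), add-untyped, add-typed, sbox and add-struct are made-up names, and the structure variant is a swapped-in alternative rather than anything the library does:

```lisp
;; Hypothetical reconstruction of the boxing class; the real one is in the linked commit.
(defclass box ()
  ((contents :initarg :contents :accessor contents)))

;; Untyped: the compiler knows nothing about what CONTENTS returns,
;; so + has to go through the generic arithmetic routine.
(defun add-untyped (a b)
  (+ (contents a) b))

;; Typed: THE and the declaration tell the compiler both operands are fixnums.
;; CONTENTS itself is still a generic function call.
(defun add-typed (a b)
  (declare (type fixnum b))
  (+ (the fixnum (contents a)) b))

;; Swapped-in alternative for the slot-access cost: a structure accessor is an
;; ordinary function, not a generic function, and its slot type is declared.
(defstruct sbox
  (contents 0 :type fixnum))

(defun add-struct (a b)
  (declare (type sbox a) (type fixnum b))
  (+ (sbox-contents a) b))
```

Comparing (disassemble 'add-untyped) with (disassemble 'add-typed) at the REPL should show whether the addition still goes through the generic routine; the exact output depends on the implementation and optimization settings. As for the cost model, with a purely illustrative A+B = 9X the ratio would be (10X+9X)/(X+9X) = 1.9, far from 10.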
I updated the environment and noticed that the examples in playground.lisp are getting slow. It looks like the function is prevented from inlining.
(push :inline-generic-function *features*) still successfully forces the functions to be inlined, but I don't like this solution...
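For comparison, standard Common Lisp lets inlining be requested globally and suppressed per call site through declarations. A minimal sketch with an ordinary function; add-one and the two callers are hypothetical names, how inlined-generic-function hooks into these declarations for generic functions is not shown here, and a compiler is free to ignore an inline request:

```lisp
;; Global request: must be proclaimed before the DEFUN so that the compiler
;; can keep the definition around for later expansion.
(declaim (inline add-one))
(defun add-one (x)
  (+ x 1))

;; The call may be expanded in place because of the global proclamation.
(defun caller-inlined (x)
  (add-one x))

;; A local NOTINLINE declaration turns inlining off for this call site only.
(defun caller-not-inlined (x)
  (declare (notinline add-one))
  (add-one x))
```

Both callers return the same value; only the generated code differs, which can again be checked with disassemble.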
The benchmarks provided are for methods on the built-in Lisp types number, fixnum and double-float. To test the behaviour on defined classes we added a simple boxing class and found that performance degraded when the inlined-generic-functions were inlined. We found the following numbers of processor cycles for the four methods in playground.lisp, respectively:
Experiment on sbcl 1.3.5.24
See https://github.com/bon/inlined-generic-function/commit/8b6e4d5b10cace47de4343e6dde8455f21dfd579
So my question is whether this indicates that inlined-generic-functions only give a speedup on built-in types and not on defined classes.