Closed dolio closed 3 weeks ago
Results of @pchiusano/misc-benchmarks:

| Benchmark | Without optimizations | With optimizations |
| --- | --- | --- |
| fib1 | 1.620115ms | 1.4078ms |
| fib2 | 5.263945ms | 4.922104ms |
| fib3 | 6.05794ms | 5.670696ms |
| Decode Nat | 1.107µs | 932ns |
| Generate 100 random numbers | 750.962µs | 647.73µs |
| List.foldLeft | 5.181553ms | 4.492801ms |
| Count to 1 million | 586.405189ms | 514.030489ms |
| Json parsing (per document) | 657.471µs | 611.919µs |
| Count to N (per element) | 769ns | 651ns |
| Count to 1000 | 768.651µs | 668.335µs |
| Mutate a Ref 1000 times | 1.3586ms | 1.114969ms |
| CAS an IO.ref 1000 times | 1.799077ms | 1.491692ms |
| List.range (per element) | 1.047µs | 887ns |
| List.range 0 1000 | 1.110156ms | 956.614µs |
| Set.fromList (range 0 1000) | 5.253066ms | 4.234707ms |
| Map.fromList (range 0 1000) | 3.9308ms | 3.200286ms |
| NatMap.fromList (range 0 1000) | 18.409758ms | 16.115533ms |
| Map.lookup (1k element map) | 11.931µs | 7.848µs |
| Map.insert (1k element map) | 23.333µs | 20.428µs |
| List.at (1k element list) | 1.25µs | 822ns |
| Text.split / | 72.824µs | 69.549µs |
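For a quick sense of the relative improvement, the two runs can be compared directly. A small Python sketch (times copied from the results above, converted to a common unit where needed; the rounding and selection of benchmarks are mine):

```python
# Relative speedup (with optimizations vs. without) for a few of the
# benchmarks above. Each pair is (without, with) in the same unit, so
# the ratio is unitless.
pairs = {
    "fib1": (1.620115, 1.4078),                      # ms
    "Count to 1 million": (586.405189, 514.030489),  # ms
    "Map.lookup (1k element map)": (11.931, 7.848),  # µs
    "List.at (1k element list)": (1.25, 0.822),      # µs (822ns = 0.822µs)
}

for name, (before, after) in pairs.items():
    pct = (1 - after / before) * 100
    print(f"{name}: {pct:.1f}% faster")
```

The Map.lookup and List.at benchmarks show the largest relative gains (around a third), while the loop-heavy benchmarks improve by roughly 12–13%.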
The inlining part doesn't seem to make much difference right now (at least on those benchmarks), but it should start being more significant with some of the stuff Chris is working on.
Here's a contrived example where it makes a difference:
```unison
indirect.myadd : Nat -> Nat -> Nat
indirect.myadd m n = m + n

indirect.myadd2 m n = myadd m n

indirect.myadd3 : Nat -> Nat -> Nat
indirect.myadd3 m = myadd2 m

indirect.loop acc = cases
  0 -> acc
  n -> indirect.loop (myadd3 1 acc) (n - 1)

indirect.main = do timed do indirect.loop 0 10000000
```
Without inlining: `took: 6.36881337s`
With inlining: `took: 4.942787153s`
With inlining, you can add more indirections and it doesn't make a difference, unless you add more than 30 of them, which would trigger my "we're in an infinite inlining loop" check. I could add more complications to handle cases like that, but it doesn't seem worth it right now.
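A wrapper-inlining pass with that kind of depth cap can be sketched in a few lines. This is a toy illustration in Python (not the Haskell runtime's actual code), using a simplified expression representation rather than the real A-normal form; all names here are hypothetical:

```python
# Toy illustration of inlining trivial wrapper functions, with a depth
# cap so (mutually) recursive wrappers can't cause an infinite loop.
# Expressions: ("call", fname, [args]) | ("var", name) | ("const", value)
MAX_DEPTH = 30  # mirrors the "more than 30 indirections" cutoff

def subst(expr, env):
    """Replace variables with their bound argument expressions."""
    if expr[0] == "var":
        return env.get(expr[1], expr)
    if expr[0] == "call":
        _, f, args = expr
        return ("call", f, [subst(a, env) for a in args])
    return expr  # const

def inline(expr, defs, depth=0):
    """Inline saturated calls to simple wrapper definitions."""
    if expr[0] != "call" or depth >= MAX_DEPTH:
        return expr
    _, f, args = expr
    args = [inline(a, defs, depth) for a in args]
    if f in defs:
        params, body = defs[f]
        if len(args) == len(params):  # fully saturated: safe to inline
            env = dict(zip(params, args))
            return inline(subst(body, env), defs, depth + 1)
    return ("call", f, args)

# Wrappers roughly analogous to myadd2/myadd3 above:
defs = {
    "myadd3": (["m", "n"], ("call", "myadd2", [("var", "m"), ("var", "n")])),
    "myadd2": (["m", "n"], ("call", "myadd", [("var", "m"), ("var", "n")])),
}
call = ("call", "myadd3", [("const", 1), ("var", "acc")])
print(inline(call, defs))
# → ('call', 'myadd', [('const', 1), ('var', 'acc')])
```

Both layers of indirection collapse into a direct call to `myadd`, which is the effect the benchmark above is measuring.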
This PR implements two optimizations for the interpreter.

1. When generating `MCode`, we look up arities for functions and emit more efficient `Call` instructions. This allows things like self-recursive loops to be more efficient.
2. Before generating `MCode`, it runs an inlining pass on the A-normal code. This checks for really simple wrapper-like code, like (conceptually) `x y -> f x y y`, or stuff like `x -> 3`. If such things are fully or over-saturated, it will try to inline them directly into the call site. Eventually, I think most builtins will be of this form, so saturated calls to builtins will be inlined to directly use the interpreter instructions.

The inlining happens when generating `MCode`, so it only affects what is actually run on the interpreter. We can't really do it earlier, because the inlined code isn't proper to send around to other machines, and the runtime hashes would change.

Also included are some tweaks to the pretty printing for debugging `MCode` generation. I think I'm going to try one more tweak to that before merging this.
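The arity-directed part of the first optimization can be sketched as a dispatch on how the call site's argument count compares to the callee's known arity. This is a hypothetical illustration in Python, with made-up instruction names rather than the actual `MCode` operations:

```python
# Hypothetical sketch of arity-directed call emission: when the callee's
# arity is statically known and the call is saturated, emit a direct
# call instruction; otherwise fall back to slower generic machinery.
# Instruction names here are illustrative, not real MCode ops.
ARITIES = {"myadd": 2, "myadd3": 2}

def emit_call(fname, num_args):
    arity = ARITIES.get(fname)
    if arity is None:
        return ("GenericApply", fname, num_args)   # unknown callee: check at runtime
    if num_args == arity:
        return ("Call", fname, num_args)           # saturated: direct call
    if num_args > arity:
        # over-saturated: call with `arity` args, then apply the leftovers
        return ("CallThenApply", fname, arity, num_args - arity)
    return ("Partial", fname, num_args)            # under-saturated: build a closure

print(emit_call("myadd", 2))   # → ('Call', 'myadd', 2)
```

The saturated case is what makes self-recursive loops cheaper: the interpreter can jump straight into the function body instead of going through a generic application path.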