Open sgraf812 opened 2 years ago
Hi, I've refactored the primop handling to use hashmap lookup. Try out the interpreter from this branch: https://github.com/grin-compiler/ghc-whole-program-compiler-project/tree/primop-map/external-stg-interpreter
Wow, that was fast!
And it also resulted in a great speedup, now my reduced benchmark case takes 35s instead of 55s. Great work!
Another perhaps superior suggestion might be to intern all primop names and use an IntMap/Array, but the profile suggests that *.evalPrimOp
is no longer a bottle-neck, so why bother.
I think I closed prematurely, after all the changes aren't on master yet. But I think https://github.com/grin-compiler/ghc-whole-program-compiler-project/commit/f5e6806a60995e491b8e333d33ec0fa07b1c4867 fixes the issue.
While the implementation of, e.g.,
...ByteArray.evalPrimOp
is rather direct and elegant at the momentIt yields rather slow code. That is because GHC doesn't do the "obvious" trie-based match optimisation, which means we end up with a linear chain of comparisons
in Core.
I took a profile (on
nofib
"sbernoulli
, if that matters) and...ByteArray.evalPrimOp
takes about 10% of time and allocation. Here is an excerpt of the profileI think a bit of focus on optimising
evalPrimOp
may well speed up the interpreter by 50%. One way to do so would perhaps be to use aHashMap
or Trie to do the lookup.Why optimise anyway? Because at the moment a single run of NoFib's
bernoulli
benchmark takes about half an hour when the compiled program takes just 0.1s. That's quite a deal breaker for an exhaustive benchmark run of all 11* benchmarks.