paf31 opened this issue 10 years ago
I do recommend profiling before optimization. JavaScript engines do remarkable transformations that make it even more difficult than usual to identify trouble spots without profiling.
Also, to the extent that we can parallelize compilation, we can take advantage of web workers on the front end (not sure if anything similar is available on the back end). With most computers having 4+ cores these days, you can get an easy 2-10x increase in performance just from parallelizing.
You can spawn additional processes in node and communicate with JSON over IPC, but I suspect it's a bit clunkier than web workers.
Well, `psc-make` could be parallelized at a fairly high level, and the module graph gives a natural scheme for data-flow parallelism. Definitely worth looking into later, I think.
I'm more concerned by the several orders of magnitude difference in single threaded performance right now though :P
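For illustration, here's a minimal Haskell sketch of that data-flow scheme; `compileAll`, the graph shape, and the `compile` callback are all hypothetical, not actual `psc-make` code:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
import Control.Monad (forM_)
import qualified Data.Map as M

type ModuleName = String

-- Hypothetical driver: 'graph' maps each module to its dependencies
-- (assumed acyclic, with every dependency present as a key), and
-- 'compile' compiles one module. One worker per module; each blocks
-- on its dependencies' "done" signals before starting.
compileAll :: M.Map ModuleName [ModuleName] -> (ModuleName -> IO ()) -> IO ()
compileAll graph compile = do
  signals <- traverse (const newEmptyMVar) graph
  forM_ (M.toList graph) $ \(name, deps) -> forkIO $ do
    mapM_ (\d -> readMVar (signals M.! d)) deps  -- wait for dependencies
    compile name
    putMVar (signals M.! name) ()                -- signal completion
  mapM_ readMVar signals                         -- block until all done
```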
Also, the `Map` implementation used in the typechecker has O(n) lookup due to the order of the insertions. Either we should make `Map` automatically balancing (preferred), or we should try a different data structure.
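To illustrate the failure mode (a sketch, not the actual psc-in-ps code): a binary search tree that never rebalances degenerates into a linked list when keys arrive in sorted order, so lookup degrades from O(log n) to O(n):

```haskell
-- A naive, non-balancing BST: inserting keys in ascending order
-- (e.g. sequentially generated names) produces a right-leaning spine.
data Tree k v = Leaf | Node (Tree k v) k v (Tree k v)

insert :: Ord k => k -> v -> Tree k v -> Tree k v
insert k v Leaf = Node Leaf k v Leaf
insert k v (Node l k' v' r) = case compare k k' of
  LT -> Node (insert k v l) k' v' r
  EQ -> Node l k v r
  GT -> Node l k' v' (insert k v r)

-- A self-balancing structure (an AVL or red-black tree, as in
-- Haskell's Data.Map) keeps the depth logarithmic regardless of
-- insertion order.
```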
I meant to say, I tried profiling `psc-in-ps` the other day and got some slightly strange results. As far as I can tell from the output, `eqArray` and `bindStateT` seem to be the most expensive operations, with a grand total of ~5% of the total JS execution time each. It makes no sense.
This was using `nprof`. I couldn't get `dtrace` or `look` to work.
`bindStateT` sounds plausible, because both desugaring and type checking use `StateT`. We can probably use an `IORef` and exceptions in place of `ErrorT` in both cases, which should speed things up significantly.
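A rough sketch of that idea in Haskell terms; the `CheckState`, `CheckError`, and `runCheck` names are made up for illustration:

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical stand-ins for the typechecker's state and errors.
data CheckState = CheckState { nextVar :: Int }
newtype CheckError = CheckError String deriving Show
instance Exception CheckError

-- Instead of StateT CheckState (ErrorT CheckError m), pass an IORef
-- explicitly and throw real exceptions; this avoids allocating a
-- wrapped continuation at every bind.
freshVar :: IORef CheckState -> IO Int
freshVar ref = do
  CheckState n <- readIORef ref
  modifyIORef' ref (\s -> s { nextVar = n + 1 })
  pure n

orFail :: Bool -> String -> IO ()
orFail True  _   = pure ()
orFail False msg = throwIO (CheckError msg)

runCheck :: (IORef CheckState -> IO a) -> IO (Either CheckError a)
runCheck check = do
  ref <- newIORef (CheckState 0)
  try (check ref)
```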
Edit: Do you have the profiler results?
Oh sure, it makes sense in terms of what is expensive; I just meant it's odd that almost 80% of the time is missing from the profiler results, and that the two supposedly most expensive functions are only responsible for 10% of the total time.
I'll make a gist with the results when I get a minute.
https://gist.github.com/garyb/4888f29a11c09e469a40
Different results on my home machine: the total time % is even lower for `bindStateT` / `eqArray`!
Interesting. There are a few modules which show up quite often: `Data.Map`, the typechecker, `Names`, and `PatternArrows`. Maybe we can try optimizing those and see if that helps.
I'm going to see if I can get `dtrace` working on a SmartOS VM; I might get some more sense out of that too.
I hacked some timing into the compiler; here's what I get for just running `psc`:
read & parse files: 10ms
sortModules: 18ms
desugar: 531ms
typecheck: 6212ms
binding groups: 96ms
js codegen: 224ms
compile: 8519ms
So the type checker is by far the main culprit. The parser is amazingly fast, considering.
I've put it in a branch c1c7f8fda4a430d003cd4409b8f1423451b8fb88
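For reference, a minimal sketch of that kind of phase timing in Haskell; `timedPhase` is a made-up helper, not what the branch actually uses:

```haskell
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Wrap a compiler phase, print its wall-clock duration, and pass the
-- result through. In a lazy language the result should be forced
-- inside 'action', or the time will be attributed to whichever later
-- phase first demands it.
timedPhase :: String -> IO a -> IO a
timedPhase label action = do
  start  <- getCurrentTime
  result <- action
  end    <- getCurrentTime
  putStrLn (label ++ ": " ++ show (diffUTCTime end start))
  pure result
```

Wrapping each stage, e.g. `timedPhase "typecheck" (...)`, would reproduce the kind of breakdown above.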
Great to know :) I'll have a look at the typechecker in the next couple of days. I have a few ideas for how it could be improved.
With the new branch, typecheck time is now ~3926ms, down from ~5050ms. Pretty good, but considering the amount of work you did on it, it's a shame it's not a bit more!
I have more up my sleeve :)
Good stuff :)
I have a couple of other things I'd like to try, such as `instanceof` checks. But to be honest, I'm starting to lose faith that the JS version can be fast enough to be practical.
If we can't get this to within a factor of 10 of the performance of the Haskell version, I think we should seriously consider sticking with what we have, and just work on packaging it better for end users (binary distributions, apt packages, brew, etc.). We should chat about it on IRC when everyone is online.
Edit: I'd also like to try replacing `Data.Map` with a self-balancing implementation to see if that helps.
Unless the optimization is guided by fine-grained profiling, it'll just be blind luck if it improves anything (I know I say this all the time but it's so true it bears repeating... all the time :) ).
If modules can be compiled independently, then user compile time will be fairly small in the common case that only a couple of files have changed (obviously this would require a more intelligent build process, but that's something we need anyway). The speed of incremental compilation is really all that matters, since developers spend 99% of their time compiling a project that's only changed by a few lines.
The other thing is that PureScript does not have a sophisticated optimizer. The difference between GHC with no optimizations and GHC with all optimizations is astounding. Optimizations will be driven by slow performance and should have a profound impact not just on ps-in-ps but on user-land code too. It could actually be a good thing that the compiler is slow, because it will accelerate work on optimizations.
I agree about profiling, but I have no idea how to extract any meaningful data from a running instance of `psc`, other than to stick a `console.time` on every other line.
Regarding incremental compilation, `psc-make` will avoid recompiling entire modules if they haven't changed, which does help a lot in the Haskell compiler, but 5s to build the Prelude is just far too slow to be practical. Incremental compilation is only going to get us so far if the single-module compile time is high. Related: finer-grained incremental compilation could also be worth looking at, per binding group, for example. Also, we can be cleverer about which modules need to be recompiled. Right now we're recompiling too many things when a module low in the dependency DAG changes.
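As a sketch of the "which modules actually need recompiling" calculation (a hypothetical helper, assuming every dependency appears as a key in the graph): only the changed modules and their transitive reverse dependencies need rebuilding.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type ModuleName = String

-- Hypothetical helper: given the dependency graph and the set of
-- changed modules, walk reverse edges to collect exactly the modules
-- that need recompiling; everything else can be left alone.
needsRebuild :: M.Map ModuleName [ModuleName]
             -> S.Set ModuleName -> S.Set ModuleName
needsRebuild graph changed = go changed (S.toList changed)
  where
    dependents m = [ n | (n, deps) <- M.toList graph, m `elem` deps ]
    go acc []     = acc
    go acc (m:ms) =
      let new = [ n | n <- dependents m, not (n `S.member` acc) ]
      in go (foldr S.insert acc new) (ms ++ new)
```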
As for optimizations, I'm all in favour if anyone wants to investigate. Hopefully after the hangout today we'll have a whole bunch of newly initiated compiler developers :) My only stipulation is that the optimizations should keep the generated code complexity approximately the same. Aggressive inlining, for example, could be a big help to performance, but not to readability.
I had no luck with getting an OS that dtrace works on to run, but I'm trying something else now which will hopefully let us profile.
One thing that might be worth reviewing is whether we have things left in the code that are being evaluated unnecessarily due to strict evaluation.
One example of this is when we use `guardWith` from the typechecker monad. When doing something like this:
guardWith (strMsg $ "Expected type of kind *, was " ++ prettyPrintKind kind) $ kind == Star
we're calling `strMsg` and `prettyPrintKind` even if the error never happens, I think.
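In a strict language, the usual fix is to pass the error as a thunk so the message is only built on failure. A sketch of that variant in Haskell (`guardWith'` is a hypothetical name; in Haskell itself laziness already gives you this for free, but the strict PureScript port would need the explicit thunk):

```haskell
import Control.Monad.Except (MonadError, throwError)

-- Take the error as a thunk (() -> e) so the message is only
-- constructed when the guard actually fails.
guardWith' :: MonadError e m => (() -> e) -> Bool -> m ()
guardWith' _   True  = pure ()
guardWith' err False = throwError (err ())

-- Usage, mirroring the example above:
--   guardWith' (\_ -> strMsg ("Expected type of kind *, was "
--                             ++ prettyPrintKind kind))
--              (kind == Star)
```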
After doing that stuff with `isNullaryConstructor` in codegen the other day, it occurred to me that I could probably hack newtypes in during codegen (applying the optimization to any data constructor that is defined like a newtype). I was hoping it would give us a nice boost, as `StateT`'s data constructor and the like are then newtypes.
I think it knocked an entire 20-30ms off. :)
I've changed my mind on newtypes a little. I think giving them their own keyword would be a good idea, just because it makes the code generation strategy easier to explain: `data` types always get a class, newtypes are just a type wrapper with no cost at runtime.
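For comparison, the analogous distinction in Haskell (names made up for illustration; the PureScript proposal is about the generated JS, but the runtime-cost story is the same):

```haskell
-- A newtype has exactly one constructor with exactly one field and is
-- erased at compile time: constructing or unwrapping it costs nothing.
newtype Score = Score Int

-- A data declaration, even with one constructor and one field,
-- allocates a real box at runtime (analogous to the JS class that
-- psc generates for data constructors).
data BoxedScore = BoxedScore Int
```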
Yeah, agreed. That way, as with Haskell, you know for sure they are inlined, and using `newtype` enforces the one-argument-one-constructor rule.
My above comment was more just saying that even applying it globally makes only a tiny difference to performance, so it doesn't need to be that high on the list of optimizations to add. The main thing it will help with is optimizing algorithms using f-algebras a la `recursion-schemes` and the like, as TCO starts getting applied in places where it previously wasn't.
Looks like there's a way to get dtrace-equivalent info out of Node on Linux now: http://www.brendangregg.com/blog/2014-09-17/node-flame-graphs-on-linux.html
Should be helpful when looking at performance again!
We should also try to profile the running JavaScript to see where the bottleneck is.