jamesrhester / Lerche.jl

A Julia port of the Lark parser
MIT License
45 stars 3 forks source link

Lark() is very slow #17

Open ddyok opened 3 years ago

ddyok commented 3 years ago

I'm attempting to run the calc.jl example, and for some reason running Lark(...) takes a very long time. Not sure that this is on my end or how to know that. In any case, I've added a timer:

$ julia calc.jl
 ──────────────────────────────────────────────────────────────────────────────────────────────
                                                       Time                   Allocations
                                               ──────────────────────   ───────────────────────
               Tot / % measured:                    30.6s / 95.0%           2.37GiB / 96.9%

 Section                               ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────────────────────────
 calc_parser = Lark(calc_grammar...)        1    29.0s   100%   29.0s   2.30GiB  100%   2.30GiB
 ──────────────────────────────────────────────────────────────────────────────────────────────

Any assistance much appreciated.

Thank you.

jamesrhester commented 3 years ago

I'll look in to it. What version of Julia are you running and how did you implement the timer? calc.jl should come up with a prompt allowing you to input simple additions/subtractions, how long does it take before this prompt comes up?

ddyok commented 3 years ago

What version of Julia are you running

$ julia --version
julia version 1.6.1

how did you implement the timer?

I've added a:

using TimerOutputs

to = TimerOutput()

...

Then modified the line (in order to debug why the program takes such a long time):

@timeit to "calc_parser = Lark(calc_grammar...)" calc_parser = Lark(calc_grammar, parser="lalr", transformer=CalculateTree())

Then added before main():

println(to)

how long does it take before this prompt comes up?

The prompt doesn't come up until Lark(...) is called, so it takes what's in the @timeit...

ddyok commented 3 years ago

Oops, I've pasted the wrong line from another test program, updated comment.

ddyok commented 3 years ago

Another question is whether the alloc part in the above TimerOutput being 2.30GiB is maybe part of the issue, or whether that is O.K.

jamesrhester commented 3 years ago

So the time and allocations you have measured are almost all first-time compilation time. I get essentially the same result on Julia 1.5. However a second run is much faster. Using BenchmarkTools gives:

@btime Lark(calc_grammar,parser="lalr",transformer=CalculateTree()) 32.141 ms (54294 allocations: 2.81 MiB)

This first-time compilation cost can typically be transferred to package precompilation time if Lerche is used as part of a package, in which case the grammar is calculated once. Unfortunately calc.jl is presented as a standalone script which means each time it is run from the command line julia goes through the whole compilation step. Perhaps a good idea for Lerche package improvement would be to include precompilation of the calc grammar as part of Lerche installation to show how it is done?

Probably should also look at those tips from Tim Holy on reducing compilation time.

ddyok commented 3 years ago

Unfortunately calc.jl is presented as a standalone script which means each time it is run from the command line julia goes through the whole compilation step.

I see.

Perhaps a good idea for Lerche package improvement would be to include precompilation of the calc grammar as part of Lerche installation to show how it is done?

Absolutely. That would be great. In any case, I think that this should be mentioned as part of the documentation.

Probably should also look at those tips from Tim Holy on reducing compilation time.

It's worth reducing compilation time, as running from a standalone script makes development a bit easier, but compiling is not biggie I guess.

Thanks a lot for looking into this.

ddyok commented 3 years ago

running from a standalone script makes development a bit easier

And a lot faster, of course. I've been playing around with a small ad hoc language, so any delay when fixing issues in the grammar and having to wait for compilation takes away from rapid development... so compilation time is worth keeping in mind.

erezsh commented 3 years ago

Just as a side note, if you're looking to play around with the grammar, you can do so in Lark, which is very quick and responsive. Once you have a satisfactory grammar, you should be able to use it in Lerche without any changes. (or very minor ones)

jamesrhester commented 3 years ago

Note that the compilation time is only to compile the Julia code for handling the grammar. If you are in a Julia REPL, after the first 30s wait subsequent changes to your grammar and regeneration via Lark will only take 10s of ms. So one development process would be:

  1. start Julia REPL
  2. using Lerche
  3. edit grammar
  4. my_grammar = read("grammar_file",String)
  5. my_parser = Lark(my_grammar)
  6. parse(my_parser,my_text)
  7. Repeat 3-6 until happy
ddyok commented 3 years ago

one development process would be...

Thanks for the suggestion. I'll do that.

As an unintended product of this, I've actually added more test functions to save time and not run the entire program each time, so maybe there's something good that came out of the slow first compilation!

if you're looking to play around with the grammar, you can do so in Lark,

Unforunately, it's a bit difficult (not impossible...) for me to untangle the grammar from the code when testing sometimes, I really rely on the Julia for the functions as return values to the parser when testing, so it'd be really great to just stay in a Julia environment.