usethesource / rascal-language-servers

An LSP server for Rascal which includes an easy-to-use LSP generator for languages implemented in Rascal, and an interactive terminal REPL.
BSD 2-Clause "Simplified" License

Prevent "bootstrapping" a full parser generator for deployed DSL implementations #267

Closed DavyLandman closed 11 months ago

DavyLandman commented 1 year ago

Is your feature request related to a problem? Please describe. Currently a DSL can take up to 30s to load, due to Rascal's overhead of generating parsers for files with concrete syntax. In the meantime, users are looking at a piece of text without syntax highlighting.

Describe the solution you'd like I want to extend the registerLanguage call with a classname. If that classname is defined, we use that for parsing files in that language. This would allow us to provide syntax highlighting as soon as the JVM has loaded the jars. (A developer is responsible for correctly generating a Rascal parser class.)
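
A rough sketch of what that could look like on the Java side, assuming a hypothetical loader in the LSP server; the class and method names here are illustrative only, not the actual rascal-lsp API:

    class PrecompiledParserLoader {
        // Sketch (assumption): given the class name passed to registerLanguage, load the
        // pre-generated parser class from the DSL's jar via reflection instead of
        // bootstrapping the parser generator at runtime.
        static Object loadPrecompiledParser(String parserClassName, ClassLoader dslJarLoader)
                throws ReflectiveOperationException {
            Class<?> parserClass = Class.forName(parserClassName, true, dslJarLoader);
            return parserClass.getDeclaredConstructor().newInstance();
        }
    }

If the class name is absent, or the class cannot be found during development, the server would fall back to the current parser-generator path.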

Describe alternatives you've considered

Additional context This is a deployment concern. During development these annotations should be ignored.

jurgenvinju commented 1 year ago

There is a need to know what contributes significantly to this bottleneck:

Requirements on a solution:

It is highly likely that future parsers will be stored in different formats than .class files. This is (a) because the parsing technology becomes more dynamic, or (b) because class loading becomes the bottleneck in parsing. The fastest Java parser I know (JikesPG) is so fast because it does not use the class loading mechanism to instantiate a parser.

Grammars are not only used for parsing in a lot of Rascal code. Every grammar reification, whether used for parsing or not, will trigger a full load of the parser generator. It is very hard and also brittle to weed out all the # operators from a complete program "only" for the sake of efficiency.

[Thinking-out-loud-begin]

Looking at this from a programming language perspective: the parser is a compile-time constant that is derived from the grammar, which is also a compile-time constant. In concrete syntax expressions, only this constant is used. And if we call the parse function, we also use a constant #start[Program].

So in a way, this is a special case of constant-folding. Step 1:

Step 2:

So that amounts to:

  1. let parsers be cached on disk by ParseTree.parsers based on grammar-hashed file names for class files
  2. (option 4) let grammar values on the user side be cached by the users (by reading them from their own jar files), or (option 2) let the interpreter cache constant grammar reifications.
  3. let concrete syntax grammars be cached on disk by the interpreter's import-and-module-parser functionality

Steps 2 and 3 both lead into step 1. I'd prefer 2 over 4, but I don't know how to index #start[Program] yet such that there are no accidental collisions.
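
A rough sketch of what step 1 could amount to on the Java side (caching generated parser classes on disk under grammar-hashed file names); the helper names here are assumptions for illustration, not the actual implementation:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    class ParserClassCache {
        // Cache the generated parser bytecode under a name derived from a hash of the
        // grammar, so an unchanged grammar never triggers the parser generator again.
        static byte[] cachedOrGeneratedParser(String serializedGrammar, Path cacheDir)
                throws IOException, NoSuchAlgorithmException {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(serializedGrammar.getBytes(StandardCharsets.UTF_8));
            Path classFile = cacheDir.resolve("Parser$" + toHex(digest) + ".class");
            if (Files.exists(classFile)) {
                return Files.readAllBytes(classFile); // cache hit: skip the generator entirely
            }
            byte[] bytecode = generateParserClass(serializedGrammar); // hypothetical stand-in
            Files.createDirectories(cacheDir);
            Files.write(classFile, bytecode);
            return bytecode;
        }

        private static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }

        // Placeholder for the real pipeline: run the parser generator and compile the result.
        private static byte[] generateParserClass(String serializedGrammar) {
            throw new UnsupportedOperationException("stand-in for the actual parser generator");
        }
    }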

jurgenvinju commented 1 year ago

By the way, this is not only a deployment concern: since Rascal programs with reified grammars load really slowly, they are also annoying when making incremental developments in the REPL. Starting a new REPL takes way too much time.

jurgenvinju commented 1 year ago

However, if we find a solution that only works for deployed programs, that would be great.

DavyLandman commented 1 year ago

I think there are many gains to be had in this area of Rascal. Your analysis is right. But it's also a big change you are proposing, and it does not solve the specific issue we run into.

What I want to try and specifically solve is to reduce the time a user is looking at syntax-highlighting-free code. VS Code will trigger Rascal the first time it sees a file of a certain language. And since we've moved the syntax highlighting to the semantic highlighter, it will be without syntax highlighting.

So what is happening before we get syntax highlighting?

  1. start extension
  2. registerLanguage to tell rascal-lsp about where to find the module that contains the contributions
  3. rascal-lsp starts an evaluator
  4. import the contributions module; this module loads a lot of other modules: the one with the grammar, the one with the TypePal-based typechecker (with lots of concrete syntax)
  5. after importing is done (at which time every module with concrete syntax will have had a new grammar generated for it) we call the function that builds the contributions
  6. that function gets a parser for start[..] and returns that. This again generates a parser, now for the first time without concrete syntax and Rascal in the mix, so a new one again.

So before we get to parse the file, we might have had to calculate the grammar, generate the parser, and compile the parser 4 to 5 times (depending on the number of modules with concrete syntax).

I want to find a way to cut that down. I would also prefer a way that's less dependent on the current implementation of Rascal, but then again, the deployed scenarios are using fixed versions of Rascal, so if that interface changes, we have a way to deal with that.

My proposal was backwards compatible by making it a keyword parameter.

DavyLandman commented 1 year ago

With regards to storing the grammar in a file: we would indeed be faster, since we only need to load a dedicated evaluator that runs the parser generator and compiles the Java files. I'll try and run some profiles to see which part we are spending most time in.

jurgenvinju commented 1 year ago

I suspect that in the given start-up scenario, just before syntax highlighting works, we are looking mainly at the loading of the parser generator and the reduction of the #start[] non-terminal to a grammar value. But let's see.

Side note:

It could also be an elegant idea to move forward with the current compiler and compile the parser generator to Java, and then use one (locked) instance between all the LSP instances (or have one for each instance). That way the loading time should be much lower, and we can then focus on making the reification much faster as well (which would be implemented by the same generated Java code library).

DavyLandman commented 1 year ago

Profile of a new registerLanguage just after clearing that existing language (just to fill the caches a bit, which isn't even the case for a fresh start).

Already, for speed purposes, we have the following pattern:

module lang::DSL::Syntax

start syntax Module = ...;

// since Rascal caches the internal parsers, but those caches can expire, we keep around an instance of our parser
// this helps us avoid regenerating the parser if a user stopped interacting with their IDE for a while
private start[Module] (value, loc) cachedParser = parser(#start[Module]);

public start[Module] (value, loc) getParser() = cachedParser;

It does mean that during import we do most of our calculation, and that the actual contributions call is not visible in the trace, as it just returns this cachedParser that was already built during initialization.

So my observations (which align with your suspicions):

If we could do something about the parser generator load times and its performance, that would be nice indeed.

jurgenvinju commented 1 year ago

Excellent info.

DavyLandman commented 1 year ago

I quickly added a timer around the parse in Import::parseModuleAndFragments and ran it with my machine in slow mode (limited to 4 of its slower cores). Now parsing the ParserGenerator module takes 900ms during import.

Parsing: file:///..../src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc took: 905ms

fyi: I only measured the raw parsing time:

      var start = System.nanoTime();
      IActionExecutor<ITree> actions = new NoActionExecutor();

      ITree tree = new RascalParser().parse(Parser.START_MODULE, location.getURI(), data, actions, new DefaultNodeFlattener<IConstructor, ITree, ISourceLocation>(), new UPTRNodeFactory(true));

      var stop = System.nanoTime();
      var time = java.util.concurrent.TimeUnit.NANOSECONDS.toMillis(stop - start);
      if (time > 500) {
        System.err.println("Parsing: " + location.getURI().toString() + " took: " + time + "ms");
      }
jurgenvinju commented 1 year ago

That's a large file, and there are many others like List and Set and Map and grammar::definition::Modules which are similarly large.

Raw parsing time is not something that is easily optimized, as it has already been optimized to the bone (@arnoldlankamp); however, there is a UPTRNodeFactory included that maps the SPPF parse graph to the Tree format we are used to. I'm hoping there is something to get out of that.

Separately the mapping to subclasses of AbstractAST could be expensive.

jurgenvinju commented 1 year ago

This is my trace for import lang::rascal::grammar::ParserGenerator; with printTimes set to true in org.rascalmpl.parser.gtd.SGTDBF, which is the main parser driver:

rascal>import lang::rascal::grammar::ParserGenerator;
Parsing: 57
Unbinarizing, post-parse filtering, and mapping to UPTR: 16
Parsing: 3
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc|
Parsing: 444
Unbinarizing, post-parse filtering, and mapping to UPTR: 82
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/String.rsc|
Parsing: 67
Unbinarizing, post-parse filtering, and mapping to UPTR: 26
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/List.rsc|
Parsing: 95
Unbinarizing, post-parse filtering, and mapping to UPTR: 36
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Map.rsc|
Parsing: 18
Unbinarizing, post-parse filtering, and mapping to UPTR: 10
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/IO.rsc|
Parsing: 64
Unbinarizing, post-parse filtering, and mapping to UPTR: 26
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Exception.rsc|
Parsing: 6
Unbinarizing, post-parse filtering, and mapping to UPTR: 4
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/syntax/Rascal.rsc|
Parsing: 64
Unbinarizing, post-parse filtering, and mapping to UPTR: 30
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/ParseTree.rsc|
Parsing: 49
Unbinarizing, post-parse filtering, and mapping to UPTR: 29
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Message.rsc|
Parsing: 2
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Type.rsc|
Parsing: 204
Unbinarizing, post-parse filtering, and mapping to UPTR: 48
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Productions.rsc|
Parsing: 24
Unbinarizing, post-parse filtering, and mapping to UPTR: 4
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Attributes.rsc|
Parsing: 11
Unbinarizing, post-parse filtering, and mapping to UPTR: 3
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/ValueIO.rsc|
Parsing: 5
Unbinarizing, post-parse filtering, and mapping to UPTR: 2
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Literals.rsc|
Parsing: 25
Unbinarizing, post-parse filtering, and mapping to UPTR: 4
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Grammar.rsc|
Parsing: 13
Unbinarizing, post-parse filtering, and mapping to UPTR: 4
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/util/Maybe.rsc|
Parsing: 2
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Names.rsc|
Parsing: 11
Unbinarizing, post-parse filtering, and mapping to UPTR: 3
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Symbols.rsc|
Parsing: 27
Unbinarizing, post-parse filtering, and mapping to UPTR: 5
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Characters.rsc|
Parsing: 61
Unbinarizing, post-parse filtering, and mapping to UPTR: 8
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Regular.rsc|
Parsing: 21
Unbinarizing, post-parse filtering, and mapping to UPTR: 3
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Modules.rsc|
Parsing: 17
Unbinarizing, post-parse filtering, and mapping to UPTR: 4
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Layout.rsc|
Parsing: 26
Unbinarizing, post-parse filtering, and mapping to UPTR: 15
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Set.rsc|
Parsing: 29
Unbinarizing, post-parse filtering, and mapping to UPTR: 12
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/util/Math.rsc|
Parsing: 27
Unbinarizing, post-parse filtering, and mapping to UPTR: 10
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/ConcreteSyntax.rsc|
Parsing: 16
Unbinarizing, post-parse filtering, and mapping to UPTR: 14
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Keywords.rsc|
Parsing: 7
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Node.rsc|
Parsing: 10
Unbinarizing, post-parse filtering, and mapping to UPTR: 3
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Priorities.rsc|
Parsing: 48
Unbinarizing, post-parse filtering, and mapping to UPTR: 12
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/format/Grammar.rsc|
Parsing: 87
Unbinarizing, post-parse filtering, and mapping to UPTR: 12
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/analysis/graphs/Graph.rsc|
Parsing: 31
Unbinarizing, post-parse filtering, and mapping to UPTR: 7
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/Relation.rsc|
Parsing: 48
Unbinarizing, post-parse filtering, and mapping to UPTR: 22
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/format/Escape.rsc|
Parsing: 23
Unbinarizing, post-parse filtering, and mapping to UPTR: 5
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/analysis/grammars/Dependency.rsc|
Parsing: 4
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/References.rsc|
Parsing: 2
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/Lookahead.rsc|
Parsing: 98
Unbinarizing, post-parse filtering, and mapping to UPTR: 13
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/definition/Parameters.rsc|
Parsing: 14
Unbinarizing, post-parse filtering, and mapping to UPTR: 3
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/util/Monitor.rsc|
Parsing: 6
Unbinarizing, post-parse filtering, and mapping to UPTR: 2
ok

With CPU profiling turned on for Rascal itself, you get a good view of the largest modules:

FRAMES PROFILE: 130 data points, 3143 ticks, tick = 1 milliSecs
                                         Scope   Ticks        %  Source
        lang::rascal::grammar::ParserGenerator     833    26.5%  |main://lang::rascal::grammar::ParserGenerator|
                                     ParseTree     336    10.7%  |main://ParseTree|
                                       $shell$     304     9.7%  |main://$shell$|
                                          List     204     6.5%  |main://List|
 lang::rascal::grammar::definition::Priorities     185     5.9%  |main://lang::rascal::grammar::definition::Priorities|
                  lang::rascal::syntax::Rascal     177     5.6%  |main://lang::rascal::syntax::Rascal|
                 lang::rascal::format::Grammar     158     5.0%  |main://lang::rascal::format::Grammar|
                                        String     158     5.0%  |main://String|
lang::rascal::grammar::definition::Productions     128     4.1%  |main://lang::rascal::grammar::definition::Productions|
    lang::rascal::grammar::definition::Modules     109     3.5% 
jurgenvinju commented 1 year ago

But the slowness does not seem to be related much to the size of the file:

$ wc src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc 
     633    2731   28435 src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc
$ wc src/org/rascalmpl/library/ParseTree.rsc 
     790    4058   28576 src/org/rascalmpl/library/ParseTree.rsc
$ wc src/org/rascalmpl/library/List.rsc 
    1138    3553   24912 src/org/rascalmpl/library/List.rsc
jurgenvinju commented 1 year ago

If you parse the same module again, the parsing time drops sharply. Possibly this is the JIT kicking in:

rascal>import lang::rascal::grammar::ParserGenerator;
Parsing: 0
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
Parsing: 0
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc|
Parsing: 108
Unbinarizing, post-parse filtering, and mapping to UPTR: 17
jurgenvinju commented 1 year ago

Observations:

jurgenvinju commented 1 year ago

With the JIT turned off:

rascal>import lang::rascal::grammar::ParserGenerator;
Parsing: 101
Unbinarizing, post-parse filtering, and mapping to UPTR: 20
Parsing: 5
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc|
Parsing: 2454
rascal>import lang::rascal::grammar::ParserGenerator;
Parsing: 7
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Parsing: 7
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc|
Parsing: 2217
Unbinarizing, post-parse filtering, and mapping to UPTR: 592
ok
rascal>import lang::rascal::grammar::ParserGenerator;
Parsing: 6
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Parsing: 5
Unbinarizing, post-parse filtering, and mapping to UPTR: 1
Loading module |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc|
Parsing: 2169
Unbinarizing, post-parse filtering, and mapping to UPTR: 578
ok

Indeed, the parser does not run faster the second time or the third time. And it is also slower the first time. So we may attribute the difference between 100ms and 600ms entirely to the JIT compiler.

arnoldlankamp commented 1 year ago

If you want to measure the execution time of some piece of code you can use ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime(). This cuts out most of the JVM related noise (i.e.: GC & JIT). It may be interesting to see what the difference is between CPU time and real time.
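
A minimal sketch of such a measurement, comparing per-thread CPU time with wall-clock time; the workload in the middle is a placeholder:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    class CpuVsWallTime {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            long cpuStart = threads.getCurrentThreadCpuTime(); // nanoseconds of CPU time for this thread
            long wallStart = System.nanoTime();

            // ... run the code under measurement here, e.g. the parse call ...

            long cpuMillis = (threads.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;
            long wallMillis = (System.nanoTime() - wallStart) / 1_000_000;
            // A large gap between wall time and CPU time points at GC/JIT or other JVM overhead.
            System.err.println("cpu: " + cpuMillis + "ms, wall: " + wallMillis + "ms");
        }
    }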

The parser implementation can be GC heavy for highly ambiguous grammars, since in those cases it produces a large graph in which many branches will remain 'alive' for a long time. So lots of marking/tracing/moving, but very little memory being freed up each cycle. But I doubt this is the main issue in this specific case, since there shouldn't be a large number of long-lived stacks running in parallel while parsing a file using the Rascal grammar.

If it turns out a large portion of the time is spent by the JVM, you could try and do a run with the old CMS GC instead of G1. While G1 is faster for like 98% of the normal use cases, it can sometimes perform absolutely terribly on big and/or cyclic graph-like data structures (like the parser produces). Since G1 segments the heap (to be able to deal with larger heap sizes efficiently), it needs to keep track of cross-segment pointers. Graphs are generally harder to segment, and it can run into trouble with these.

arnoldlankamp commented 1 year ago

As for improving the performance of the parser algorithm/implementation; that would be a challenge, as it's already very highly optimized across all available dimensions.

The only semi-significant performance improvement that comes to mind is to implement something like an 'algorithm stack'. Classify each part of the grammar (as possibly ambiguous, certainly non-ambiguous, regular, or unique match) and parse each segment using its most efficient implementation. However, this will have a significant cost in terms of implementation complexity and will add a non-trivial analysis burden to the parser generator. Figuring out how to efficiently do that last part alone could probably net you a doctoral degree πŸ˜„ .

Other than that πŸ€·β€β™‚οΈ . Migrate the implementation to C or Rust πŸ˜› ?

arnoldlankamp commented 1 year ago

But in terms of this specific issue. Why not pre-parse the standard library for Rascal, save the ASTs and ship them with the distribution? Just use the name + hash of the rsc file when saving the ASTs, so the cached version won't be used in case you are modifying the Rascal source.

The process would, for example, look like:

  • Requests parse of ParserGenerator.rsc
  • Calculate hash for ParserGenerator.rsc (e.g.: d5gq4w4g4r7jujsg43y6)
  • Look for ParserGenerator-d5gq4w4g4r7jujsg43y6-cached.ast

    • Found -> Use cached AST
    • Not found -> Parse ParserGenerator.rsc
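
A small sketch of what that hash-based lookup could look like; the cache directory layout and file naming are assumptions taken from the example above:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    class CachedAstLookup {
        // Compute the cache file name for a source module: <name>-<hash>-cached.ast.
        static Path cachedAstFor(Path rscFile, Path cacheDir) throws IOException, NoSuchAlgorithmException {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(Files.readAllBytes(rscFile));
            StringBuilder hash = new StringBuilder();
            for (byte b : digest) {
                hash.append(String.format("%02x", b));
            }
            String base = rscFile.getFileName().toString().replace(".rsc", "");
            return cacheDir.resolve(base + "-" + hash + "-cached.ast");
        }
    }

If the file exists, the stored tree is read back; if not, the module is parsed and the result written under that name, so edits to the Rascal source automatically invalidate the cache.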

DavyLandman commented 1 year ago

I did some profiling of the parser (just a loop that went through all rascal modules).

(profiler screenshot)

Almost no GC.

Parsing phases split: (screenshot)

Here is reduce: (screenshot)

Here is expand: (screenshot)

And the node flattener is one heck of a recursive party: (screenshot)

The hotspots are these: (screenshot)

So this roughly aligns with the conclusion: to make the parser faster we'll have to tweak the algorithm. Perhaps we could take a look at the EdgeSet, the IntegerMap, and the IntegerKeyedDoubleValueHashMap.

arnoldlankamp commented 1 year ago

When looking at the traces, nothing immediately stands out to me. It looks as I would expect; a pretty even split between reduce and expand, with most of the work being done in updateNextNode.

buildResult also looks pretty decent, since it doesn't seem to have to deal with ambiguities.

The recursion in the flattener is not that surprising, since it needs to traverse all the way down the binarized parse forest to be able to flatten it. The flattener actually contains a conditional tail recursion optimization to keep the stack depth in check when handling lists (otherwise you'd run out of stack size fast). Doing it entirely without recursion would be even more complicated than it already is, because of all the code needed for keeping track of ambiguities and cycles in the parse forest. You're basically dealing with forking and recombining stacks; without using recursive functions you'd need to keep track of even more state. What would be possible is to remove the layer of abstraction from the flattener by removing all the factories and just hardwiring the concrete implementation. There should be an older version of the flattener like this in the version control history (likely from before the Git migration), which might still be fully functional as is, unless any changes were made in the flattener code at a later date. You could shave ~10% or more off the buildResult time, at the cost of losing the ability to easily swap to a different AST implementation or to support multiple different AST implementations simultaneously. However, looking at the above trace, this would only net you a 1-2% decrease in overall execution time, which may not be worth it.

As for the custom data structures: these are basically specialized implementations aimed at reducing memory usage and improving performance. Maybe you could play around with the initial capacity a bit to see if that has any noticeable impact. As for the implementations themselves; they were written in the JDK 1.6 era with the JIT's capabilities at that time in mind. Maybe some things could be written slightly differently today, but that would require some looking into, and the gains likely won't be huge. These data structures are heavily used, which is why a significant amount of time is spent in their methods in the trace.

As for the GC, looking at that graph, it doesn't seem to be an issue. While G1 can offload work to worker threads when worst-case behavior pops up (which may not be properly reflected in the GC stats), it doesn't seem to be what's happening here.

Can we maybe narrow down this issue by checking whether there are specific pieces of syntax or certain grammars it struggles with more than others?

Also, maybe looking at the parser's trace logging can give us some clue as to what it's doing specifically. Maybe we can spot something out of the ordinary in there. So we can have a more targeted look at the implementation. It should print expands, reduces and propagations (propagations are a right nullable fix thing) for specific input locations/ranges, which should be relatively easily relatable to locations in a source file. This could also potentially point us to improvements that could be made elsewhere as well, if such a possibility exists. As an example: it could indicate the parser is dealing with local ambiguities in a grammar which are resolved at a fairly late point in the parse, causing it to drag along certain stacks for longer than may be necessary. This may be resolved by modifying a rule in a grammar.

jurgenvinju commented 1 year ago

But in terms of this specific issue. Why not pre-parse the standard library for Rascal, save the ASTs and ship them with the distribution? Just use the name + hash of the rsc file when saving the ASTs, so the cached version won't be used in case you are modifying the Rascal source.

The process would, for example, look like:

  • Requests parse of ParserGenerator.rsc
  • Calculate hash for ParserGenerator.rsc (e.g.: d5gq4w4g4r7jujsg43y6)
  • Look for ParserGenerator-d5gq4w4g4r7jujsg43y6-cached.ast

    • Found -> Use cached AST
    • Not found -> Parse ParserGenerator.rsc

According to my analysis that would only help a bit. The parser generator code is loaded for the grammar reification operator anyway.

jurgenvinju commented 1 year ago

It may be the case that 50% of the time in parsing string templates is spent in temporary, soon-to-be-dead stacks. I'll have a look in the grammar. The big module has one very long string template. Nevertheless, it remains the module that trains the JIT, so it will always be the slower module with the current setup.

DavyLandman commented 1 year ago

As for the custom data structures: these are basically specialized implementations aimed at reducing memory usage and improving performance. Maybe you could play around with the initial capacity a bit to see if that has any noticeable impact. As for the implementations themselves; they were written in the JDK 1.6 era with the JIT's capabilities at that time in mind. Maybe some things could be written slightly differently today, but that would require some looking into, and the gains likely won't be huge. These data structures are heavily used, which is why a significant amount of time is spent in their methods in the trace.

I think it all looks good to me. I was wondering (since the profile shows the .get on the IntegerMap and IntegerKeyedDoubleValueHashMap) about your trade-offs during the design of the int-keyed maps. They appear to use a linked list for the hash-collision buckets (aka separate chaining); if we have lots of hash collisions, could that explain the relative amount of time spent in .get? Whereas if we used open addressing we might have more locality while probing past a collision. I've looked at a few other int-keyed hashmaps, and they all seem to use a separate array for the int keys, do some form of open addressing on that, and then use the resulting index to dereference the separate value array. Do you remember your experiments at the time? Or would you like to spend some time right now on this? ;)

arnoldlankamp commented 1 year ago

Sure, you could have a look at an open addressing implementation for the hashmaps. It may well work well in this case, since the keys are generally integers that are associated with an input location, so they should distribute pretty uniformly if you use a power of two for the table size and hash them by simply taking the remainder of division by the table's size.
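
A minimal sketch of an int-keyed map along those lines (open addressing with linear probing and a power-of-two table); this is an illustration, not the actual IntegerMap or IntegerKeyedDoubleValueHashMap code:

    final class IntKeyedMap<V> {
        private int[] keys;
        private Object[] values; // values[i] == null marks an empty slot
        private int size;

        IntKeyedMap() {
            keys = new int[16];
            values = new Object[16];
        }

        @SuppressWarnings("unchecked")
        V get(int key) {
            int mask = keys.length - 1;
            int i = key & mask; // power-of-two table: the index is just the low bits of the key
            // Linear probing: scan forward until we hit the key or an empty slot.
            while (values[i] != null) {
                if (keys[i] == key) {
                    return (V) values[i];
                }
                i = (i + 1) & mask;
            }
            return null;
        }

        void put(int key, V value) {
            if ((size + 1) * 2 > keys.length) {
                resize(); // keep the load factor at or below 0.5 to limit clustering
            }
            int mask = keys.length - 1;
            int i = key & mask;
            while (values[i] != null && keys[i] != key) {
                i = (i + 1) & mask;
            }
            if (values[i] == null) {
                size++;
            }
            keys[i] = key;
            values[i] = value;
        }

        @SuppressWarnings("unchecked")
        private void resize() {
            int[] oldKeys = keys;
            Object[] oldValues = values;
            keys = new int[oldKeys.length * 2];
            values = new Object[oldValues.length * 2];
            size = 0;
            for (int j = 0; j < oldKeys.length; j++) {
                if (oldValues[j] != null) {
                    put(oldKeys[j], (V) oldValues[j]);
                }
            }
        }
    }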

arnoldlankamp commented 1 year ago

Related to usethesource/rascal#1813.

It may be interesting to find out why the enlarge method of the EdgeSet has such a large impact on the total run time. For some reason the add function seems to be called a lot. Calling add on the EdgeSet is an indication of a parse stack merge. And stacks can only merge if they first split, so why is this happening so often that it takes up a significant amount of the execution time πŸ€” ?

jurgenvinju commented 1 year ago

But in terms of this specific issue. Why not pre-parse the standard library for Rascal, save the ASTs and ship them with the distribution? Just use the name + hash of the rsc file when saving the ASTs, so the cached version won't be used in case you are modifying the Rascal source.

The process would, for example, look like:

  • Requests parse of ParserGenerator.rsc
  • Calculate hash for ParserGenerator.rsc (e.g.: d5gq4w4g4r7jujsg43y6)
  • Look for ParserGenerator-d5gq4w4g4r7jujsg43y6-cached.ast

    • Found -> Use cached AST
    • Not found -> Parse ParserGenerator.rsc

This is also worth investigating.

jurgenvinju commented 1 year ago

I have to figure out how long it takes to read in the parse trees from disk. Generally, since they are about four times as big as the input sentence, you get little out of it. But at least the process is deterministic.

jurgenvinju commented 1 year ago

Related to usethesource/rascal#1813.

It may be interesting to find out why the enlarge method of the EdgeSet has such a large impact on the total run time. For some reason the add function seems to be called a lot. Calling add on the EdgeSet is an indication of a parse stack merge. And stacks can only merge if they first split, so why is this happening so often that it takes up a significant amount of the execution time πŸ€” ?

My guess is that the string template grammar could use a little more lookahead; perhaps it is constantly accepting empty lists or reducing early only to find out the " hasn't been reached yet. Early list reduction would lead to such merges I guess.

jurgenvinju commented 1 year ago

Related to usethesource/rascal#1813.

It may be interesting to find out why the enlarge method of the EdgeSet has such a large impact on the total run time. For some reason the add function seems to be called a lot. Calling add on the EdgeSet is an indication of a parse stack merge. And stacks can only merge if they first split, so why is this happening so often that it takes up a significant amount of the execution time πŸ€” ?

Another cause could be a cyclic grammar. If that happens in whitespace we'd never see it when interpreting the tree, but we'd pay for it nevertheless.

arnoldlankamp commented 1 year ago

I have to figure out how long it takes to read in the parse trees from disk. Generally, since they are about four times as big as the input sentence, you get little out of it. But at least the process is deterministic.

Yes, the parse trees are quite a bit bigger, but reading their binary format from disk is a process that should scale fairly linearly with file size and should be quicker than parsing the source file, since parsing is a lot more compute heavy.

How much the difference will be I can only guess, but writing a benchmark for reading in a parse tree from disk should be a relatively easy thing to do (maybe you can even time it in the REPL or something?). If the difference is large enough, implementing a caching solution like this could resolve the issue without too much time investment.

arnoldlankamp commented 1 year ago

Related to usethesource/rascal#1813. It may be interesting to find out why the enlarge method of the EdgeSet has such a large impact on the total run time. For some reason the add function seems to be called a lot. Calling add on the EdgeSet is an indication of a parse stack merge. And stacks can only merge if they first split, so why is this happening so often that it takes up a significant amount of the execution time πŸ€” ?

My guess is that the string template grammar could use a little more lookahead; perhaps it is constantly accepting empty lists or reducing early only to find out the " hasn't been reached yet. Early list reduction would lead to such merges I guess.

I'm not familiar with what the string template grammar looks like, but to give some background: calling add on the EdgeSet happens in the following two cases. Consider the grammar: A := BC

Passing a debug listener to the parser might give us some information about what's going on. It will probably show more move or expand calls than expected for possible culprits.

arnoldlankamp commented 1 year ago

Having said that, it may also well be that it turns out nothing out of the ordinary is happening when we look at the logging, but if large EdgeSets are being created, as the improvement the Arrays.copyOf change accomplished suggests, something unexpected may be happening and it at least warrants a look πŸ€” .

DavyLandman commented 1 year ago

Sure, you could have a look at on open addressing implementation for the hashmaps. It may well work well in this case, since the keys are generally integers that are associated with an input location, so they should distribute pretty uniformly if you use a power of two for the table size and hash them by simply dividing by the table's size.

I just ran some stats on all get of the IntegerKeyedDoubleValueHashMap. Here is how often we had to walk the chain (so a measure of how deep the hash collision nodes were).

  count   chain depth
 834900   0
 208293   1
 109675   2
  51356   3

So in 369324 of the total 1204224 gets we ended up in a node with 1 or more hash collisions. So about 30% of the gets most likely incur multiple cache misses.

I'm glad to see they aren't that deep, but it might still pay off to see if some open addressing could make those accesses a bit more local.
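
For reference, instrumentation along these lines could produce such a histogram; this is a hypothetical sketch, not the actual map code:

    import java.util.Map;
    import java.util.TreeMap;

    class ChainDepthStats {
        // Histogram of how many collision-chain nodes each get() had to walk past.
        private static final Map<Integer, Long> HISTOGRAM = new TreeMap<>();

        // Called from inside get() with the number of chain links traversed.
        static void record(int depth) {
            HISTOGRAM.merge(depth, 1L, Long::sum);
        }

        static void print() {
            HISTOGRAM.forEach((depth, count) ->
                System.err.printf("%8d gets walked %d collision node(s)%n", count, depth));
        }
    }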

jurgenvinju commented 1 year ago

If I run ParserGenerator through the normal parse path (the generated parsers in ParseTree), then I get consistently fast behavior:

rascal>() { parse(#start[Module], |file:///Users/jurgenv/git/rascal/src/org/rascalmpl/library/lang/rascal/grammar/ParserGenerator.rsc|); }()
Parsing: 0
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
Parsing: 0
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
Parsing: 126
Unbinarizing, post-parse filtering, and mapping to UPTR: 18
ok

This is explainable, since the parser superclass has already been optimized by the JIT: it had to parse ParserGenerator earlier already in order to generate the current parser.

jurgenvinju commented 1 year ago

Reading the parse tree from disk:

rascal>cpuTimeOf(() { readBinaryValueFile(#start[Module], |home:///test.bin|); }) 
Parsing: 0
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
Parsing: 0
Unbinarizing, post-parse filtering, and mapping to UPTR: 0
int: 78985000

So that's 80ms, compared to 126ms (which is also often lower). We will not win the battle by caching the parse trees on disk, even though it is about 40% faster. It remains something to consider, but for now it seems that making sure the parser generator is not loaded at all is still the most viable option for solving this issue.

DavyLandman commented 1 year ago

So in 369324 of the total 1204224 gets we ended up in a node with 1 or more hash collisions. So about 30% of the gets most likely incur multiple cache misses.

I'm glad to see they aren't that deep, but it might still pay off to see if some open addressing could make those accesses a bit more local.

I've butchered the code to use linear probing, and also tried different existing IntObjectMaps; they all reach the same performance numbers. So I think only a new dedicated data structure that tries its best to help the JIT with range checks etc. could maybe squeeze out a bit more here. But the current code, even with the 30% of hash collisions, is fast and will take some effort to improve upon.

jurgenvinju commented 1 year ago

This would not directly expose to the Rascal level what kind of encoding we use for the parser, and it could be used to avoid the use of the # operator completely at run-time. The code itself would need to switch between the two modes somehow.

arnoldlankamp commented 1 year ago

So in 369324 of the total 1204224 gets we ended up in a node with 1 or more hash collisions. So about 30% of the gets most likely incur multiple cache misses. I'm glad to see they aren't that deep, but it might still pay off to see if some open addressing could make those accesses a bit more local.

I've butchered the code to use linear probing, and also tried different existing IntObjectMaps; they all reach the same performance numbers. So I think only a new dedicated data structure that tries its best to help the JIT with range checks etc. could maybe squeeze out a bit more here. But the current code, even with the 30% of hash collisions, is fast and will take some effort to improve upon.

Well, it was worth a try. In theory open addressing could have been a good fit for this specific use-case.

In the past I've also found that open addressing may not always perform better than a chaining implementation. The reasons I found back then were the following:

DavyLandman commented 1 year ago

Yup, agreed, that was roughly the path I took: decrease the load factor to have less clustering.

jurgenvinju commented 1 year ago

https://github.com/usethesource/rascal/pull/1815 adds the fundamental capability of saving and loading "parser functions" on disk.

DavyLandman commented 1 year ago

Nice!

Do you have an idea of how we could trigger these caches (the ones in the evaluator for concrete syntax)? Or is that still WIP?

jurgenvinju commented 1 year ago

WIP.

DavyLandman commented 1 year ago

if it's automatic, that would be nice, then it's just an aspect of packaging :)

arnoldlankamp commented 1 year ago
  • and for storing I was thinking a small mvn plugin that would store those files using a secret interpreter mode.

You may be able to use the Exec Maven Plugin for this. This plugin allows you to execute a Java program in a specific build phase, like generate-sources (if you'd like to have the cache available during development) or package (if you don't want the build-time overhead during development, but only ship the generated cache in the distribution).

jurgenvinju commented 1 year ago

https://github.com/usethesource/rascal/pull/1815 is ready for review if anyone is interested. It should go a long way in preventing the bootstrapping of a parser generator, if applied to both the parsing of concrete syntax fragments and the parsing of DSL files.

In a DSL parser, the parser contributions would not use p = parser(#start[Program]) but rather p = loadParser(|lib:///myProject/myCachedParserLocation|) and the call to p would use a manually constructed reified type to avoid loading the type reifier (which is part of the parser generator): p(type(\start(sort("Program")), ()), input, src)

In the concrete syntax fragment parser we'd do a similar trick. That's for another PR.

The docs of ParseTree.rsc contain an example usage of storeParsers and loadParsers. The lang::rascal::tests::concrete::Parsing module contains a single test.

jurgenvinju commented 1 year ago

if it's automatic, that would be nice, then it's just an aspect of packaging :)

The overhead of packaging all the class files from the in-memory file system into one binary file is significant. So I can't make it automatic inside the interpreter. It could be a default side-effect of the type checker though, to write a parser file next to each .tpl file.