usethesource / rascal-language-servers

An LSP server for Rascal which includes an easy-to-use LSP generator for languages implemented in Rascal, and an interactive terminal REPL.
BSD 2-Clause "Simplified" License

Allow DSL deployments to load a parser from the precompiled file without waiting for evaluators to load #297

Closed DavyLandman closed 11 months ago

DavyLandman commented 11 months ago

Note, this is only for VS Code DSL deployments, not for normal interactive development.

What we've observed is that users interpret missing syntax highlighting as an indication that "it" is not working. When a VS Code extension that uses Rascal starts, it can take 5 to 10 seconds (depending on the machine, even longer) before all the modules are loaded and the parser function is handed to the LSP server. Many improvements have already been made (see the excellent discussion in #267), but we wanted the opportunity to really close the gap.

This change builds upon the work of @jurgenvinju to offer an opaque file that contains precompiled parsers. It's only available when using the typescript (aka npm/deployment) path, but in that case it will load a parser without any need for an evaluator.
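The idea of the fast path can be sketched in TypeScript. This is a hypothetical mock, not the actual rascal-language-servers API: all names (`LanguageRegistration`, `obtainParser`, `loadPrecompiledParser`, `parsers.bin`) are illustrative. It only shows the design choice: if a precompiled parser file ships with the deployment, load the parser from it directly; otherwise fall back to the evaluator-based path.

```typescript
// Hypothetical sketch of the fast/slow path split; names are illustrative,
// not the real API of this repository.

type Parser = (input: string) => string;

interface LanguageRegistration {
  precompiledParserFile?: string;          // opaque file with precompiled parsers
  loadViaEvaluator: () => Promise<Parser>; // slow path: import modules, call main
}

async function loadPrecompiledParser(file: string): Promise<Parser> {
  // Placeholder: a real implementation would deserialize the opaque
  // precompiled-parsers file; here we just return a stub parser.
  return (input: string) => `tree(${input}) via ${file}`;
}

async function obtainParser(reg: LanguageRegistration): Promise<Parser> {
  if (reg.precompiledParserFile !== undefined) {
    // Fast path: no evaluator needed, so highlighting comes up quickly.
    return loadPrecompiledParser(reg.precompiledParserFile);
  }
  // Slow path: seconds of module loading before a parser exists.
  return reg.loadViaEvaluator();
}

obtainParser({
  precompiledParserFile: "parsers.bin",
  loadViaEvaluator: async () => (s: string) => s,
}).then((p) => console.log(p("module A")));
```

The point of the sketch is that the slow path remains available as a fallback; the precompiled file only short-circuits the startup latency for deployments that ship one.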

jurgenvinju commented 11 months ago

Looks good as code. I'm sceptical wrt the design and the maintainability for the future.

Please explain why we actually need this additional support. Not the motivation from above, but the reason there are still seconds to load the parser. I thought I had covered all the tracks. Is a parser generator still loading? Or are we avoiding the loading of any module of the language implementation now?

I'm hesitant about this design, because if this additional API is avoidable, that would be better: (a) it is a lot of code, (b) it adds an extra externally visible API, (c) it more than doubles the number of fields in the current contract for Language, and (d) in the compiled world it probably makes no sense anymore.

It also breaks the consistency of the parametrized-lsp design, which normally goes through Rascal callbacks (IFunction instances). Instead, this ties the parametrized-lsp bridge completely into the Java implementation of the brand new function loadParsers in Prelude.java, while my intention was that DSL designers would write a Rascal function that calls loadParsers themselves.

So I need some convincing; are there no alternatives for optimizing this problem that keep the design as-is? What is currently taking the time before the loaded parser is presented to the LSP link?

DavyLandman commented 11 months ago

Thanks for reviewing.

Currently, the first parse starts the extension, which does:

  1. registerLanguage
  2. start a new evaluator
  3. import the root module defined by registerLanguage
  4. call the main function
  5. which returns an IConstructor that contains the parser function
  6. call the parser function
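
The six steps above can be sketched as a minimal mock. Everything here is hypothetical (the real extension goes through the Rascal evaluator, and the delays are made up); the sketch only makes visible that every step sits on the critical path before the first parse can run:

```typescript
// Hypothetical mock of the DSL extension startup sequence; not the real API.
// The sleeps stand in for the expensive work each step does.

type Parser = (input: string) => string;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function startEvaluator(): Promise<void> {
  await sleep(20); // stands in for JVM warm-up and evaluator creation
}

async function importRootModule(): Promise<void> {
  await sleep(20); // stands in for parsing and importing all Rascal modules
}

async function callMain(): Promise<{ parser: Parser }> {
  // the main function returns a constructor that carries the parser function
  return { parser: (s: string) => `tree(${s})` };
}

async function firstParse(input: string): Promise<string> {
  await startEvaluator();        // step 2
  await importRootModule();      // step 3
  const lang = await callMain(); // steps 4 and 5
  return lang.parser(input);     // step 6
}

firstParse("module A").then((t) => console.log(t));
```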

Now, this already takes a lot less time with the precompiled module parsers and the precompiled stored parser. But it can still (especially on older/slower laptops) take up to 10s--20s. Quite some of the time (as also seen in the profiles in #267) is spent parsing Rascal modules and in general import machinery, and I have not been able to identify further opportunities for performance improvements. So indeed, the final step I propose to close the gap is to skip all this module loading.

I agree it does grow the interface on the typescript side, and I don't really like that. We could mark it as deprecated/internal/experimental? The benefit is that the npm dependencies don't follow VS Code upgrades, so if we deprecate it at a later point, people's extensions won't be broken, since they'll have to explicitly update.

An alternative that would get us a bit closer would be to use the multiple-registry feature to load a specific registry that only has the syntax module imported. This would reduce the import times, but it would increase memory pressure, as we would keep yet another evaluator alive just for that syntax module.

DavyLandman commented 11 months ago

Here are some fresh profiles. I didn't catch it from the start, but it roughly took 15s--18s to load.

[profile screenshot]

Looking at the hot spots:

  1. most of the time (60%) is spent in SGTDBF::parse
  2. quite some time is spent in io.usethesource.vallang.impl.fields.ConstructorWithKeywordParametersFacade.accept(), which is called by the AST builder a lot
  3. also 40% of the time is in org.rascalmpl.values.RascalFunctionValueFactory.createHole(), which, surprise surprise, does start an evaluator to get the parser generator (I had not spotted this last time)

So maybe we could try to remove the parser generator dependency for createHole?

[profile screenshot]

That would shave off at least a few seconds. But in the end, parsing & AST building for a large set of files (our checker is not small) will take time as long as we are not compiled.

jurgenvinju commented 11 months ago

Thanks for the analysis. 10 or 20 seconds to load a few modules does not make sense to me. Something is very weird in this entire profile.

In theory, loading 10 modules might take around 3 seconds worst case, but not 10 or 20 seconds. So I think we have to dig a little deeper here. If we really can't find the issue, then we have to come up with a temporary solution like the one you proposed. Note that the latest solution renders the concrete syntax parser cache dead, though perhaps it is still handy for other use cases.

I'm going to do some code staring based on the profiles you shared.

jurgenvinju commented 11 months ago
DavyLandman commented 11 months ago

Looking at the profile, most of the SGTDBF::parse calls take between 100--500ms; I even found two that took 900ms and 1200ms.

I've shared a trace file with you so you can see what's happening.

DavyLandman commented 11 months ago

I've tested this and it works as intended: the grammar loads fast, and the user sees syntax highlighting well before the evaluator is loaded.

DavyLandman commented 11 months ago

Yes, I've tested it, this alternative approach works. I really wanted to avoid starting an evaluator. But I agree, just constructing it and not doing anything with it should be an acceptable tradeoff.

sonarcloud[bot] commented 11 months ago

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)
Coverage: 0.0%
Duplication: 0.0%