Qute: use better parser

FroMage commented 2 years ago

Description

We've seen several issues in Qute related to parsing, and it would probably make everyone's life easier if we could use a real parser that could produce a real AST for the expressions.

https://github.com/quarkusio/quarkus/pull/22181 revealed that there's another hand-written parser in https://github.com/redhat-developer/quarkus-ls by @angelozerr but I'm always skeptical of hand-written parsers when ANTLR does such a fine job and make maintenance and improvements easier.

I'd be willing to help with prototyping, evaluation and experience. I'm sure @Sanne and @gavinking also have lots of valuable things to say about parsers.

Implementation ideas

No response

quarkus-bot[bot] commented 2 years ago

/cc @mkouba

angelozerr commented 2 years ago

For this parser, please keep in mind that it should be really nice that this parser can be used in an IDE/Editor context too, that's to say:

it should support fault tolerance (broken Qute template).
it should store the minimal information (ex : store offset instead of line/character) to avoid memory problem.
it should stop the parse as soon as possible (when user type content in the editor, the last parse of the template should be canceled). Ideally incremental parser should be better but it's too hard to implement it.

Sanne commented 2 years ago

Good idea! But my experience with parsers is very basic; @gavinking and @sebersole are the better experts - although they're very busy with Hibernate for at least some more weeks :) Maybe after ORM 6 is released they migth be interested.

MartinX3 commented 2 years ago

What about the parser https://github.com/korlibs/korte ? We use it replace variables in a string (optionally in a file) and we can extend the parser with our own features.

sebersole commented 2 years ago

I'm a bit behind the curve there as I have no real knowledge of Qute. But I do like designing parsers 😁

Realistically I won't be able to help with this until after ORM 6.0 however

FroMage commented 2 years ago

I definitely don't want @sebersole or @gavinking to help prototype this. Just thought you might have something interesting to say on the "hand writter parser" vs "antlr" question.

sebersole commented 2 years ago

I definitely don't want @sebersole or @gavinking to help prototype this. Just thought you might have something interesting to say on the "hand writter parser" vs "antlr" question.

Wow, like your work too :wink:

I'd always prefer Antlr over hand-parsing

FroMage commented 2 years ago

Wow, like your work too wink

HAHAHA, I just now realised this could be interpreted as me not wanting your work OOOPS. I meant I didn't intend to bother you by requesting you to work on this :)

sebersole commented 2 years ago

Wow, like your work too wink

HAHAHA, I just now realised this could be interpreted as me not wanting your work OOOPS. I meant I didn't intend to bother you by requesting you to work on this :)

I know, just playing :D

sebersole commented 2 years ago

It'll come down to timing for me, unless someone gets to it before ORM 6.0 is final.

mkouba commented 2 years ago

I'm not against an AST-based parser powered by ANTLR but keep in mind that it should be reasonably small and performant because we'll use it at runtime.

maxandersen commented 2 years ago

What about the parser korlibs/korte ?

it seem reliant on kotlin stdlibs which would be a blocker imo ...and its a template engine rather than a parser,right?

maxandersen commented 2 years ago

have antlr improved so we can make it be fault tolerant to avoid having to maintain two parsers (one for runtime and one for tooling) ?

angelozerr commented 2 years ago

have antlr improved so we can make it be fault tolerant to avoid having to maintain two parsers (one for runtime and one for tooling) ?

Having one parser for runtime and tooling should be really nice:

for maintain
tooling could have an embedded parser, but it could too use the Qute JAR from the user project. In otherwords, user could have the same syntax error than the Qute version used in her project without installing last version of tooling.

In tooling context, fault tolerant is required, but memory is very important and stop parsing too to provide the best performance when user type some content in the Qute editor (each character that user is typing recrate the Qute AST template).

sebersole commented 2 years ago

I'm not against an AST-based parser powered by ANTLR but keep in mind that it should be reasonably small and performant because we'll use it at runtime.

In my experience, Antlr generated parsers are generally faster than hand-written ones - unless you define the grammar poorly ofc

sebersole commented 2 years ago

have antlr improved so we can make it be fault tolerant to avoid having to maintain two parsers (one for runtime and one for tooling) ?

:shrug:

Not idea what specific limitation you are thinking of to be able to answer that personally

maxandersen commented 2 years ago

Not idea what specific limitation you are thinking of to be able to answer that personally

In tooling you will want the parser to continue past syntax or semantic errors to make content assist and semantic navigation still viable in rest of the parsed document even if you have typo or temporary edit. That is in general what a fault tolerant parser enables.

mkouba commented 2 years ago

I'm not against an AST-based parser powered by ANTLR but keep in mind that it should be reasonably small and performant because we'll use it at runtime.

In my experience, Antlr generated parsers are generally faster than hand-written ones - unless you define the grammar poorly ofc

Cool. Let me know if you need some help to understand the poor syntax of qute ;-).

FroMage commented 2 years ago

Cool. Let me know if you need some help to understand the poor syntax of qute ;-).

Wait a minute, I didn't mean to force some poor innocent to do this for us. I'm ready to help get this started because I have some antlr knowledge, but I won't have time to do it alone. I should be able to make a PoC and perhaps if @mkouba has time we can work on it together to build it up for Qute. Then if the IDE people want to improve error recovery (totally doable in Antlr) they can do it and we'll already have all the tests set up to validate it didn't break anything for us.

datho7561 commented 1 year ago

@JessicaJHee and I are planning on taking a look into this for the next sprint as a part of the Qute language server included in vscode-quarkus.

angelozerr commented 1 year ago

@datho7561 please keep in mind we have a Qute parser which is very performant and uses few memory in editor context, and we support fault tolerant.

For instance we stop the building of ast as soon as user type a charactere, we parse parameters, method parameters only if we need it, etc

My fear is that we will spend à lot of time to try to provide a parser with antlr and spend a lot of time with error recovery to support fault tolerant and in final not having the same performance and memory improvement than our current parser.

But perhaps Im wrong.

datho7561 commented 1 year ago

My fear is that we will spend à lot of time to try to provide a parser with antlr and spend a lot of time with error recovery to support fault tolerant and in final not having the same performance and memory improvement than our current parser.

I think that we should be able to get a performant and fault tolerant parser with ANTLR. I agree that it will likely take a lot of time. I think it's important to make a new parser, since there are many cases that the current parser doesn't handle, most notably, nested expressions.

maxandersen commented 1 year ago

@datho7561 just making sure it's clear what @angelozerr is saying here. He is saying we already have a second parser which is used in the IDE support because the current one in qute was never fault tolerant (ie. Be able to recover from error to give info about the whole document rather than just until first error).

Thus if there is a need for something like antlr based parser to at least consider if such features could be included too as otherwise it is true - we would be evolving two and maintaining three parsers.

mkouba commented 1 year ago

@datho7561 just making sure it's clear what @angelozerr is saying here. He is saying we already have a second parser which is used in the IDE support because the current one in qute was never fault tolerant (ie. Be able to recover from error to give info about the whole document rather than just until first error).

Yes, the runtime parser does not need to be fault tolerant. It aims to be memory efficient and fail fast.

Thus if there is a need for something like antlr based parser to at least consider if such features could be included too as otherwise it is true - we would be evolving two and maintaining three parsers.

The idea is to replace these parsers (runtime and vscode) with a third parser that could be used everywhere. Of course, we need to make sure that it meets the requirements for both use cases. And IMO we can't really know without a solid prototype.

angelozerr commented 1 year ago

Yes, the runtime parser does not need to be fault tolerant. It aims to be memory efficient and fail fast.

I'm not sure that our Qute parser uses more memory than your qute parser since the parse is done on each key stroke and we take care of memory (ex : don't use String#substring which creates several instances of String) . Today we have fault tolerant support which doesn't stop the parse, but I think it is not a big deal to throw an error (by using a tolerant flag) if teh expected token is not good.

mkouba commented 1 year ago

Yes, the runtime parser does not need to be fault tolerant. It aims to be memory efficient and fail fast.

I'm not sure that our Qute parser uses more memory than your qute parser since the parse is done on each key stroke and we take care of memory (ex : don't use String#substring which creates several instances of String) . Today we have fault tolerant support which doesn't stop the parse, but I think it is not a big deal to throw an error (by using a tolerant flag) if teh expected token is not good.

Yeah, sure. I don't have any numbers ;-). My point was that the existing parsers have different goals and requirements, i.e. for the runtime parser the fault tolerance was never a goal. We just need to parse a template quickly and build the template nodes effectively.

So if we manage to have an efficient parser that can be used for both use cases then +100. I don't care whether it's ANTLR based or not.

angelozerr commented 1 year ago

So if we manage to have an efficient parser that can be used for both use cases then +100. I don't care whether it's ANTLR based or not.

It will require some refactoring, but I think we could use our fault tolerant parser for qute core. But let's see the POC of @datho7561 with ANTLR

quarkusio / quarkus

Qute: use better parser #22235

Description

Implementation ideas