pest-parser / pest3

Add debugging tools - validation and interactive debugger #17

Open Tartasprint opened 5 days ago

Tartasprint commented 5 days ago

Hi! I am so happy to see that you (all of you) were able to put together this first version of the pest3 project. I gave it a try, and it is amazing.

I noticed that two things I liked in pest are not there yet: grammar validation and the interactive debugger.

The validators are listed in the todo file. I would like to work on implementing static validators that ensure a parser runs in finite time.

Is that something I can do at this stage of the project?

For the debugger, it would be nice to have, but I think it could take much more work. I can still give it a shot if it's wanted.

Tartasprint commented 5 days ago

If I am reading the code organisation right, validators could be implemented in the same place as in pest, under the meta crate.

I believe they should be optional (maybe on by default). A validator that guarantees the parser runs in finite time will have to reject some grammars that are actually fine, so grammar writers who are confident should be able to switch it off and take care of validating manually.

It would be very cool to integrate a keyword for this (like unsafe in Rust), but that might add unnecessary complexity to the grammar of pest. It could also be done with specially formatted comments (like JSDoc/TypeScript), which would avoid polluting the grammar.
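
To make the idea concrete, here is a minimal sketch of a static "finite time" validator. It assumes a toy grammar representation: the `Expr` enum, the `Grammar` alias and the `validate` function are hypothetical names for illustration, not pest3's actual meta-crate API. It flags two classic sources of non-termination, left recursion and repetition of an expression that may match the empty string. The check is conservative, so it can reject grammars that would terminate in practice, which is why it would need to be optional.

```rust
use std::collections::{HashMap, HashSet};

/// Toy grammar expression (hypothetical, not pest3's real AST).
#[derive(Debug, Clone)]
pub enum Expr {
    Str(String),       // literal; consumes input unless it is empty
    Ref(String),       // reference to another rule
    Seq(Vec<Expr>),    // a ~ b ~ ...
    Choice(Vec<Expr>), // a | b | ...
    Opt(Box<Expr>),    // e?
    Rep(Box<Expr>),    // e*
}

pub type Grammar = HashMap<String, Expr>;

/// Conservative test: may `e` succeed without consuming any input?
fn nullable(e: &Expr, known: &HashSet<String>) -> bool {
    match e {
        Expr::Str(s) => s.is_empty(),
        Expr::Ref(name) => known.contains(name),
        Expr::Seq(items) => items.iter().all(|i| nullable(i, known)),
        Expr::Choice(items) => items.iter().any(|i| nullable(i, known)),
        Expr::Opt(_) | Expr::Rep(_) => true,
    }
}

/// Fixpoint: the set of rules that may match the empty string.
fn nullable_rules(g: &Grammar) -> HashSet<String> {
    let mut known = HashSet::new();
    loop {
        let mut changed = false;
        for (name, expr) in g {
            if !known.contains(name) && nullable(expr, &known) {
                known.insert(name.clone());
                changed = true;
            }
        }
        if !changed {
            return known;
        }
    }
}

/// Collects the rules that `e` may call before consuming any input.
fn left_calls(e: &Expr, nul: &HashSet<String>, out: &mut HashSet<String>) {
    match e {
        Expr::Str(_) => {}
        Expr::Ref(name) => {
            out.insert(name.clone());
        }
        Expr::Seq(items) => {
            for item in items {
                left_calls(item, nul, out);
                if !nullable(item, nul) {
                    break; // later items only run after input was consumed
                }
            }
        }
        Expr::Choice(items) => items.iter().for_each(|i| left_calls(i, nul, out)),
        Expr::Opt(inner) | Expr::Rep(inner) => left_calls(inner, nul, out),
    }
}

/// Does `e` contain `x*` where `x` may match nothing (a non-progressing loop)?
fn has_empty_loop(e: &Expr, nul: &HashSet<String>) -> bool {
    match e {
        Expr::Str(_) | Expr::Ref(_) => false,
        Expr::Seq(items) | Expr::Choice(items) => items.iter().any(|i| has_empty_loop(i, nul)),
        Expr::Opt(inner) => has_empty_loop(inner, nul),
        Expr::Rep(inner) => nullable(inner, nul) || has_empty_loop(inner, nul),
    }
}

/// Returns one message per problem found, ordered by rule name.
pub fn validate(g: &Grammar) -> Vec<String> {
    let nul = nullable_rules(g);
    let mut names: Vec<&String> = g.keys().collect();
    names.sort();
    let mut errors = Vec::new();
    for name in names {
        if has_empty_loop(&g[name], &nul) {
            errors.push(format!("rule `{}`: repetition of an expression that may match nothing", name));
        }
        // Depth-first search over "left call" edges, looking for a cycle back to `name`.
        let mut first = HashSet::new();
        left_calls(&g[name], &nul, &mut first);
        let mut stack: Vec<String> = first.into_iter().collect();
        let mut seen = HashSet::new();
        while let Some(current) = stack.pop() {
            if &current == name {
                errors.push(format!("rule `{}`: left-recursive", name));
                break;
            }
            if seen.insert(current.clone()) {
                if let Some(expr) = g.get(&current) {
                    let mut next = HashSet::new();
                    left_calls(expr, &nul, &mut next);
                    stack.extend(next);
                }
            }
        }
    }
    errors
}
```

With this sketch, a rule like `a = a ~ "x"` is reported as left-recursive and `b = ("y"?)*` as a non-progressing repetition, while `c = "(" ~ c` passes because the recursion only happens after input has been consumed. Predicates and other pest features are not modelled here.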

tomtau commented 4 days ago

@Tartasprint thanks for your interest!

Is that something I can do at this stage of the project?

I'd say yes. At this stage, I was looking into two major refactorings that could affect it:

  1. crate restructuring, so that one can depend on a single pest crate (like with serde) instead of two. I sketched it out here: https://github.com/pest-parser/pest3/pull/16 but unfortunately it broke tests: https://github.com/pest-parser/pest3/pull/16#issuecomment-2271096136 and I haven't had time to come back and fix it 😢 but feel free to pick it up from there.
  2. I had this idea for an event-based API: https://github.com/pest-parser/pest/discussions/885#discussioncomment-6451348 but I don't want it to become a blocker for other work, so we can perhaps ignore it.

For the debugger, it would be nice to have, but I think it could take much more work. I can still give it a shot if it's wanted.

Yes, it's a bit of an unexplored territory: https://github.com/pest-parser/pest/discussions/885#discussioncomment-6450014

It should be straightforward to build a debugger the same way as in the current pest, i.e. just implement pest_vm with some breakpoint functionality (one major interpreter change may be related to meta rules, @TheVeryDarkness?). Two ideas were floating around for how to make it better:

  1. @dragostis (the original author of the pest_debugger) mentioned to me that there could be separate "debug-friendly" parser generator code: when users want to debug their parser, they add an annotation on their derive(Parser), which inserts extra code during generation to help with debugging, e.g. to inspect the stack. Perhaps it could launch a DAP server (https://microsoft.github.io/debug-adapter-protocol/overview) and advance the parsing steps based on incoming DAP requests?
  2. maybe pest_vm can be implemented with async / continuations for each parsing step, so that the debugger can run interactively in a browser / WASM, even for long-running (or non-terminating) grammars + inputs.
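
As a rough illustration of the "one parsing step at a time" idea, here is a sketch of a resumable matcher with breakpoints and a step budget. Note the assumptions: it uses a tiny bytecode machine in the style of LPeg's parsing machine and an explicit `step` function rather than Rust async, and the `Instr`, `Status` and `Vm` names are hypothetical. It is not how pest_vm is implemented today.

```rust
/// Instructions of a toy backtracking parsing machine (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    Char(u8),      // match one byte, otherwise fail
    Choice(usize), // push a backtrack point that resumes at this address
    Commit(usize), // pop the latest backtrack point and jump
    Jump(usize),   // unconditional jump
    End,           // successful match
}

#[derive(Debug, PartialEq)]
pub enum Status {
    Running,
    Paused,         // hit a breakpoint or ran out of step budget
    Matched(usize), // number of bytes consumed
    Failed,
}

pub struct Vm<'a> {
    program: &'a [Instr],
    input: &'a [u8],
    pc: usize,
    pos: usize,
    backtrack: Vec<(usize, usize)>, // saved (pc, pos) pairs
    pub breakpoints: Vec<usize>,    // instruction addresses to pause at
}

impl<'a> Vm<'a> {
    /// Assumes a well-formed program: all targets in range, terminated by `End`.
    pub fn new(program: &'a [Instr], input: &'a [u8]) -> Self {
        Vm { program, input, pc: 0, pos: 0, backtrack: Vec::new(), breakpoints: Vec::new() }
    }

    pub fn position(&self) -> usize {
        self.pos
    }

    /// Executes exactly one instruction, so a caller can inspect state in between.
    pub fn step(&mut self) -> Status {
        match self.program[self.pc] {
            Instr::Char(expected) => {
                if self.input.get(self.pos) == Some(&expected) {
                    self.pos += 1;
                    self.pc += 1;
                } else {
                    return self.fail();
                }
            }
            Instr::Choice(alternative) => {
                self.backtrack.push((alternative, self.pos));
                self.pc += 1;
            }
            Instr::Commit(target) => {
                self.backtrack.pop();
                self.pc = target;
            }
            Instr::Jump(target) => self.pc = target,
            Instr::End => return Status::Matched(self.pos),
        }
        Status::Running
    }

    /// Restores the latest backtrack point, or fails the whole match.
    fn fail(&mut self) -> Status {
        match self.backtrack.pop() {
            Some((pc, pos)) => {
                self.pc = pc;
                self.pos = pos;
                Status::Running
            }
            None => Status::Failed,
        }
    }

    /// Runs until the match finishes, a breakpoint is reached,
    /// or `max_steps` instructions have been executed.
    pub fn run(&mut self, max_steps: usize) -> Status {
        for _ in 0..max_steps {
            let status = self.step();
            if status != Status::Running {
                return status;
            }
            if self.breakpoints.contains(&self.pc) {
                return Status::Paused;
            }
        }
        Status::Paused
    }
}
```

For example, `"a"* ~ "b"` could be encoded as `[Choice(3), Char(b'a'), Commit(0), Char(b'b'), End]`. A host (a DAP server or a browser UI) would call `run` with a small budget, hand control back to its event loop, and call `run` again later, so even a non-terminating program such as `[Choice(2), Commit(0), End]` never blocks the host.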

Anyway, it's up to you how to approach it if you decide to work on it (any approach would be valuable, I think).

If I am reading the code organisation right, validators could be implemented in the same place as in pest, under the meta crate.

Setting aside the refactoring I mentioned, I think so (@TheVeryDarkness, or should it go somewhere else?).

I believe they should be optional (maybe on by default). A validator that guarantees the parser runs in finite time will have to reject some grammars that are actually fine, so grammar writers who are confident should be able to switch it off and take care of validating manually. It would be very cool to integrate a keyword for this (like unsafe in Rust), but that might add unnecessary complexity to the grammar of pest. It could also be done with specially formatted comments (like JSDoc/TypeScript), which would avoid polluting the grammar.

Maybe this could just be at the derive annotation level (instead of inside the grammar)? For example, it would validate the grammar by default, but one could optionally disable it.

Tartasprint commented 4 days ago

I gave some thought to what you said above. One of the things I wanted when writing validators was to add metadata to make the validation process more feasible/robust/precise. I think back then you or someone else told me it would make the code too complicated, or something like that. Now, if there were an event API acting as the core of the parser, on top of which we could put the AST builder and other things independently, I think using metadata in the validation process would be acceptable. It would also simplify making the debugger-friendly parser generator. (I imagine you thought of these things, or similar ones, when you introduced the idea of an event-based parser.)

All that to say, it might be better to start thinking about that event-based API first and then come back to the tooling.

tomtau commented 4 days ago

Yes, I didn't have the validation in mind, but in theory, separating AST generation from the parsing process should make it easier to have "pluggable" ASTs for different use cases.
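
To sketch what "pluggable" could mean here: the parser core only emits events, and independent consumers decide what to build from them. Everything below is an assumption for illustration (the `Event` enum, the `EventSink` trait, the hand-written `parse_list` function and both consumers are hypothetical names), not the API proposed in the linked discussion.

```rust
/// Events emitted by the parser core (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Enter { rule: &'static str, pos: usize },
    Exit { rule: &'static str, pos: usize },
}

/// Anything that wants to observe parsing implements this trait.
pub trait EventSink {
    fn on_event(&mut self, event: &Event);
}

/// Hand-written parser for `list = "[" number ("," number)* "]"`.
/// It knows nothing about trees or debuggers; it only reports events.
/// On failure it returns false and emits no closing events (a toy simplification).
pub fn parse_list(input: &str, sink: &mut dyn EventSink) -> bool {
    let bytes = input.as_bytes();
    let mut pos = 0;
    sink.on_event(&Event::Enter { rule: "list", pos });
    if bytes.get(pos) != Some(&b'[') {
        return false;
    }
    pos += 1;
    loop {
        let start = pos;
        while pos < bytes.len() && bytes[pos].is_ascii_digit() {
            pos += 1;
        }
        if pos == start {
            return false; // expected at least one digit
        }
        sink.on_event(&Event::Enter { rule: "number", pos: start });
        sink.on_event(&Event::Exit { rule: "number", pos });
        match bytes.get(pos) {
            Some(b',') => pos += 1,
            Some(b']') => {
                pos += 1;
                break;
            }
            _ => return false,
        }
    }
    sink.on_event(&Event::Exit { rule: "list", pos });
    true
}

/// Consumer 1: collects (rule, matched text) pairs, a stand-in for an AST builder.
pub struct SpanCollector<'i> {
    input: &'i str,
    open: Vec<usize>, // start positions of rules that are still open
    pub spans: Vec<(&'static str, &'i str)>,
}

impl<'i> SpanCollector<'i> {
    pub fn new(input: &'i str) -> Self {
        SpanCollector { input, open: Vec::new(), spans: Vec::new() }
    }
}

impl<'i> EventSink for SpanCollector<'i> {
    fn on_event(&mut self, event: &Event) {
        match *event {
            Event::Enter { pos, .. } => self.open.push(pos),
            Event::Exit { rule, pos } => {
                if let Some(start) = self.open.pop() {
                    let input: &'i str = self.input;
                    self.spans.push((rule, &input[start..pos]));
                }
            }
        }
    }
}

/// Consumer 2: records an indented trace, the kind of data a debugger could show.
#[derive(Default)]
pub struct Tracer {
    pub lines: Vec<String>,
    depth: usize,
}

impl EventSink for Tracer {
    fn on_event(&mut self, event: &Event) {
        match *event {
            Event::Enter { rule, pos } => {
                self.lines.push(format!("{}> {} @{}", "  ".repeat(self.depth), rule, pos));
                self.depth += 1;
            }
            Event::Exit { rule, pos } => {
                self.depth = self.depth.saturating_sub(1);
                self.lines.push(format!("{}< {} @{}", "  ".repeat(self.depth), rule, pos));
            }
        }
    }
}
```

Running `parse_list("[1,23]", ...)` once with a `SpanCollector` and once with a `Tracer` gives two different views of the same parse without touching the parser itself. A validator that relies on metadata could, in principle, be one more consumer of the same stream.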