Debugging: How, what, why?

zesterer / chumsky

Write expressive, high-performance parsers with ease.

https://crates.io/crates/chumsky

MIT License


Debugging: How, what, why? #19

Open zesterer opened 3 years ago

zesterer commented 3 years ago

Chumsky currently supports a primitive debugging system, allowing parsers to print to stdout when entered during a call to Parser::parse_recovery_verbose. Expanding this further will require some thought.

1) What problems should debugging attempt to solve?

Parsers that consume zero input and repeat
Paths erroneously taken
Priority errors (i.e. a.or(b) vs. b.or(a))

2) What information needs to be shown to the user?

Entered parsers
Number of iterations
Source location of parser
Recursion points

3) How best to show this information?

Annotated tree?

4) What API features should be supported?

Recursion limit to prevent stack overflows
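To make two of the points above concrete (a recursion limit, and catching parsers that repeat without consuming input), here is a minimal standalone sketch. It does not use chumsky; `Guard`, `GuardError`, `nesting` and `repeat_checked` are hypothetical names invented for illustration, and a real implementation would need to hook into the combinators themselves.

```rust
// Hypothetical guard, not part of chumsky: tracks nesting depth and
// reports an error instead of overflowing the stack.
#[derive(Debug, PartialEq)]
enum GuardError {
    TooDeep { limit: usize },
    NoProgress { offset: usize },
}

struct Guard {
    depth: usize,
    limit: usize,
}

impl Guard {
    fn new(limit: usize) -> Self {
        Guard { depth: 0, limit }
    }

    // Run `f` one level deeper, failing once `limit` levels are active.
    fn enter<T>(
        &mut self,
        f: impl FnOnce(&mut Guard) -> Result<T, GuardError>,
    ) -> Result<T, GuardError> {
        if self.depth >= self.limit {
            return Err(GuardError::TooDeep { limit: self.limit });
        }
        self.depth += 1;
        let out = f(self);
        self.depth -= 1;
        out
    }
}

// Toy recursive parser: counts leading '(' characters, recursing once per level.
fn nesting(input: &[u8], pos: usize, guard: &mut Guard) -> Result<usize, GuardError> {
    if input.get(pos) == Some(&b'(') {
        guard.enter(|g| nesting(input, pos + 1, g)).map(|n| n + 1)
    } else {
        Ok(0)
    }
}

// Repeat `item` until it fails. An iteration that succeeds without
// advancing would loop forever, so it is reported as an error.
fn repeat_checked(
    mut pos: usize,
    mut item: impl FnMut(usize) -> Option<usize>,
) -> Result<usize, GuardError> {
    while let Some(next) = item(pos) {
        if next == pos {
            return Err(GuardError::NoProgress { offset: pos });
        }
        pos = next;
    }
    Ok(pos)
}
```

With a limit of 2, `nesting(b"((()))", 0, &mut Guard::new(2))` returns `Err(GuardError::TooDeep { limit: 2 })` rather than recursing further, and an `item` that always returns its own starting offset is reported as `NoProgress`.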

natemartinsf commented 3 years ago

One debug feature I would find really useful is a way to print out the nested tuples that are the outputs of .map and .map_with_span.

Many of the parsers I'm writing end up having the data I need to build the AST buried several layers deep in nested tuples. Sometimes you can figure out the structure by looking at the combinators you used to build the parser, but other times the only way I was able to figure it out was through several guess -> compile -> error cycles until the data types lined up.

So a debug_print function I could drop in a map to print out the nested data structure would be really great.
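A helper along these lines can be sketched with the standard library alone. The names `debug_print` and `describe` below are hypothetical and not part of chumsky; the only requirement is that the parser's output type implements `Debug`.

```rust
use std::fmt::Debug;

// Hypothetical helper, not part of chumsky: format a value together with
// its type name so that the nesting of tuples is visible.
fn describe<T: Debug>(label: &str, value: &T) -> String {
    format!("[{}] {}: {:?}", label, std::any::type_name::<T>(), value)
}

// Print the description to stderr and hand the value back unchanged,
// so the call can sit inside a mapping closure without altering the output.
fn debug_print<T: Debug>(label: &str, value: T) -> T {
    eprintln!("{}", describe(label, &value));
    value
}
```

Assuming the surrounding parser's output is `Debug`, this could be dropped into an existing chain as `.map(|out| debug_print("expr", out))`. The standard `dbg!` macro works the same way (it prints its argument and returns it) but does not show the type name.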

zesterer commented 3 years ago

Debug-printing the output seems like a good idea, yes. Perhaps the input too? It would be amazing to be able to generate a mapping between the two: a diagram that explains exactly which parts of the input get processed by specific parsers and shows the output AST that gets generated. I'm thinking something like this:

Input `x + y`
    => ...processed by the parser at line 37 in `parser.rs`...
    => ...generated output `Expr::Binary(BinaryOp::Add, Expr::Local("x"), Expr::Local("y"))`

What I'm wondering is how to organise this output such that it doesn't become too verbose to be useful. It's almost like it requires a flamegraph-esque SVG that can be navigated around or something.
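One possible shape for such a mapping is a tree of trace records rendered with indentation, so that verbosity can be controlled by depth. The sketch below is standalone and makes several assumptions: `TraceNode` is a hypothetical type, not something chumsky provides, and the source location is captured with the standard `#[track_caller]` attribute and `std::panic::Location::caller()` at the point where the node is created.

```rust
use std::fmt::Debug;
use std::ops::Range;
use std::panic::Location;

// Hypothetical trace record, not a chumsky type.
struct TraceNode {
    parser: &'static str,                   // label chosen by the user
    defined_at: &'static Location<'static>, // where the node was created
    span: Range<usize>,                     // byte range of input consumed
    output: String,                         // Debug rendering of the output
    children: Vec<TraceNode>,
}

impl TraceNode {
    // `#[track_caller]` makes `Location::caller()` report the call site
    // of `new`, not this function body.
    #[track_caller]
    fn new(parser: &'static str, span: Range<usize>, output: impl Debug) -> Self {
        TraceNode {
            parser,
            defined_at: Location::caller(),
            span,
            output: format!("{:?}", output),
            children: Vec::new(),
        }
    }

    fn with_child(mut self, child: TraceNode) -> Self {
        self.children.push(child);
        self
    }

    // Append one line per node to `out`, indenting children by two spaces.
    fn render(&self, src: &str, depth: usize, out: &mut String) {
        out.push_str(&format!(
            "{}{} `{}` @ {}:{} => {}\n",
            "  ".repeat(depth),
            self.parser,
            &src[self.span.clone()],
            self.defined_at.file(),
            self.defined_at.line(),
            self.output
        ));
        for child in &self.children {
            child.render(src, depth + 1, out);
        }
    }
}
```

For the input `x + y`, a root node spanning `0..5` with two children spanning `0..1` and `4..5` renders as three lines, each showing the consumed slice, the file and line where the node was built, and the output. The same tree could feed a graphical view instead of text.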

natemartinsf commented 3 years ago

Along these same lines, a debugger tool that lets a user figure out why they are overflowing the stack would be great!

(Mentioning this because it's happening to me right now, and the "debug" parser doesn't print out if the stack overflows.)

zesterer commented 3 years ago

That's a good use-case. Perhaps I should also add a recursion limit to prevent this sort of thing.

Person-93 commented 3 years ago

Does the debugging have to be through normal CLI output? Perhaps add a feature flag that enables a GUI that lets you see the output and step through the parsers. I've tried stepping through it with a normal debugger and didn't find it very helpful.

zesterer commented 3 years ago

I'm still a little unsure about how best to output this information. CLI is definitely the most universal, but is not particularly easy to explore.

taka231 commented 2 years ago

Hello! How about improving the debug method we have now so that it outputs the input to the parser and how much of it was consumed? I'm thinking of something like the dbg function in Megaparsec in Haskell. Here's an example: https://markkarpov.com/tutorial/megaparsec.html#debugging-parsers
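As a rough illustration of that idea in Rust, the sketch below wraps a toy parser and records what it was given and what it consumed. Everything here is an assumption made for the example: the parser shape (`Fn(&str) -> Option<(T, &str)>`), the names `traced` and `digits`, and the log format, which only loosely imitates Megaparsec's output. It does not use chumsky's types.

```rust
use std::fmt::Debug;

// Run `parser` on `input`, logging the input it saw and what it consumed.
// Assumes the returned remainder is a suffix of `input`.
fn traced<'a, T: Debug>(
    label: &str,
    input: &'a str,
    parser: impl Fn(&'a str) -> Option<(T, &'a str)>,
    log: &mut Vec<String>,
) -> Option<(T, &'a str)> {
    log.push(format!("{}> IN: {:?}", label, input));
    let result = parser(input);
    match &result {
        Some((value, rest)) => {
            let consumed = &input[..input.len() - rest.len()];
            log.push(format!(
                "{}> MATCH: consumed {:?}, value {:?}",
                label, consumed, value
            ));
        }
        None => log.push(format!("{}> FAIL: consumed nothing", label)),
    }
    result
}

// Toy parser: one or more ASCII digits, parsed as a u32.
fn digits(input: &str) -> Option<(u32, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((input[..end].parse().ok()?, &input[end..]))
}
```

Calling `traced("digits", "42abc", digits, &mut log)` leaves two entries in `log`: one for the input seen and one for the consumed prefix `"42"` with the value `42`. Collecting into a `Vec<String>` rather than printing directly keeps the choice of output (CLI, GUI, file) separate from the tracing itself.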

zesterer commented 2 years ago

I'm increasingly of the view that this should be implemented as an extension trait on top of existing combinators rather than embedded into the crate as with master. Perhaps this will be the way forwards in zero-copy.

zesterer commented 1 year ago

This is related to (but not the same as) #280.

jyn514 commented 1 month ago

zesterer and i chatted today about how the new Inspector API in https://github.com/zesterer/chumsky/pull/681 might work really well for this - an extension could track rewinds and include them in a custom Error type so you know all the alternatives that were considered. he suggested it could even use Location::caller to point to the exact parts of the source that failed to parse.
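A very rough standalone sketch of the rewind-tracking idea follows. It deliberately does not use or reproduce chumsky's Inspector trait: `RewindLog`, `Rewind` and `attempt` are hypothetical names, and the closure-based "alternative" is a stand-in for a real parser. The only real API used is the standard `#[track_caller]` attribute with `std::panic::Location::caller()`.

```rust
use std::panic::Location;

// Hypothetical record of one failed alternative; not a chumsky type.
#[derive(Debug)]
struct Rewind {
    from: usize,                      // offset reached before the alternative failed
    to: usize,                        // offset the parser rewound to
    site: &'static Location<'static>, // where the alternative was attempted
}

// Hypothetical collector that a custom error type could carry.
#[derive(Default)]
struct RewindLog {
    rewinds: Vec<Rewind>,
}

impl RewindLog {
    // Try one alternative starting at `start`. `alt` returns Ok(end offset)
    // on success or Err(offset reached) on failure; a failure is recorded
    // as a rewind back to `start`, tagged with the caller's source location.
    #[track_caller]
    fn attempt(
        &mut self,
        start: usize,
        alt: impl FnOnce(usize) -> Result<usize, usize>,
    ) -> Option<usize> {
        let site = Location::caller();
        match alt(start) {
            Ok(end) => Some(end),
            Err(reached) => {
                self.rewinds.push(Rewind { from: reached, to: start, site });
                None
            }
        }
    }
}
```

After a parse fails, the collected `rewinds` list every alternative that was tried, how far each got, and the file and line of the code that tried it, which is the information an error type would need in order to report the alternatives that were considered.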