mna / pigeon

Command pigeon generates parsers in Go from a PEG grammar.
BSD 3-Clause "New" or "Revised" License

Debugging? #97

Closed kpixley closed 2 years ago

kpixley commented 4 years ago

Is there a description or a walkthrough of the output of the Debug() option anywhere?

I'm going crazy trying to understand what this thing is doing. I really miss the shift/reduce tables from yacc. They were tedious but they inevitably led to the problem. With pigeon, I'm doing a lot of guessing and adding a very large number of fmt.Fprintf's to weed out possibilities for where the problem might lie.

Today I'm going to have to start creating secondary test grammars to try to isolate the problem. Some debug help would certainly be useful.

breml commented 4 years ago

@kpixley Sorry for my slow response, I was offline the last few days. I am not familiar with the shift/reduce tables from yacc, so I am not really sure what exactly you are looking for.

If the Debug(true) option is set, the parser prints each step of the parsing. This output is very verbose and I find it only rarely useful. On each line you find the following information:

Whenever I work on a new PEG grammar, I often use fmt.Print as well to get a better understanding of what the parser is doing or why the grammar does not work as intended.

Maybe you can share an example of the grammar you have problems with so I might be able to help you with this.

kpixley commented 4 years ago

I seem to be having trouble reading that output.

The ostensible input character stays the same for long periods of time then jumps and I don't understand why.

The MATCH lines are confusing in that they show me neither c.text that has been matched nor what rule was involved. Or maybe I just don't understand what it is showing me.

And the function entry/exits don't tell me what rule is being considered so they're extremely difficult to follow. You're parsing what now? Maybe this? Except that you're backtracking in two steps which makes no sense... shake head, go back to printfs.

The shift/reduce tables from yacc basically say, given a particular input token, do we read more (shift)? Or do we "reduce" by popping lower-level tokens off the stack and pushing back a higher one, and which rule was applied to do so? Using them I can follow along and "play computer" until I see where my parser is doing something unexpected. I can, in a moment, determine both the stack and therefore the state of the parser, as well as what will happen at the next step. It's tedious, but like single-stepping through a debugger it's pretty much guaranteed to find the problem eventually, and the time to do so is relatively predictable.

I understand that the parsing techniques and grammars are different with PEGs, but I still want to know which position in the input stream is being considered and which rule it is being considered for. I have neither of those from the pigeon generated parser.

My worst, (most time consuming), errors fall into two categories. Either nothing is being matched and I have no idea why or the wrong thing is being matched and I have no idea why. The time involved is spent guessing, reformulating the grammar in order to inject log/printf lines, which may or may not help, and then repeating. This is essentially random and not guaranteed to ever find the problem, much less in a predictable amount of time, both of which are problematic and frustrating.

In the "no match" case, I wish I knew which rules were being considered and where we were in the input stream.

In the "wrong match" case, I think nearly all instances for me turn out to be surprising, buried instances of unexpected ordering, often a greedy match that takes precedence over an earlier alternative.
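One common shape of this ordering surprise can be shown with a toy model. The sketch below is not pigeon code: `orderedChoice` is a hypothetical helper that only mimics how a PEG choice commits to the first alternative that matches, even when a later alternative would consume more input.

```go
package main

import (
	"fmt"
	"strings"
)

// orderedChoice is a hypothetical stand-in for a PEG choice over literals:
// it returns the first alternative that is a prefix of input, not the longest.
func orderedChoice(input string, alts ...string) (matched string, ok bool) {
	for _, alt := range alts {
		if strings.HasPrefix(input, alt) {
			return alt, true // first match wins; later alternatives are never tried
		}
	}
	return "", false
}

func main() {
	// "if" is listed first, so it shadows the longer "ifx".
	m, _ := orderedChoice("ifx", "if", "ifx")
	fmt.Printf("%q\n", m) // "if"

	// Listing the longer alternative first changes the result.
	m, _ = orderedChoice("ifx", "ifx", "if")
	fmt.Printf("%q\n", m) // "ifx"
}
```

In a real grammar the shadowing alternative is usually buried a few rules deep, which is what makes it hard to spot without a trace.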

I guess what I want first is some indication of rule, alternative, and term being considered in the debug trace. That seems to be the hardest to puzzle through. Maybe if I had that the current scan position would make more sense to me.
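To make the request concrete, here is a minimal sketch of the kind of trace being asked for: one line per attempt, naming the rule, the alternative index, and the input offset. Everything in it (`tracer`, `try`, `literal`, the `Value` and `Percent` rules) is hypothetical and hand-written for illustration; it does not reflect how pigeon's generated parser or its Debug option works.

```go
package main

import (
	"fmt"
	"strings"
)

// tracer collects one line per rule attempt, indented by nesting depth.
type tracer struct {
	depth int
	lines []string
}

// try runs fn as alternative alt of the named rule at offset pos and records
// the rule, the alternative, the offset, and whether it matched.
func (t *tracer) try(rule string, alt int, input string, pos int, fn func(int) (int, bool)) (int, bool) {
	indent := strings.Repeat("  ", t.depth)
	t.lines = append(t.lines, fmt.Sprintf("%s> %s alt %d at offset %d", indent, rule, alt, pos))
	t.depth++
	end, ok := fn(pos)
	t.depth--
	if ok {
		t.lines = append(t.lines, fmt.Sprintf("%s< %s alt %d MATCH %q", indent, rule, alt, input[pos:end]))
	} else {
		t.lines = append(t.lines, fmt.Sprintf("%s< %s alt %d FAIL at offset %d", indent, rule, alt, pos))
	}
	return end, ok
}

// literal returns a matcher that succeeds when lit appears at the given offset.
func literal(input, lit string) func(int) (int, bool) {
	return func(pos int) (int, bool) {
		if strings.HasPrefix(input[pos:], lit) {
			return pos + len(lit), true
		}
		return pos, false
	}
}

func main() {
	input := "50%"
	t := &tracer{}
	pos := 0
	// Hypothetical rule: Value <- "100" / "50"
	for i, lit := range []string{"100", "50"} {
		if end, ok := t.try("Value", i, input, pos, literal(input, lit)); ok {
			pos = end
			break
		}
	}
	// Hypothetical rule: Percent <- "%"
	t.try("Percent", 0, input, pos, literal(input, "%"))
	fmt.Println(strings.Join(t.lines, "\n"))
}
```

With a trace of this shape, a failed alternative shows both the rule that was tried and the offset it was tried at, which is the information reported missing above.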

A second would be a list of rules/terms being considered on a "no match" as well as a statement of where we were in the stack at the time. Knowing which chars are valid next is helpful but it doesn't explain why '%' isn't one of them when I think I've clearly written a rule in which it is.

I am getting through my problems. Eventually. I'm not getting stuck, so even if I sent you today's problem, I'll probably have it figured out by tomorrow. Pigeon has been completely reliable for me so far. I suspect it frequently but I eventually track the problem down to my grammar. But problems that should, IMO from yacc/byacc/bison experience, take a few minutes often take me hours here.

Pigeon is a fine tool. And it could be better.

breml commented 4 years ago

Thanks for your long answer. I am not really sure how I can support you or what exactly could be improved in pigeon. One thing that maybe could help you is to add state change code blocks in your grammar to print out details that could help you debug. E.g.

#{ fmt.Println("passed here"); return nil }
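A slightly richer variant of the same idea is to route these prints through a small helper so every trace line names the rule and the position. The sketch below is an assumption-laden illustration: `trace` and `formatTrace` are hypothetical names, and the suggested call site in the comment assumes the generated parser exposes the current position to code blocks as `c.pos.line` and `c.pos.col`, which should be checked against the generated code before relying on it.

```go
package main

import (
	"fmt"
	"os"
)

// formatTrace builds one trace line: position first, then the rule name.
// Hypothetical helper, not part of pigeon.
func formatTrace(rule string, line, col int) string {
	return fmt.Sprintf("%d:%d: reached %s", line, col, rule)
}

// trace writes the line to stderr and returns nil so that it can be used
// directly as the result of a state change code block, for example
// (assumed field names): #{ return trace("Expr", c.pos.line, c.pos.col) }
func trace(rule string, line, col int) error {
	fmt.Fprintln(os.Stderr, formatTrace(rule, line, col))
	return nil
}

func main() {
	// Stand-alone demonstration with made-up values.
	_ = trace("Expr", 1, 5)
}
```

Placing the helper in the grammar's initializer block would keep the individual state blocks down to one short line each.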

Again, I think for the problems you are facing, the Debug(true) option is not very helpful, so I would not recommend using it. I think it is more useful for debugging pigeon itself (and not a grammar processed by pigeon).

I propose to close this issue as well. Please feel free to open new issues with specific requests for improvements.

kpixley commented 4 years ago

I think this one has specific suggestions. If you need me to copy them to a new ticket I can, although I don't see the point.

State change debug lines don't really help much. I'm doing something similar now, but without more info about what the parser is doing it's a lot of groping around in the dark. I write at least 20-30 lines that are never executed and spend a lot of time wondering why before I get one that prints.