kschiess / parslet

A small PEG based parser library. See the Hacking page in the Wiki as well.
kschiess.github.com/parslet
MIT License
805 stars 95 forks source link

Detect infinite loop and raise exception #108

Closed NigelThorne closed 9 years ago

NigelThorne commented 10 years ago

This seems to work well. I don't know how much unit testing etc. you would like me to do to show this works. .. but I used it to debug the ruby-in-ruby\crystal failing_tests we were discussing on the mailing list.

I added a helper to Position to let me output the preceding 20 characters. I'm sure you have a better way to show parser context when throwing exception, but this was useful for me.

NigelThorne commented 10 years ago

So the logic is that when repeating, should you perform a match that consumes no characters, nothing stops that same match a second tine . if this repetition is unbounded (no Max set) then this will loop for ever.

I chose to this an exception as this is not a problem with the match as such, but with the grammer as implemented.

Any suggestions for improvements?

NigelThorne commented 10 years ago

s/this/throw

rubydesign commented 10 years ago

Hi Kasper, are you going to merge this ? Seems very useful and fixes my bug.

T.

kschiess commented 10 years ago

This is in the hot spot of the parser and will make all parsers slower. Is that worth it? Correctly constructed parsers will never exhibit this. Can you somehow monkey-patch this in inside a require so that not everyone would have to use it?

My name is Kaspar btw.

rubydesign commented 10 years ago

Hi Kaspar (sorry about that),

how about putting it into the parse_with_debug (if that was the correct name) ?

I understand the point about correct parsers, but we are all learning before we are masters and that makes for a world off imperfectly written parsers (just as it makes for imperfect software)

I mean the functionality be helpful, not just for me (though to answer the question, off course i will monkey patch if you don't include it), and if you say it is costly then debug seems the perfect place.

What say you ?

NigelThorne commented 10 years ago

It would be ideal if we could swap it in/out then.. a fast parser for production and a slower one while developing the grammar.

It reminds me of the different requirements for error reporting during grammar development vs production. (ie. tell my users how their document is wrong vs. tell me how my grammar failed to parse my document).

Cheers Nigel


"Man, I'm going to have so many chickens when this lot hatch!"

On Fri, Aug 1, 2014 at 3:59 AM, Torsten Rüger notifications@github.com wrote:

Hi Kaspar (sorry about that),

how about putting it into the parse_with_debug (if that was the correct name) ?

I understand the point about correct parsers, but we are all learning before we are masters and that makes for a world off imperfectly written parsers (just as it makes for imperfect software)

I mean the functionality be helpful, not just for me (though to answer the question, off course i will monkey patch if you don't include it), and if you say it is costly then debug seems the perfect place.

What say you ?

— Reply to this email directly or view it on GitHub https://github.com/kschiess/parslet/pull/108#issuecomment-50795188.

kschiess commented 10 years ago

Nigel - I agree - we'd need this to be swappable. Also: When your parser hangs, you always have a recursive loop, no exception - it cannot hang for other reasons. So we already have a kind of detection.

This seems to happen to people that are new to PEG a lot. Other parser engines do magic stuff that blurs the line between code and parsing - PEG parsers really just do top down method calls and that should be the way to think about things.

If someone manages to write an extension to parslet that can be turned off (speed-wise) - I'll merge. Otherwise this is kind of a low priority - if you test your parser atoms in isolation, you'll know where the problem is... (can't say this enough).

NigelThorne commented 10 years ago

agreed.. Testing solves most problems. The nice thing about the loop detection is that it clearly shows you how it managed to loop. This apparently isn't clear to new users.

Another way to make this clear is to change 'repeat' to default to 1 not 0 occurrences (as this is the main mistake that gets made). On 19 Aug 2014 17:01, "Kaspar Schiess" notifications@github.com wrote:

Nigel - I agree - we'd need this to be swappable. Also: When your parser hangs, you always have a recursive loop, no exception - it cannot hang for other reasons. So we already have a kind of detection.

This seems to happen to people that are new to PEG a lot. Other parser engines do magic stuff that blurs the line between code and parsing - PEG parsers really just do top down method calls and that should be the way to think about things.

If someone manages to write an extension to parslet that can be turned off (speed-wise) - I'll merge. Otherwise this is kind of a low priority - if you test your parser atoms in isolation, you'll know where the problem is... (can't say this enough).

— Reply to this email directly or view it on GitHub https://github.com/kschiess/parslet/pull/108#issuecomment-52597151.

rubydesign commented 10 years ago

I'd love to be able to fix this, but as a beginner i can only stress the importance.

Kaspar's comment about always hanging in infinite loops makes it sound like that were a good thing: but they are the worst bugs to track, especially as a beginner. Usually ruby does not give relevant stack points when a StackTooDeep exception occurs.

And even i have quite many tests, it took me a while to find the bug, because testing in separation is not so easy with non-trivial grammars. Especially with above point that the error message is just in the wild.

I think even limiting any repeats to 100 (configurable if must be) for the debug parse and throwing the exception from parslet code would help the situation at lot.

So please don't think it's low priority only because you understand the problem and can avoid it.

NigelThorne commented 10 years ago

How could I measure the performance hit of this check? I understand that it's in a performance critical section of code. Do you have a standard grammar and document you use for this purpose?

Assuming the performance hit is unacceptable... as it stands I would have to monkey_patch the whole method... which means duplicate its content... which is possible but not ideal.

Maybe we add a debug_try method to repetition... and have it run through the same unit tests as repetition#try plus one for looping.Then monkey patching is just swapping the methods.

Another approach would be to implement a monkey patching method using ruby_parser to convert the repetition#try method in memory into sexpressions, then inserting the required checks, then converting it back to replace the original method. This sounds brittle too though (but might be fun).

kschiess commented 10 years ago

For performance comparisons, I've been cooking up some ad-hoc scripts in https://github.com/kschiess/parslet-benchmarks.

Code duplication in general is bad, I agree - but in this case, we might just accept it. The code duplicated is small, stable and easily maintained. General principles should not stop us from working...

We have to avoid confusion though: This patch doesn't avoid the StackOverflowErrors, just the local endless loops. To avoid the StackOverflowErrors, we'd need loop detection across the grammar tree (a calling b calling c calling .. a - on the same charpos). As in real live, getting rid of whiles that don't terminate is easy, getting rid of recursion errors is harder.

kschiess commented 9 years ago

Seems no one is working on this anymore.. can we close this thread down?

NigelThorne commented 9 years ago

I'll try to work on this .. this week. If it's not done by Friday, close it. Cheers Nigel

On Mon Feb 16 2015 at 20:57:14 Kaspar Schiess notifications@github.com wrote:

Seems no one is working on this anymore.. can we close this thread down?

— Reply to this email directly or view it on GitHub https://github.com/kschiess/parslet/pull/108#issuecomment-74483639.