ridencww / goldengine

Java implementation of Devin Cook's GOLD Parser engine
Other
35 stars 14 forks source link

Seems like a serious problem #5

Closed nimatrueway closed 10 years ago

nimatrueway commented 12 years ago

I was writing a simplified version of c grammar for my program, and i tested it with a couple of lines of codes in Golden Parser Generator, it was correct and worked well.

But when i took it in my program and tried to run it by ridencww-engine it didn't "reduce" some tokens well and came up with some errors.

I attached my Grammar and my Test-file so you can analyze it in your engine.

http://www.mediafire.com/download.php?sd4vs0sdcplau1a

Unfortunately i'm running out of time and i need to deliver program to my client soon so i have to use another engine to make it work right now, but i like your engine very much and i want to use it in my further projects.

Thank you.

ridencww commented 12 years ago

There is a BOM (Byte-order mark) at the beginning of Test.txt that is causing a problem. The engine should (and will) handle this, but if you remove the BOM, it seems to parse. Would you like me to zip up my test files (your grammar and all of the rule handlers broken out) and send them over to you?

nimatrueway commented 12 years ago

I removed it, but it doesn't work that way either. The problem is actually about core parsing engine. And about the instance that i shared, specifically it is about parsing "length(A)-1" part of it.

int i,j; for (i=length(A)-1; i>=0; i++) for (j=0; j<i; i++)

--->

When you remove "-1" part from "length(A)-1", parsing process will be alright then.

int i,j; for (i=length(A); i>=0; i++) for (j=0; j<i; i++)

--->

but Golden Parser - Test Engine can parse both of test instances with no problem, there must be a critical bug in the engine.

ridencww commented 12 years ago

Now that I have the parse tree I can walk through the productions line by line paying attention to the length(A)-1 section. Assuming that the EGT is built correctly (likely since it works in the GOLD Builder), it should be fairly easy to find out what is happening. Let me have another look later today.

nimatrueway commented 12 years ago

Aha, I finally figured what the hell is all about..

In my instance, there is a rule :

        "<Op Add> ::= <Op Add> '-' <Op Unary>"

And in "GOLDParser.java" > "createInstance(String ruleName)" we have :

        clazz = ruleHandlers.get(ruleName.replace(', )); // Try removing single quotes

So the engine is looking for

        "<Op Add> ::= <Op Add> - <Op Unary>"

meanwhile in "ruleHandlers" MapVar there is actually a map for string :

        "<Op Add> ::= <Op Add> '-' <Op Unary>"

...

My temporary solution is :

in GOLDParser class, in function "loadRuleHandlers(String packageName)" we replace the line

                    for (String rule : a.rule()) {
                        ruleHandlers.put(rule, clazz);
                    }

with

                    for (String rule : a.rule()) {
                        ruleHandlers.put(rule.replace("'", ""), clazz);
                    }
ridencww commented 12 years ago

The code first tries to find the mapping with the rule name as given. Only if the key wasn't found will the single quotes be removed. I seem to remember some quirk in the Map/TreeMap implementation that required me to do this. However, since this approach seems to be inadequate, I will revisit the problem.

BTW, as you have probably discovered, the rule handlers are only created when you don't generate an output tree. Another thing to keep in mind is that if you wish the template provided in the GOLD Builder, the code generates rule handlers with a parameter validater expecting 1..2 parameters as an example. I should have commented those lines out and allowed the language developer to enable parameter checking when needed.

ridencww commented 12 years ago

Map/TreeMap do work as expected and I was able to retrieve the map value with a key containing single quotes. There was another reason I did this originally. In any case, the rule handler instance should have been able to be retrieved using the lookup just before the one that removed the single quotes.

ridencww commented 12 years ago

Thank goodness for tests. The stripping of the single quotes was added because some rule handler annotations omitted the single quotes even though the production clearly had them. Looking up the rule without the single quotes if the rule wasn't found with the quotes was a quick, albeit short sighted, workaround. Before I make any changes, I want understand why the annotations have the quotes stripped. I will also enter an issue with the BOM on the source file.

nimatrueway commented 12 years ago

I believe they key is in "converting a production to string" which comes from attaching a chain of strings derives from Symbols on SymbolList of the right-side of the production.

And when symbols want to turn to string they use this function ( when they're CONTENT type )

"literalFormat(name, alwaysDelimitTerminals)"

which when the function finds Letter / "-" / "." / "_" in the source:string argument, it avoids using singlequote delimiters around it.

nimatrueway commented 12 years ago

I checked a bunch of terminals involving ( Numbers / "." / "-" / "_" / Letter ) and then used skeleton generator, And i end up concluding that we should use single-quote delimiters for ALL strings, except strings that consist of just Letters.

Am i right ?

ridencww commented 12 years ago

The engine imposes few, if any, restrictions on the terminals. What has to happen is that the handler's annotation value must agree with the production/reduction string value. I think the only character that needs special attention is the double quote character, which has to be escaped in the Java code. I have observed in my tests that the skeleton code can generate annotations without the single quotes, which the engine attempts to look up the handler WITH the quotes and won't find it because the key, built from the annotation doesn't have the quotes.

I think I am close to a proper resolution, but want the sleep on it -- it's been a long day anyway.

nimatrueway commented 12 years ago

Ralph, direct String>Class mapping is kind of inefficient and comes with a lot of problems that we just encountered,

What do you think.. if we first process "annotation rules", split them by symbols, and make a Production out of them. then we use a Production>Class for rulehandlers instead.

This way it would be more efficient, and also there will be no problem about quotes, doublequotes and stuffs.

ridencww commented 12 years ago

I do want to review the overall construction of the engine and refactor where it makes sense. As the engine gets used by more and more people, there will be opportunities to identify areas of improvement. If the engine is working for you with your modification and you are able to move forward, I'd be interested in more of your feedback as you use the engine. Lacking a general forum for such discussions, you can send email to riden at creativewidgetworks.com

ridencww commented 10 years ago

Addressed in 5.0.3-SNAPSHOT and 5.0.3 release. Thank you for the defect report and the suggestion for better rule handler matching. I've moved your suggestion to the Wiki.