Open kajarigd opened 8 years ago
thanks for your detailed message. I'm looking into the issue as soon as possible. --Matthias
Thanks a lot! We are running a detailed batch test on both types of grammars I mentioned above. I can share the error reports with you in your email, if you would like to take a look at them.
Hello! I was going through the source code of node-xsyn to track down the cause for the errors I mentioned in this ticket. Below I am listing down my findings and the patch I have implemented:
A. Making the recursion call tail optimized by trampolining the doParse() function: Current doParse(pstate,level) fails due to deep recursion as by default V8 Engine doesn't support tail optimized recursion. Trampolining doParse() completely eliminates RangeError, since it uses loop to simulate the recursion and doesn't pile up on the stack. I have tried it and it is working fine. Now, huge files (more than 2k lines and files with huge amount of code requiring deep rule shifts) are getting parsed successfully without throwing any error.
Code snippet:
/**
* @method doParse(pstate,level)
* @returns void
*/
GrammarDef.prototype.doParse = function(pstate, level) {
try{
function doParseIn(pstate,level) {
//original doParse(pstate,level) as a closure
}
return trampoline(doParseIn.bind(null, pstate, level));
} catch (e){
GrammarUtils.debug('error: ' + e + '\n' + e.stack);
//throw e;
}
}
function trampoline(f) {
GrammarUtils.debug('recurstion starts ');
try{
while (f && f instanceof Function) {
f = f();
}
} catch (e){
GrammarUtils.debug('trampoline catch ');
GrammarUtils.debug('error: ' + e + '\n' + e.stack);
//throw e;
}
GrammarUtils.debug('recurstion ends ');
return f;
}
B. One more part is yet to be added to the above patch. At present it cannot handle the try/catch in the original doParseIn(pstate,level) as-is. Since, exception thrown in the original doParse(pstate,level) is custom SyntaxError, which triggers the recursion loop to go back one level, trampolining won't support it as-is. Hence, at present when a SyntaxError is thrown, it is directly caught in the catch block of trampoline(f). We can do different things in this catch block now - (a) It can be re-thrown to go back to the new doParse() wrapper's catch block and we manage the backtracking there (b) modify doParseIn(pstate,level) to keep a track of SyntaxError's trigger to walk up one step (using counter/flag, and save/keep the other data values). When an exception is thrown, inside trampoline's catch block call f() again (this f is now the f one step up), and pass on the SyntaxError information, so that it does the required backtracking this time.
I am writing code for this part right now. Will keep you posted.
sorry for the late response... thanks for putting so much work into analyzing this. Did you see any improvements since you patched the code? --Matthias
On Sun, Jan 31, 2016 at 2:35 AM, kajarigd notifications@github.com wrote:
Hello! I was going through the source code of node-xsyn to track down the cause for the errors I mentioned in this ticket. Below I am listing down my findings and the patch I have implemented: The cause of the error:
1.
The root error is getting generated from the method doParse(pstate,level) in the file xsyn.grammar.impl.GrammarDef.js. The error message is, "RangeError: Maximum call stack size exceeded". When the recursion level of doParse() is exceeding 8924 (Nodejs v4.2.6), this recursion stack overflow error is getting thrown. This is because doParse() is not tail-optimized and V8 Engine bundled with Nodejs v4.2.6 doesn't support tail optimization for recursion by default. 2.
doParse() catches this RangeError and starts doing backtracking in its catch block thinking SyntaxError has been thrown. The thrown exception essentially brings the control back to one level up in the recursion and calls action.undo() in the catch block to backtrack. But, since the error this time is RangeError and not SyntaxError, action.undo() is leading to unexpected results like : stack underflow (pop), syntax error at wrong places.
The patch I have added in my local:
A. Making the recursion call tail optimized by trampolining the doParse() function: Current doParse(pstate,level) fails due to deep recursion as by default V8 Engine doesn't support tail optimized recursion. Trampolining doParse() completely eliminates RangeError, since it uses loop to simulate the recursion and doesn't pile up on the stack. I have tried it and it is working fine. Now, huge files (more than 2k lines and files with huge amount of code requiring deep rule shifts) are getting parsed successfully without throwing any error.
Code snippet:
/**
- @method doParse(pstate,level)
- @returns void */ GrammarDef.prototype.doParse = function(pstate, level) { try{ function doParseIn(pstate,level) { //original doParse(pstate,level) as a closure } return trampoline(doParseIn.bind(null, pstate, level)); } catch (e){ GrammarUtils.debug('error: ' + e + '\n' + e.stack); //throw e; }
}
function trampoline(f) { GrammarUtils.debug('recurstion starts '); try{ while (f && f instanceof Function) { f = f(); } } catch (e){ GrammarUtils.debug('trampoline catch '); GrammarUtils.debug('error: ' + e + '\n' + e.stack); //throw e; } GrammarUtils.debug('recurstion ends '); return f; }
B. One more part is yet to be added to the above patch. At present it cannot handle the try/catch in the original doParseIn(pstate,level) as-is. Since, exception thrown in the original doParse(pstate,level) is custom SyntaxError, which triggers the recursion loop to go back one level, trampolining won't support it as-is. Hence, at present when a SyntaxError is thrown, it is directly caught in the catch block of trampoline(f). We can do different things in this catch block now - (a) It can be re-thrown to go back to the new doParse() wrapper's catch block and we manage the backtracking there (b) modify doParseIn(pstate,level) to keep a track of SyntaxError's trigger to walk up one step (using counter/flag, and save/keep the other data values). When an exception is thrown, inside trampoline's catch block call f() again (this f is now the f one step up), and pass on the SyntaxError information, so that it does the required backtracking this time.
I am writing code for this part right now. Will keep you posted.
— Reply to this email directly or view it on GitHub https://github.com/mageran/node-xsyn/issues/9#issuecomment-177461307.
Hello! After adding the trampoline wrapper, I am not getting the recursion stack overflow issue anymore. Hence, scripts with more than 1K lines of code and with deep hops are getting parsed correctly without any issues now. These are scripts, which doesn't need backtracking.
Now to support the backtracking, we cannot use the catch block inside the doParse() anymore. Since trampoline flattens out the recursion into a while loop, we cannot use any catch block or any other function to 'go back' to the previous recursion states. Hence, a separate logic is needed to store the previous parse states to support backtracking. I have done with storing one previous state and I checked the code is working fine.
Hello! It seems we are unable to optimize our grammar to win both in terms of parsing time and removal of look-ahead issues/unnecessary rule matches. Here are two scenarios we are facing:
(1) Rules have lots of look-ahead issues, but minimal hops are needed to match a rule. e.g.consider the following 3 rules. These are not real ones, but used just to illustrate the issue: (a) expr: string | number | var (b) var : var_type_1 | vary_type2 | var_type3 (c) stmnt_type1 : if '( var '')' then
In this above type of situation, we are getting issues like stack overflow/underflow for very large input files. Also, if the rule stmnt_type1 is repeated multiple times, the last occurrence is giving some symbol/keyword error.
(2) The same grammar rules if we write in the below way we are not facing any issues: Rules are: (a) expr: string | number | var_type_1 | vary_type2 | var_type3 (b) stmnt_type1 : if '( expr '')' then. So, if we reduce the number of hops till rule reduction starts we are not getting any of the above errors. In this scenario parsing is taking 4-5 mins. But in the first scenario above parsing is relatively faster, in the order of 1-2 mins.
Our grammar has lots of rules. I just gave you a hypothetical example above to illustrate the issue we are facing. Also, In scenario 2 since we are reducing hops, we are letting some extra look ahead issues creep in, which is also making the parsing very slow.
Can you please tell me if my understanding of the reasons for the errors is correct or not? If not, can you please tell what can be the probable reason for such failures? If my understanding is correct, is there any smart way we can handle this?