Open peter-leonov opened 11 years ago
The first blabla chunk is an inline action block: yacc/bison supports this syntax = way to mix action code and grammar rule tokens, while jison does not. (I guess the grammar was copypasted from a yacc sample somewhere.)
To make it work n jison you will have to split the rule manually so that you only have action blocks at the END of each rule. Thus:
Topstmt: stmt | beginkeyword '{' compstmt '}' { blabla2 action block } ;
Beginkeyword : keyword_BEGIN { blabla1 action block } ;
Sorry for format mess, typed this on a new mobile but the point should come across nevertheless, I hope. On May 27, 2013 1:02 PM, "Peter Leonov" notifications@github.com wrote:
In the ruby language parser there is a strange construction:
top_stmt : stmt | keyword_BEGIN { // bla-bla-bla } '{' top_compstmt '}' { // bla-bla-bla };
which produce a syntax error:
/usr/local/share/npm/lib/node_modules/jison/lib/cli.js:77 throw e; ^ Error: Parse error on line 174: ... { } '{' top_compstmt '}' ---------------------^ Expecting ';', '|', got 'STRING'
Help me handle this kind of expressions, please :)
— Reply to this email directly or view it on GitHubhttps://github.com/zaach/jison/issues/173 .
Thanks you for the fast answer. This was a show stopper.
Yes, the example was indeed copied from the original ruby parser written for bison. And, as far as there are hundreds of such statements I'll go nuts if try to convert all of them to the smiler END rules. And more on this, nobody can support this type of port :(
So, there are two ways for porting the ruby parser: — wait for Jison to support In-Mid actions — hack the original Bison to produce JSON output and handle it in JS world somehow
What do you think of it?
P.S. Thank you twice :)
Replaced all the mid-rule action with a stub token. Got that giant grammar compiled in 30 seconds, but Jison output a thousand of conflict warnings. The same grammar has been compiled OK by Bison.
Here is the grammar: https://gist.github.com/kung-fu-tzu/5659189
You're welcome!
In answer to your questions and remarks:
1) I'm sorry, but jison is /not/ a /port/ of bison but a lexer/parser created from scratch, which happens to be more or less compatible with bison. Note the 'less' as it cuts both ways:
1.a) jison combines a lexer and LALR(1) parser in one package. Which is jargon which can be loosely translated to this: 'jison' includes both a lexer (regex-based and a slightly simpler variant of the old-skool tools lex/flex, known to C/C++ programmers ;-) ) and a parser generator and only requires a single .jison definition file. Contrast this with the GNU answer to lex/yacc: flex/bison, which requires TWO definition files for the same thing: one for flex (look for .l files in your C/C++ projects), which is fed to flex to munge input TEXT and spit out TOKENS (integers = IDs) and another one for bison/yacc: the .y file. Jison combines a simile of these in a single definition file (*.jison) so that you have everything that defines the language in a single file.
This is a feature that bison doesn't have, at least not when I last looked at that one a few years ago (I myself 'migrated' from yacc>btyacc/bison>pccts>antlr and then to jison when the need arose to do the same sort of thing in JavaScript. I still like the way antlr does parse error handling and recovery best among all of these but the JS 'target' of antlr was not working for me while I needed something familiar and get me some JS out of there, so there I am, using jison and going old-skool again.)
1.b) jison may share 4 alphanumerics with bison and may support a large subset of the bison vocabulary but it does not, at least not YET, include support for GLR(1) / LR(1) grammars. Check wikipedia and bison manuals if these acro's don't say anything to you right now. Jison currently only 'does' LALR(1) which is equivalent to the abilities of classic yacc. Jison does support a large subset(!) of the bison dialect/notation, but not everything. (The main exercise I believe was creating a parser /generator/ in JavaScript and jison passed on that subject with flying colours, IMHO. The fact it happens to be quite similar to bison is a bonus. Unless its author (@zaach) wishes to correct me on this notion. ;-) ) The point being: when that ruby grammar is LR but NOT LALR, then, yes, expect lots of conflicts, both shift/reduce and reduce/reduce. If I recall my theory correctly, any LR grammar can be turned in a LALR one, but you will have to do that by hand and the result won't necessarily be 'nice' and 'human readable' if you get my drift.
Check whether you have any reduce/reduce conflicts and see if you can get rid of them by rewriting the relevant parts of the grammar to make it fit the LALR(1) principles. (Being fluent in LR parser theory would help a lot; otherwise this can turn into an agonizing exercise; we'll get to that at the bottom of this message...) If you cannot get rid of the reduce/reduce conflicts, then you have a few alternatives, which leads us to the answer regarding porting the ruby grammar:
2) porting the ruby grammar:
We assume, without having checked but following the signals received so far (including a swift google sweep: http://blog.nicksieger.com/articles/2006/10/27/visualization-of-rubys-grammar/
NOTE THAT I AM NOT REFERRING TO THE INLINE ACTION BLOCKS PROBLEM HERE ( http://www.gnu.org/software/bison/manual/bison.html#Mid_002dRule-Actions ); THIS IS ABOUT PARSER THEORY/TECHNOLOGY ITSELF.
2.a) Note the first link: http://blog.nicksieger.com/articles/2006/10/27/visualization-of-rubys-grammar/
That's a hint. ;-)
If you're not yet fluent in LR parser theory, you may have a Long Walk ahead of you.
2.b) When you follow those links above, you'll see a few things. Also, there's https://github.com/ruby/ruby/blob/trunk/tool/id2token.rb which is part of that original ruby build action at https://github.com/ruby/ruby/blob/trunk/common.mk#L587 while full awareness of the parser generator process in general and bison in particular should have led you on an immediate hunt for lex/flex, which started here for me, with some surprise: https://github.com/ruby/ruby/blob/trunk/Makefile.in#L296 and seeing gperf mentioned in there, I knew this wasn't all, so more lex hunting gave up this, as the above, particularly https://github.com/ruby/ruby/blob/trunk/common.mk#L587, hints at a 'custom lexer' (you do not HAVE to use flex ;-) ): google rb_reserved_word (from gperf output) --> http://cakebox.homeunix.net/doc/ruby19/html/d3/deb/lex_8c.html#a7eb7a0feb2cca5ef92883d90094b46f8 --> wake up and forget flex, look for the yyparse() and consequently the yylex() --> http://cakebox.homeunix.net/doc/ruby19/html/d5/d11/ripper_8c_source.html#l04849 as we look for yylex() and find if via the defines and Ctrl+F: parser_yylex() is a custom lexer and I should have started with looking in the post action chunk of parse.y anyway: https://github.com/ruby/ruby/blob/trunk/parse.y#L6852 ta!-da! custom lexer, using gperf assistance.
Time spent so far: ~ 20 minutes.
I did this to check if bison was called with special arguments (it wasn't) and to see if the original grammar had bison-specific GLR(1) instructions included: it does not. I continued for a bit, thus delivering full code-level analysis of which places are important when you want to port the ruby grammar for real, as that would require both lexer and parser ported over to JavaScript, both of which can be done, but are a major task. The goal of this is simple: if you were not able to follow along smoothly, combined with the initial problem you posted, this is a strong indication that you have no solid previous experience in the referred language theory and original tools which were used to create ruby; this means that the goal of porting ruby to JavaScript would be a many-hurdle learning experience for you, both in C programming (as you'll have to be able to 'decode' what the ruby people wrote and did, particularly where they went off-mainstream, such as their lexer) and in jison. Also, lines like %token tCMP RUBY_TOKEN_CMP "<=>" will 'compile' in jison but will not do what you may expect as each 'token' listed after %token will be treated as an new individual, see for example https://github.com/zaach/jison/blob/master/examples/ansic.jison - thus tCMP and RUBY_TOKEN_CMP and "<=>" and will each be treated as 'a new token', thus producing 3 token IDs instead of the single one I believe you wanted. Instead, such a line should end up in the lexer section, where the "<=>" string would then be 'lexed' into token tCMP which is used in the grammar rules. (Yup, I saw your gist specifically states it's 'meant to be broken' ;-) )
Anyway,
So, there are two ways for porting the ruby parser: — wait for Jison to support In-Mid actions — hack the original Bison to produce JSON output and handle it in JS world somehow
first is possible, if Zaach has enough time available to do this Real Soon Now (you'll have to ask him!) but second is a 'not useful' item IFF you mean you mean compiling the ruby grammar in bison and having bison spit out JSON for /jison/ as that would have you run into the same LALR trouble unless you tweak bison such that it does maybe a 'grammar rewrite' which would be a notable effort -- I'd rather suggest getting bison to produce the generated parser in JavaScript, i.e. add a JavaScript 'target' to bison
A viable 'one off' alternative would be to take the bison generated C parser and port that code (including tables) over to JavaScript, port the custom lexer over to JavaScript as well and then take it from there. One of the minor headaches then would be to match those case numbers inside the original output to human-comprehensible action block positions in the original grammar, but bison itself can help you there as it provides debug info upon request.
If the blurb after "Time spent so far: ~ 20 minutes" didn't scare you, then you may have a chance at succeeding. Otherwise, if you want to pick up this kind of technology, then start with something simpler to get a handle on jison and its details, and possibly, if you haven't already, work and dig through bison as well. When you do, you may find the mentioned 'M4/skeleton for JavaScript target' a viable option and other people would be grateful for that output, I'm sure! :-)
Met vriendelijke groeten / Best regards,
Ger Hobbelt
web: http://www.hobbelt.com/ http://www.hebbut.net/ mail: ger@hobbelt.com
On Mon, May 27, 2013 at 2:05 PM, Peter Leonov notifications@github.comwrote:
Thanks you for the fast answer. This was a show stopper.
Yes, the example was indeed copied from the original ruby parser written for bison. And, as far as there are hundreds of such statements I'll go nuts if try to convert all of them to the smiler END rules. And more on this, nobody can support this type of port :(
So, there are two ways for porting the ruby parser: — wait for Jison to support In-Mid actions — hack the original Bison to produce JSON output and handle it in JS world somehow
What do you think of it?
P.S. Thank you twice :)
— Reply to this email directly or view it on GitHubhttps://github.com/zaach/jison/issues/173#issuecomment-18495310 .
This answer is what I call HELP. With all capital letters :) Re-reading it for the third time.
Readable version: https://gist.github.com/kung-fu-tzu/5686206 :)))
You're welcome!
In answer to your questions and remarks:
1) I'm sorry, but jison is /not/ a /port/ of bison… Understand. Who may want to just copy such a monster.
1.a) jison combines a lexer and LALR(1) parser in one package. It is cool and I like it. But, as you write below, in my case the builtin lexer is not the case :(
1.b) jison may share 4 alphanumerics with bison and may support a large
subset of the bison vocabulary but it does not, at least not YET, include
support for GLR(1) / LR(1) grammars. Check wikipedia and bison manuals if
these acro's don't say anything to you right now. Jison currently only
'does' LALR(1) which is equivalent to the abilities of classic yacc.
OK, before given investigation, I thought that LR and LALR is all about the resulting parser, the algorithm of evaluating the grammar. Now I realize that the whole grammar, states optimization and the actual parser code is different for those type of parsers. May be not opposite different but the difference is significant in context of the Ruby parser problem.
…
The point being: when that ruby grammar is LR but NOT LALR, then, yes,
expect lots of conflicts, both shift/reduce and reduce/reduce. If I recall
my theory correctly, any LR grammar can be turned in a LALR one, but you
will have to do that by hand and the result won't necessarily be 'nice' and
'human readable' if you get my drift.
The worst part is that, after all the burden, I'll have to support all this mess for years.
…
2) porting the ruby grammar:
…
This should work for jison so that DOES NOT EXPLAIN the many conflicts; maybe the grammar was 'updated' and now uses LR (non-LALR)
features provided by modern bison, but I'll have to see it and spend some
serious time on that.
It is expected to have a problem parsing such a monstrous rule set while not even meant to be used this way.
(Read: I cannot spare the time for that right now and
I don't expect this to become a paid consulting job
I wold be glad to thank you with escudos for this answer. It saved me much time. How can I perform such action?
NOTE THAT I AM NOT REFERRING TO THE INLINE ACTION BLOCKS PROBLEM HERE (
http://www.gnu.org/software/bison/manual/bison.html#Mid_002dRule-Actions );
THIS IS ABOUT PARSER THEORY/TECHNOLOGY ITSELF.
Totally understood. Generating the tree in far more simple then kickstart the parser.
…
If you're not yet fluent in LR parser theory, you may have a Long Walk
ahead of you.
Yes, I have less knowledge about parsers then the task needs to be solved flawlessly. But, I'm brave as far as I can see projects like JRuby, Iron Ruby, Opal and others who has dealt with the parser.
I expect to spend much time on this. Porting a real world complex parser is an exciting adventure I really want to have :)
2.b) When you follow those links above, you'll see a few things. Also,
there's https://github.com/ruby/ruby/blob/trunk/tool/id2token.rb which is
part of that original ruby build action at
https://github.com/ruby/ruby/blob/trunk/common.mk#L587
I'll convert it by hand, if not already.
while full awareness
of the parser generator process in general and bison in particular should
have led you on an immediate hunt for lex/flex,
The first thing I were looking for, indeed.
…
https://github.com/ruby/ruby/blob/trunk/parse.y#L6852 ta!-da! custom lexer,
using gperf assistance.
Sad, but true. 1300+ lines of hardcode… oh, my. Yes, I have seen it already. Luckily, JavaScript is in the same language family with C, and porting the lexer tend to be a long, boring, but doable thing.
Time spent so far: ~ 20 minutes.
My time so far is around two weeks :)
…
if you were not able to follow along smoothly, combined with the initial problem you posted, this is a strong indication that you have no solid previous
experience in the referred language theory and original tools which were
used to create ruby; this means that the goal of porting ruby to JavaScript
would be a many-hurdle learning experience for you, both in C programming
(as you'll have to be able to 'decode' what the ruby people wrote and did,
particularly where they went off-mainstream, such as their lexer) and in
jison.
It is the first grammar I'v ever seen since high school. I'm good in JavaScript. And I can read and write medium programs in C.
And the main two ingredients: I'm passioned and I have time for it right now.
…
if Zaach has enough time available to do this Real Soon Now
I hope he has. As far as there are only two of us in the thread, one can make a conclusion, that Jison is not in focus right now.
but second is a 'not useful' item IFF you
mean you mean compiling the ruby grammar in bison and having bison spit out
JSON for /jison/
No, I meant the other trick.
…
I'd rather suggest getting bison to produce the generated parser in JavaScript
I meant this one :)
…
'today' it means you'll have to learn/use GNU M4 to produce a suitable bison 'skeleton' for the JavaScript target
Yeap, had a look at Bison's skeletons yesterday. They broke my eyes :)))
A viable 'one off' alternative would be to take the bison generated C
parser and port that code (including tables) over to JavaScript
I'll try the compilation of the two above: skeleton + postproduction. As far as working with strings, arrays and hashes is much simpler in JS, I'll try to write a minimum skeleton to move resulting transitions table out of Bison before it transforms it for needs of C language. If possible :) Then the time for JS converter to handle the actual parser production.
I wish myself luck in such a challenging task.
, port the custom lexer over to JavaScript as well and then take it from there.
Sad, but no Ruby parser can avoid porting the lexer from parse.y. As far as the lexer is not context free, I'll better port it as is.
…
you may find the mentioned 'M4/skeleton for JavaScript target' a viable option and other people would be grateful
for that output, I'm sure! :-)
Agreed. If I can get helped with those skeletons, the work could be done without JavaScript engine ever involved. But, pretending someone has added the JS target with a help of JS engine, I'll use it without a doubt and donate to such a handsome someone :)
Thank you again! :)
It seems that m4 is an funny template engine with essential scripting abilities. Models lay separated from views, which is good :)
Got calculator working in pure JS parser generated with Bison. Yes, m4 is an ugliest preprocessor ever! :)
Here is the result: https://github.com/kung-fu-tzu/bison-lalr1.js Help with testing it is greatly appreciated! :)
OK, so far so good. Have got identical output of state transitions with yacc skeleton. That helps a lot. @GerHobbelt, miss your giant responses ;)
Sorry, got completely swamped here.
And I saw you moved to bison, which is off-limits stuff for me -- meaning I don't have it ready on the devbox nowadays as everything is JS for me right now, with a dusting of PHP. And I built some stuff into the jison lexer that enabled me to do what needed to be done (lexer rule rejection), so I am currently 'locked' into using jison. The alternative for me would be to revert to bison, and either
...which would be nice in some far-away way but in reality would cost me dearly at a time when I cannot afford that particular bit of fun.
I'm glad to see you got a bison skeleton going though; however don't expect any efforts from me right now that surpass spending maybe half an hour banging on a keyboard + basic thought.
Granted, my current grammar has some nasty (bleep) in it, but taking that out or even porting it is very much like ... 'Going Oh Shit' (a la Festina Ramos)
Aside: since we've moved from jison to bison and zero jison, it might not be prudent to continue that conversation here in the jison issue tracker. Not that I mind, but others might.
Anyway, apart from the aside, since you now seem to have a basic skeleton up for bison, you might want to either stress-test the bugger by feeding it a few more grammars and see what happens, or switch to handling the lexer conversion to JS as a bison grammar is cute, but without a lexer you're still at a useless 50%: the other half is munching the input text into tokens and IIRC Ruby (which you intended to port over to JS) used a gperf hash table + custom code -- something which for starters I would simply replace by a JavaScript object a.k.a. 'dictionary' replacing the gperf effort (let the JS engine handle that bit for now) and then focus on their custom lexer bit to see what needs to be done to port that over from C into JS.
Met vriendelijke groeten / Best regards,
Ger Hobbelt
web: http://www.hobbelt.com/ http://www.hebbut.net/ mail: ger@hobbelt.com
On Fri, Jun 14, 2013 at 9:52 PM, Peter Leonov notifications@github.comwrote:
OK, so far so good. Have got identical output of state transitions with yacc skeleton. That helps a lot. @GerHobbelt https://github.com/GerHobbelt, miss your giant responses ;)
— Reply to this email directly or view it on GitHubhttps://github.com/zaach/jison/issues/173#issuecomment-19478062 .
OK, last comment on this issue. JS skeleton for Bison powers now full featured ruby parser in pure JS.
Thank you both (the author of Jison, and you, Ger Hobbelt) for inspiration! May the Force be with you!
UPDATE: Fund what it is in the bison manual. This is an Actions in Mid-Rule.
In the ruby language parser there is a strange construction:
which produce a syntax error:
Help me handle this kind of expressions, please :)