Open Ravenwater opened 7 years ago
I wouldn't mind some good jison tutorials too. Or maybe even a more human-readable way to write rules.
http://jison.org/ is wonderful, but I couldn't find any documentation on how to replace the C-style bindings to main() that make lex and yacc so productive for building text processing tools.
I do see that the third section of the syntax and grammar files gets reproduced just prior to the export. I had hoped that others had used jison-lex/jison for tool-building automation and could bootstrap me, but alas, I'll dive into the code and see if I can make it work.
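One way to stand in for a C-style main() is a small Node driver that owns the counters and hands the input to the generated parser. The sketch below is an assumption-laden illustration, not something from this thread: `runWordCount`, `yy.counts`, and `fakeParser` are hypothetical names. It assumes only that the generated module exposes a parser object with a `yy` property and a `parse()` method, and it uses a stand-in parser so it runs without Jison installed.

```javascript
// Hypothetical driver. `parser` is expected to be the object exported by a
// Jison-generated module, e.g. require('./wordcount').parser (assumed file name).
function runWordCount(parser, text) {
  // Share one mutable object through yy. Grammar actions would update
  // yy.counts.words and friends (a convention invented for this sketch).
  const counts = { chars: 0, words: 0, lines: 0 };
  parser.yy = parser.yy || {};
  parser.yy.counts = counts;
  parser.parse(text);
  return counts;
}

// Stand-in parser so the sketch runs on its own; a real generated parser
// would do this work inside its lexer and grammar actions.
const fakeParser = {
  yy: {},
  parse(text) {
    const c = this.yy.counts;
    c.chars = text.length;
    c.words = (text.match(/\S+/g) || []).length;
    c.lines = (text.match(/\n/g) || []).length;
    return true;
  },
};

const result = runWordCount(fakeParser, "one two\nthree\n");
console.log(`${result.lines}\t${result.words}\t${result.chars}`);
```

Keeping the counters in a nested object means the driver still holds a reference to them even if the generated parser copies the properties of `yy` into a fresh object for each parse. With a real grammar, the text would typically come from something like `fs.readFileSync(process.argv[2], 'utf8')`.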
See StackOverflow answer
wordcount.jison
%lex
%options flex
%{
if (!('chars' in yy)) {
yy.chars = 0;
yy.words = 0;
yy.lines = 1;
}
%}
%%
[^ \t\n\r\f\v]+ { yy.words++; yy.chars += yytext.length; }
. { yy.chars++; }
\r { yy.chars++; }
\n { yy.chars++; yy.lines++; }
/lex
%%
E : { console.log( yy.lines + "\t" + yy.words + "\t" + yy.chars); };
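For comparison, the counting that the lexer rules above perform can be sketched in plain JavaScript without Jison. This is only an approximation of the rules, written to make the bookkeeping visible; `countLikeLexer` is a hypothetical name, and the sample string is an assumed reconstruction of the two-line input used later in the thread.

```javascript
// Plain-JavaScript approximation of the lexer rules: a run of non-whitespace
// is a word, every other character counts as one char, "\n" also ends a line.
function countLikeLexer(text) {
  // Same starting values as the %{ ... %} initialisation block.
  const yy = { chars: 0, words: 0, lines: 1 };
  // First alternative: the word rule. Second: any single remaining character.
  for (const m of text.matchAll(/[^ \t\n\r\f\v]+|[\s\S]/g)) {
    const s = m[0];
    if (/^[^ \t\n\r\f\v]/.test(s)) {
      yy.words++;
      yy.chars += s.length;
    } else {
      yy.chars++;
      if (s === "\n") yy.lines++;
    }
  }
  return yy;
}

const sample = countLikeLexer("This is line one.\r\nline two.");
console.log(sample.lines + "\t" + sample.words + "\t" + sample.chars);
```

Because `lines` starts at 1, a file whose last line has no trailing newline is still counted as a line.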
Earlier Answer
I am just starting out with Jison and using "flex & bison" as a reference. It has the word count example, and I ran into the same problem, so I am posting this to help others. This is not the best way to do it, but it does get one past this example and on to making more progress with Jison.
wordcount.jison
// wordcount.jison
// Based on the example in "flex & bison" by John Levine
// This is a wordcount example.
// Lexer Grammar
%lex
/* Lexer Section 1 : Definitions */
%{
console.log("In Lexer Definitions section");
%}
%%
/* Lexer Section 2 : Rules */
[a-zA-Z]+
{
console.log("In Lexer Rule WORD");
console.log("Matched: '" + this.match + "'");
return 'WORD';
}
\n
{
console.log("In Lexer Rule LF");
console.log("Matched: line feed");
return 'LF';
}
\r
{
console.log("In Lexer Rule CR");
console.log("Matched: carriage return");
return 'CR';
}
<<EOF>>
{
console.log("In Lexer Rule EOF");
console.log("Matched: <<EOF>>");
return 'EOF';
}
.
{
console.log("In Lexer Rule SEP");
console.log("Matched: '" + this.match + "'");
return 'SEP';
}
%%
/* Lexer Section 3 : User Code */
console.log("In Lexer User Code section");
/lex
// Parser Grammar
/* Parser Section 1 : Definitions */
%{
/* code block */
console.log("In Parser Definitions section");
let myChars = 0;
let myWords = 0;
let myLines = 0;
%}
%%
/* Parser Section 2 : Rules */
input
: sentences eof
;
sentences :
sentence cr lf sentences
| sentence
;
sentence :
word sep sentence
| word sep
| word
;
word
: WORD
%{
console.log("In Parser Rule WORD");
myWords++; myChars += yytext.length;
%}
;
cr : CR
%{
console.log("In Parser Rule CR");
myChars++;
%}
;
lf : LF
%{
console.log("In Parser Rule LF");
myChars++; myLines++;
%}
;
sep : SEP
%{
console.log("In Parser Rule SEP");
myChars++;
%}
;
eof : EOF
%{
console.log("In Parser Rule EOF");
myChars++; myLines++;
console.log("Lines: " + myLines + ", Words: "+ myWords + ", Chars: " + myChars);
%}
;
%%
/* Parser Section 3 : Epilogue */
console.log("In Parser Epilogue section");
wordcount_input.txt
This is line one.
line two.
My development environment consists of:
>jison wordcount.jison
>node wordcount.js wordcount_input.txt
output
In Parser Definitions section
In Parser Epilogue section
In Lexer User Code section
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'This'
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'is'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'line'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'one'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: '.'
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule CR
Matched: carriage return
In Parser Rule SEP
In Parser Rule CR
In Lexer Definitions section
In Lexer Rule LF
Matched: line feed
In Parser Rule LF
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'line'
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'two'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: '.'
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule EOF
Matched: <<EOF>>
In Parser Rule SEP
In Parser Rule EOF
Lines: 2, Words: 6, Chars: 29
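The totals in that log can be cross-checked with a small stand-alone sketch that mimics the token rules and the per-token parser actions. It is not Jison output: `RULES`, `tokenize`, and `countTokens` are hypothetical names, and the input string (two lines separated by CR/LF, no trailing newline) is inferred from the log above.

```javascript
// Token rules in the same order as the lexer section; the first match wins.
// The sticky flag (y) anchors each regex at lastIndex.
const RULES = [
  ["WORD", /[a-zA-Z]+/y],
  ["LF", /\n/y],
  ["CR", /\r/y],
  ["SEP", /[\s\S]/y], // stands in for `.`; matches any one remaining character
];

function tokenize(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    for (const [name, re] of RULES) {
      re.lastIndex = pos;
      const m = re.exec(text);
      if (m) {
        tokens.push({ name, text: m[0] });
        pos += m[0].length;
        break;
      }
    }
  }
  tokens.push({ name: "EOF", text: "" });
  return tokens;
}

// Mirrors the parser actions: each token kind bumps the same counters.
function countTokens(tokens) {
  let chars = 0, words = 0, lines = 0;
  for (const t of tokens) {
    if (t.name === "WORD") { words++; chars += t.text.length; }
    else if (t.name === "LF" || t.name === "EOF") { chars++; lines++; }
    else { chars++; } // CR and SEP
  }
  return { lines, words, chars };
}

const totals = countTokens(tokenize("This is line one.\r\nline two."));
console.log(`Lines: ${totals.lines}, Words: ${totals.words}, Chars: ${totals.chars}`);
```

The input is 28 characters long; the 29th comes from the EOF action, which increments the character counter as well as the line counter.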
Notes:
- Needed a return in each lexer rule, e.g. return 'WORD';
- The input uses CR/LF line endings instead of just LF, so I had to adjust accordingly.
- Used this.match in the lexer because yytext is not available/was not working in the lexer. Still learning.
- No main function, due to the way Jison runs generated code. The easiest way around this was to put the action into the EOF parser rule.
- Converted C to JavaScript, e.g. strlen(yytext) to yytext.length
To help me understand the sections of Jison, I liberally added comments and code sections to see how the user code was getting inserted into the Jison boilerplate code. This helped, because I soon realized that leaving out return statements in the lexer actions was causing problems, and that the counters were getting initialized with each lexer rule instead of just once. Also, I had to use the parser because I am still learning and have not figured out how to get just flex to work in Jison.
Hope this helps you and others.
In the flex/bison world, you can write simple text processing utilities. For example, a wc program:
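%{
#include <stdio.h>
int nrchars, nrwords, nrlines;
%}
%option noyywrap
%%
\n          ++nrchars, ++nrlines;
[^ \t\n]+   ++nrwords, nrchars += yyleng;
.           ++nrchars;
%%
/* Run the scanner over stdin, then print the totals. */
int main(void) { yylex(); printf("%d\t%d\t%d\n", nrchars, nrwords, nrlines); return 0; }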
I have yet to discover how to write these types of lex/yacc tools with jison-lex/jison. Can somebody enlighten me, please?