Open neizod opened 12 years ago
Have you tried using:
"^a"
I just tried it, and it did not behave as I expected.
This is tricky because the lexer uses JavaScript regular expressions, which don't allow you to start matching from an arbitrary position in a string. This means a new string is created each time, starting at the end of the last match, so ^ technically always matches.
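To make the effect concrete, here is a minimal sketch of a scanner that slices the input after every match. It is a simplified model of the behaviour described above, not Jison's actual code; the function name `scanBySlicing` and the rule table are made up for illustration.

```javascript
// Simplified model: after every match the remaining input becomes a new
// string, so `^` anchors to the start of that remainder, not to a line start.
function scanBySlicing(input) {
  const rules = [
    { regex: /^a/, token: "HEAD" }, // intended: "a" at the start of a line
    { regex: /a/, token: "BODY" },  // intended: any other "a"
    { regex: /\s+/, token: null },  // whitespace is skipped
  ];
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const { regex, token } of rules) {
      const m = rest.match(regex);
      // Only accept a match that begins at the current position.
      if (m && m.index === 0) {
        if (token) tokens.push(token);
        rest = rest.slice(m[0].length); // fresh string: `^` matches here again
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("Unexpected input: " + rest);
  }
  return tokens;
}

console.log(scanBySlicing("a a")); // [ 'HEAD', 'HEAD' ]
```

The second "a" is reported as HEAD even though it is not at the start of the line, because by then it is the first character of the sliced string.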
A possible workaround would be to prepend the input with a unique character and replace ^ with that character in the rules.
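A rough sketch of that workaround, under a few assumptions: the sentinel character `\u0001` never occurs in real input, the scanner still slices the input as before, and the names `SENTINEL` and `scanWithSentinel` are hypothetical. The `^` that remains in these rules is only the scanner's own "match at the current position" anchor.

```javascript
const SENTINEL = "\u0001"; // assumed never to appear in real input

function scanWithSentinel(input) {
  const rules = [
    // The user's "^a" rule, with `^` replaced by the sentinel character.
    { regex: new RegExp("^" + SENTINEL + "a"), token: "HEAD" },
    { regex: /^a/, token: "BODY" },
    // A sentinel not followed by "a" carries no token of its own.
    { regex: new RegExp("^" + SENTINEL), token: null },
    { regex: /^\s+/, token: null },
  ];
  const tokens = [];
  let rest = SENTINEL + input; // mark the true start of the input
  while (rest.length > 0) {
    let matched = false;
    for (const { regex, token } of rules) {
      const m = rest.match(regex);
      if (m) {
        if (token) tokens.push(token);
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("Unexpected input: " + rest);
  }
  return tokens;
}

console.log(scanWithSentinel("a a")); // [ 'HEAD', 'BODY' ]
```

This version only marks the start of the whole input. Handling `^` at the start of every line would also need a sentinel after each newline.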
@zaach The y (sticky) flag [0] may help with this; however, I don't know how well it is supported in browsers other than Gecko-based ones.
[0] https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/RegExp
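For reference, a small sketch of what the y flag does: a sticky regex only attempts a match at exactly `lastIndex`, and `^` keeps referring to the real start of the input because nothing is sliced. The helper name `matchesAt` is made up for this example.

```javascript
// Hypothetical helper: test a sticky regex at exactly one offset of `input`.
function matchesAt(regex, input, index) {
  if (!regex.sticky) throw new Error("regex must use the y flag");
  regex.lastIndex = index;
  return regex.test(input);
}

const sample = "a a";
console.log(matchesAt(/^a/y, sample, 0)); // true:  offset 0 is the start of input
console.log(matchesAt(/^a/y, sample, 2)); // false: `^` does not match at offset 2
console.log(matchesAt(/a/y, sample, 2));  // true:  plain "a" matches at offset 2
console.log(matchesAt(/a/y, sample, 1));  // false: offset 1 is a space
```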
Quick and dirty hack to solve this:
"a" %{
this.yy_ = this;
return (this.yylloc.first_column === 0) ? 'HEAD' : 'BODY';
%}
What about using custom scanners? I have written a library called Lexer in the spirit of Flex which allows you to match arbitrary expressions as follows:
var Parser = require("jison").Parser;
var Lexer = require("lex");
var grammar = {
"bnf": {
// ...
}
};
var parser = new Parser(grammar);
var lexer = parser.lexer = new Lexer;
lexer.addRule(/^a/, function (lexeme) {
this.yytext = lexeme;
return "BODY";
});
lexer.addRule(/a/, function (lexeme) {
this.yytext = lexeme;
return "HEAD";
});
Perhaps we could integrate it into Jison to be the default scanner? Advantages:
I've also wanted to improve the performance of Lexer for quite a while by using Finite State Automata instead of native regular expressions. Perhaps we could work on that collaboratively?
@aaditmshah A more JavaScript friendly lexer is definitely a nice thing to have, but one of the qualifications for the default lexer is that it can be expressed in a way that's familiar to Flex users.
I've thought about implementing a regex engine in JS, but building one with enough features and speed to be useful is more than I have time for. Another option I believe others have explored is compiling a C/C++ regex engine using emscripten.
I have enough time to implement a regex engine in pure JavaScript. What is the interface required to integrate a regex engine with jison? Is it the same interface that's exposed by jison-lex?
Since we now have the "sticky" flag, could we make every regex sticky and multiline (/my) and manually set the lastIndex of the regex about to be tested to the lastIndex of the last matched regex?
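// Assumes every rule.regex carries the sticky flag, e.g. /^a/my.
function nextMatch(input, rules, lastMatchRegex) {
  var match, rule, lastIndex, i;
  // Resume where the previously matched regex stopped (0 on the first call).
  lastIndex = lastMatchRegex ? lastMatchRegex.lastIndex : 0;
  for (i = 0; i < rules.length; i++) {
    rule = rules[i];
    rule.regex.lastIndex = lastIndex;
    match = input.match(rule.regex);
    if (match) {
      return match[0];
    }
  }
  return null; // no rule matched at this position
}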
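Expanding that idea into a complete loop, here is a minimal sketch of a tokenizer built on sticky, multiline regexes. It is an illustration under stated assumptions rather than a patch for Jison: the function name `tokenize` and the rule table are hypothetical, and it tracks one position variable instead of reading `lastIndex` from the last matched regex.

```javascript
function tokenize(input) {
  const rules = [
    { regex: /^a/my, token: "HEAD" }, // "a" at the start of a line
    { regex: /a/my, token: "BODY" },  // any other "a"
    { regex: /\s+/my, token: null },  // whitespace is skipped
  ];
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const { regex, token } of rules) {
      regex.lastIndex = pos;        // sticky: try a match at exactly `pos`
      const m = regex.exec(input);  // the input is never sliced
      if (m) {
        if (token) tokens.push(token);
        pos = regex.lastIndex;      // exec advanced lastIndex past the match
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("Unexpected character at offset " + pos);
  }
  return tokens;
}

console.log(tokenize("a a\na")); // [ 'HEAD', 'BODY', 'HEAD' ]
```

Because the original string is kept intact, `^` with the m flag matches only at offset 0 or right after a line terminator, which gives the HEAD BODY result the issue expects for "a a".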
@amobiz hey, are you DDOSing?
Sorry, I thought no one was here. I was just trying to update the information.
In the lexer part:
test case: a a
return token: BODY BODY
while return token: HEAD HEAD. (expected: HEAD BODY)