Since a lot has been done and several of these features are tough to extract cleanly as 'simple' patches (they won't be simple anyway), here is the list of differences (features and fixes) in the derived repo:
[to be completed]
Main features
full Unicode support in lexer and parser (okay, astral codepoints are hairy and only partly supported)
the lexer can handle XRegExp \p{...} Unicode regex atoms, e.g. \p{Alphabetic}
jison auto-expands and re-combines these when used inside regex set expressions in macros, e.g.
ALPHA [{UNICODE_LETTER}a-zA-Z_]
will be reduced to the equivalent of
ALPHA [{UNICODE_LETTER}_]
hence you don't need to worry about your regexes including duplicate characters in regex [...] set expressions.
parser rule names can be Unicode identifiers (you're not limited to US ASCII there).
lexer macros can be used inside regex set expressions (in other macros and/or lexer rules); the lexer will barf a hairball (i.e. throw an informative error) when a macro cannot be expanded to represent a character set without causing counter-intuitive results. E.g. this is a legal series of lexer macros now:
ASCII_LETTER [a-zA-Z]
UNICODE_LETTER [\p{Alphabetic}{ASCII_LETTER}]
ALPHA [{UNICODE_LETTER}_]
DIGIT [\p{Number}]
WHITESPACE [\s\r\n\p{Separator}]
ALNUM [{ALPHA}{DIGIT}]
NAME [{ALPHA}](?:[{ALNUM}-]*{ALNUM})?
ID [{ALPHA}]{ALNUM}*
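For contrast, a sketch of an illegal use (the macro name is made up): NAME above expands to [{ALPHA}](?:[{ALNUM}-]*{ALNUM})?, which contains grouping and repetition and is therefore not a plain character set, so referencing it inside a [...] set expression like

BAD_SET [{NAME}.]

should make the generator throw that informative error rather than silently build a nonsensical regex.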
the parser generator produces optimized parse kernels: any feature you do not use in your grammar (e.g. error-rule-driven error recovery or @elem location info tracking) is rigorously stripped from the generated parser kernel, producing the fastest parser engine it can.
you can define a custom-written lexer in the grammar definition file's %lex ... /lex section in case you find the standard lexer too slow for your liking or otherwise insufficient. (This is done by specifying a no-rules lexer with the custom lexer placed in the lexer trailing action code block; a sketch follows below.)
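A minimal sketch of that setup; the setInput() / lex() pair is the essential lexer contract, but treat the exact wiring (assigning to a lexer variable in the trailing code block) as an assumption to verify against your jison version:

```jison
%lex

%%

/* no lexer rules at all */

%%

// custom lexer in the trailing action code block (sketch):
// a deliberately trivial lexer which only recognizes runs of digits
lexer = {
    setInput: function (input, yy) {
        this._input = input;
        this._pos = 0;
        this.yytext = '';
        return this;
    },
    lex: function () {
        var m = /^\d+/.exec(this._input.slice(this._pos));
        if (!m) {
            return 'EOF';           // anything else ends the token stream
        }
        this._pos += m[0].length;
        this.yytext = m[0];
        return 'NUMBER';
    }
};

/lex
```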
you can %include action code chunks from external files, in case you find that the action code blurbs obscure the grammar's / lexer's definition. Use this when you have complicated/extensive action code for rules or a large amount of 'trailing code', i.e. code following the %% end-of-ruleset marker. A sketch follows below.
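A sketch of both uses (the file paths are hypothetical; per the note on fix #175 below, an %include statement can stand in wherever an action code chunk may appear):

```jison
expr
    : expr '+' expr
        %include "actions/expr-add.js"
    ;

%%

%include "lib/parser-helpers.js"
```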
CLI: -c 2 -- you now have the choice between two different table compression algorithms (or no compression at all):
mode 2 creates the smallest tables,
mode 1 is the one available in 'vanilla jison', and
mode 0 is 'no compression whatsoever'.
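For example (the grammar file name is hypothetical):

```bash
jison -c 2 mygrammar.jison    # mode 2: smallest tables
jison -c 0 mygrammar.jison    # mode 0: uncompressed tables
```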
Minor 'Selling Points'
you can produce parsers which do not include a try ... catch wrapper, for that last bit of speed and/or when you want to handle errors in surrounding userland code.
all errors are thrown using parser- and lexer-specific Error-derived classes, which allows userland code to discern which type of error (and thus: which extra error information is available!) is being processed via a simple/fast instanceof check for either of them. A sketch follows below.
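A sketch of such a check in userland code; the class names and the ex.hash extra-info member shown here are assumptions to verify against the API documentation comment in your generated parser module:

```js
var parser = require('./mygrammar.js').parser;

try {
    parser.parse(input);
} catch (ex) {
    if (ex instanceof parser.JisonParserError) {
        // parser error: extra info, e.g. location and expected tokens,
        // travels along on the error object (assumed shape: ex.hash)
        console.error('parse failed:', ex.message, ex.hash);
    } else if (ex instanceof parser.lexer.JisonLexerError) {
        console.error('lexing failed:', ex.message);
    } else {
        throw ex;                   // not ours: re-throw
    }
}
```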
the jison CLI tool will print additional error information when a grammar parse error occurs (derived from / closely related to #321 and #258)
the jison CLI tool will print parse table statistics when requested (-I command-line switch) so you can quickly see how much table space your grammar is consuming. Handy when you are optimizing your grammar to reduce the number of states per parse for performance reasons.
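For example (the grammar file name is hypothetical):

```bash
jison -I mygrammar.jison
```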
includes [a derivative or close relative of] #326, #316, #302, #290, #284
Fixes
#358 (crashes on this.yy.parser missing errors)
#356 (wrong input attached to error)
#353 (crashes on this.yy.lexer missing errors)
#352 (token_stack label issue: jison-gho's way of code stripping does not depend on labels at all, so the issue is moot now)
#349 (YYRECOVERING macro support -- should work, fingers crossed :wink:)
#348 (performAction invocation trouble)
#333 (lexer recognizes literal regex parts without quotes whenever possible)
#328 (all errors are Error-derived instances with a text message and extra info attached)
#317 (?not sure?)
#313
#301
#299 (with minor additional abilities compared to vanilla jison, e.g. configurable error recovery search depth)
#296 (unused grammar rules are reported and nuked, i.e. not included in the generated output)
#282
#276 (and we support JSON5 format besides!)
#254
#239 (all parser stacks are available in all grammar rule action code via yyvstack, yysstack, etc. -- documented in the API documentation comment chunk at the top of the generated file; see the sketch after this list)
#233 (EBNF rewriting to BNF now works; see also the wiki)
#231
#218 (and parseError can now produce a return value for the parser to return to the calling userland code)
#210
#175 (kind of... we now support %include filepath statements instead of any code chunk)
#165 (kind of... jison now does not fetch a look-ahead token when the rule's reduce action doesn't need it; it requires intimate understanding of your grammar and the way this LALR grammar engine handles it, but you can once again code 'lexer hacks' from inside parser rules' action code. Shudder or rejoice, depending on your mental make-up ;-) )
#138 (instanceof checks against the parser- and lexer-specific error classes)
#121 (indirectly: you can now get this behaviour by writing an action code chunk for an initial 'epsilon' rule, as shown in the sketch after this list)
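To illustrate the #239 and #121 items above, a hedged sketch of a grammar fragment; the yyvstack / yysp names follow the fork's generated-file documentation, but the exact indexing is an assumption to check there ('item' is assumed to be a token produced by the lexer):

```jison
%%

program
    : init list
        { return $list; }
    ;

// the #121 trick: an always-empty first rule whose action code runs
// before anything else is reduced -- handy for one-time setup work
init
    : /* epsilon */
        { yy.names = {}; }
    ;

list
    : list item
        {
            // #239: the parser stacks are directly accessible; yyvstack
            // holds the values, yysp is the top-of-stack index (assumed)
            $$ = yyvstack[yysp - 1].concat([yyvstack[yysp]]);
        }
    | item
        { $$ = [$item]; }
    ;
```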
Where is this thing heading?
using recast et al. to analyze rule action code so that both parser and lexer can be code-stripped, producing fast parse/lex runs. Currently only the parser gets analyzed (a tad roughly) to strip costly operations from the parser run-time and make it fast / efficient.