Closed true-grue closed 2 years ago
This happens because your `sp` token allows zero length, so this essentially means that you can have ten `int` tokens not separated by anything, but with tags in between, which makes tags very ambiguous. That's why the generated file has many states and tons of tag operations. Change `sp` to `[ \t]+` (except perhaps for the first occurrence after `"L"`) and the output is much smaller, because all ambiguity is removed.
As for direct-encoded vs. table-based, it is not relevant here: what matters is the number of states. In the ambiguous case the generated DFA has 388 states, and in the non-ambiguous case only 21. A large number of states would require large table sizes as well.
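In re2c syntax, the change amounts to the following (a sketch based on the discussion; the `sp` definition is assumed, since the original snippet is not shown):

```c
/*!re2c
    // Before (assumed): sp can match the empty string, so consecutive
    // `int` tokens may be adjacent with tags between them, and the DFA
    // must track every possible split point (388 states here).
    // sp = [ \t]*;

    // After: sp must consume at least one character, which removes the
    // ambiguity and shrinks the DFA to 21 states.
    sp = [ \t]+;
*/
```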
@skvadrik It looks like, due to some technical glitch, you saw only my initial message from before I edited it.
Anyway, thank you!
The code generated by re2c for a lexer with lots of s-tags is surprisingly large.
Here is a simple test (not a working C++ program):
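A hypothetical reconstruction of the kind of definition described (the original snippet is not shown here, so the tag names, the number of fields, and the rule shape are all illustrative assumptions):

```c
/*!re2c
    // Assumed definitions: a separator that may match the empty string,
    // which is the source of the ambiguity discussed below.
    sp  = [ \t]*;
    num = [0-9]+;

    // A line "L" followed by several integers, with an s-tag (@tN)
    // recording the start of each one. Tag names t1..t4 are invented
    // for illustration; the real test reportedly used ten fields.
    "L" @t1 sp num @t2 sp num @t3 sp num @t4 sp num
        { /* use t1..t4 */ return 1; }
    *   { return 0; }
*/
```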
The result file generated by re2c for this tiny code is no less than 200 KB!
Is there a way to fix it, or is it a consequence of re2c using the direct-coded method instead of a table-based one?
UPDATE. In fact, it's a pathological example, and to fix it one needs to rewrite `sp` as follows: `sp = [ \t]+;`
Still, it would be great to make the generated code more compact. Maybe it makes sense to try to automatically factor identical parts out into standalone functions. Looks like a good research topic!