haskell / alex

A lexical analyser generator for Haskell
https://hackage.haskell.org/package/alex
BSD 3-Clause "New" or "Revised" License
297 stars 82 forks source link

Alex slow on big regular expressions #163

Open andreasabel opened 3 years ago

andreasabel commented 3 years ago

From this LBNF file, compiled with BNFC-2.8.4,

EInt.  Exp ::= Integer;
EPlus. Exp ::= Exp "+" Integer;

comment "anananas" "anananas";

I get a huge regular expression to recognize block comments (started and ended by anananas): Lex.x.txt

Alex does produce a valid lexer from this, but this takes a rather long time.

I wonder whether Alex could be optimized by introducing sharing into the regular expressions and thus reuse automata generated from subexpressions (rather than computing them again).