Closed lsf37 closed 9 years ago
Slow to compile grammar
Commented by willink on 2002-09-04 15:46 UTC
Commented by lsf37 on 2002-09-25 12:23 UTC
Sorry for the late answer, the bug tracker's "monitor" function seems to have let me down.
The problem with the character classes in the attached
grammar is that they are formulated with the "|" operator, not
as a pure char class expression (which would use only one
[..]). JFlex translates this as the general "or" (which increases
the initial DFA size) and not as character class (which
wouldn't). The final DFA size (after minimization) should be
the same, though (which doesn't help you, of course, if JFlex
never reaches that state).
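For concreteness, the two formulations look like this (a sketch with a made-up macro name, not taken from the attached grammar):

```
/* Formulated with "|": JFlex treats each alternative as a separate
   branch of a general alternation, inflating the initial automaton. */
Letter = "a" | "b" | "c" | "\u00e9"

/* Formulated as a single character class: one transition labeled
   with the whole set, so the enumeration size doesn't add states. */
Letter = [abc\u00e9]
```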
I realize that it is impractical to have char classes of this size
in one single expression in the specification. I will see if I can
optimize the first stage (RegExp->DFA) so that JFlex
recognizes when "|" is used for char class enumeration only.
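To show why the formulation matters, here is a rough back-of-the-envelope sketch (my own illustration of a textbook Thompson-style accounting, not JFlex's actual internals): each "|" alternative contributes extra states, while a character class compiles to a single labeled edge.

```python
def nfa_states_alternation(n_chars):
    """States for "a"|"b"|... under a simple Thompson-style count:
    2 states per literal, plus 2 states per union wrapper."""
    states = 2  # first literal
    for _ in range(n_chars - 1):
        states += 2  # next literal
        states += 2  # union wrapper around the two branches
    return states

def nfa_states_char_class(n_chars):
    """A char class [abc...] is one edge labeled with the whole set,
    so the state count is constant regardless of enumeration size."""
    return 2

print(nfa_states_alternation(3))     # grows linearly with the alternatives
print(nfa_states_char_class(3))      # stays constant
```

The exact constants differ between constructions, but the linear-versus-constant growth is the point: enumerating a large Unicode range with "|" pays per character, while a single class does not.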
Updated by lsf37 on 2002-09-25 12:25 UTC
Updated by lsf37 on 2004-04-12 12:31 UTC
Reported by willink on 2002-09-04 15:46 UTC
I just moved up from JLex.
Much impressed by the better speed, error detection.
Had a problem with exponential time and memory on 1.3.5, which seems to be much alleviated in 1.4_pre1 (it now runs).
I suspect there is still something that could be done to speed up large Unicode grammars.
It seems wrong that merely expanding the number of enumerated elements in an unchanging number of input character classes should change the number of DFA states, and consequently the NFA-to-DFA conversion time.
To demonstrate, use a typical XML grammar (attached). It requires 3187 NFA states, whereas after commenting out all 16-bit character classes it needs only 1000-odd. The latter compiles quite rapidly; the former is slow but tractable with the pre-release.