mhulden / foma

Automatically exported from code.google.com/p/foma

`read lexc` does not respect the `minimal` variable #70

Open DavidNemeskey opened 7 years ago

DavidNemeskey commented 7 years ago

Take the following grammar:

LEXICON Root
pack # ;
talk # ;
walk # ;

Then load foma, and

set minimal OFF
read lexc my_grammar.lexc

will result in a network with 7 states and 8 arcs, the same as if the set minimal OFF line weren't there. Using a regular expression for the same grammar

set minimal OFF
read regex {pack} | {talk} | {walk} ;

correctly creates an FSA with 13 states and 12 arcs.
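As a sanity check on these numbers, independent of foma, here is a small Python sketch that builds the unminimized trie for the three words and then merges equivalent states. The function names (build_trie, minimize_acyclic, size) are my own and hypothetical, not part of foma, and the merging step is a simple bottom-up scheme that only works for acyclic automata, not foma's actual minimization code.

```python
def build_trie(words):
    """Return (transitions, finals) for a trie; state 0 is the start state."""
    transitions = [{}]  # transitions[state][symbol] -> target state
    finals = set()
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in transitions[state]:
                transitions.append({})
                transitions[state][symbol] = len(transitions) - 1
            state = transitions[state][symbol]
        finals.add(state)
    return transitions, finals


def minimize_acyclic(transitions, finals):
    """Merge states that accept the same suffixes (acyclic automata only)."""
    registry = {}         # signature -> id of the merged state
    new_transitions = []  # transition table of the merged automaton

    def visit(state):
        # A state's signature is its finality plus its outgoing arcs,
        # with targets already replaced by their merged ids.
        arcs = tuple(sorted(
            (symbol, visit(target))
            for symbol, target in transitions[state].items()
        ))
        signature = (state in finals, arcs)
        if signature not in registry:
            registry[signature] = len(new_transitions)
            new_transitions.append(dict(arcs))
        return registry[signature]

    visit(0)
    return new_transitions


def size(transitions):
    """Return (number of states, number of arcs)."""
    return len(transitions), sum(len(arcs) for arcs in transitions)


trie, finals = build_trie(["pack", "talk", "walk"])
print(size(trie))                            # (13, 12)
print(size(minimize_acyclic(trie, finals)))  # (7, 8)
```

The trie has one start state plus four states per word, which gives the 13 states and 12 arcs reported for read regex; merging the shared suffixes gives the 7 states and 8 arcs that read lexc produces regardless of the setting.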

As a side note, if I comment out some lines in lexc_to_fsm in lexcread.c (the lexc_merge_states() call, and everything at the end), I end up with a network of 11 states and 13 arcs: this is the same as the previous one with the accepting states collapsed into one.

The main problem is that read lexc does not respect the minimal variable. I don't think it is optimal that the two unminimized networks are different either, but I can live with that.