morfologik / morfologik-stemming

Tools for finite state automata construction and dictionary-based morphological dictionaries. Includes Polish stemming dictionary.
BSD 3-Clause "New" or "Revised" License

question on building the FSA #105

Closed mtrevisan closed 4 years ago

mtrevisan commented 4 years ago

In FSABuilder.add:166 there is a comment: "The input must be lexicographically greater than any previously added sequence". Why must the input be ordered this way? Is there a way to relax this condition? I have a VERY large file to convert into an FSA (1.2+ GB) and sorting it takes forever.

dweiss commented 4 years ago

It's a precondition of the algorithm. 1.2GB of input is not very large, man.

dweiss commented 4 years ago

See the paper "Incremental construction of minimal acyclic finite state automata": https://www.aclweb.org/anthology/J00-1002/
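
For readers who need to meet the precondition discussed above, here is a minimal sketch of preparing input before it is handed to a builder. It uses only the JDK. The class and method names (`SortedInput`, `sortedUnique`, `isStrictlyIncreasing`) are hypothetical and not part of morfologik. It assumes the required order is lexicographic over unsigned byte values, and it removes duplicates to stay on the safe side; check both assumptions against the FSABuilder source before relying on them.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of morfologik: prepares byte sequences
// so that each one is lexicographically greater than the previous one.
class SortedInput {

    // Encodes words as UTF-8, sorts them as unsigned bytes and drops duplicates.
    static List<byte[]> sortedUnique(List<String> words) {
        List<byte[]> encoded = new ArrayList<>();
        for (String w : words) {
            encoded.add(w.getBytes(StandardCharsets.UTF_8));
        }
        // Arrays.compareUnsigned (Java 9+) treats each byte as 0..255, so
        // multi-byte UTF-8 sequences sort after ASCII instead of before it.
        encoded.sort(Arrays::compareUnsigned);

        List<byte[]> out = new ArrayList<>();
        for (byte[] e : encoded) {
            // After sorting, duplicates are adjacent; keep the first of each run.
            if (out.isEmpty() || !Arrays.equals(out.get(out.size() - 1), e)) {
                out.add(e);
            }
        }
        return out;
    }

    // Returns true if every sequence is strictly greater than its predecessor.
    static boolean isStrictlyIncreasing(List<byte[]> seqs) {
        for (int i = 1; i < seqs.size(); i++) {
            if (Arrays.compareUnsigned(seqs.get(i - 1), seqs.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<byte[]> sorted = sortedUnique(Arrays.asList("b", "\u00e9", "a", "b"));
        for (byte[] s : sorted) {
            // Each sequence would be passed to the automaton builder at this point.
            System.out.println(new String(s, StandardCharsets.UTF_8));
        }
        System.out.println("strictly increasing: " + isStrictlyIncreasing(sorted));
    }
}
```

This in-memory version only suits input that fits in the heap. For a file in the gigabyte range, an external sort that compares raw bytes, such as `sort` run with `LC_ALL=C`, is a common alternative.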