Optimizing input-epsilon transitions

kylebgorman / pynini

Read-only mirror of Pynini

http://pynini.opengrm.org

Apache License 2.0

118 stars 27 forks source link

Optimizing input-epsilon transitions #49

Closed sharvil closed 2 years ago

sharvil commented 2 years ago

Consider the following toy example with an alphabet consisting of symbols 'a' and 'b':

  count = 10000
  fst = closure(cross('a', 'b' * count))
  input_string = 'a' * 100

  shortestpath(input_string @ fst).string()

Applying the FST rule to the input string takes linear time wrt count. This happens because O(count) epsilon:b transitions are introduced. Of course, there's no getting around the fact that O(count) outputs are produced for each input symbol, but the cost of each transition seems to be non-trivial (the code above takes ~1s on a 3.6GHz Core i7). Even outside of this toy example, the run-time cost of those epsilon transitions can be substantial (e.g. tagging during text normalization).

As far as I can tell, the epsilon-removal algorithm doesn't collapse chains of epsilon-input transitions. Is it possible to add that to pynini (assuming it improves performance)?

kylebgorman commented 2 years ago

Yes, the epsilon-removal algorithm only removes arcs such that they both have input and output epsilons. My question to you is what should fst look like to make this more efficient? I.e., what does an equivalent machine with these chains collapsed look like topologically?

There do exist two algorithms to move around epsilons in transducers, one of which might be relevant here:

(input-) epsilon normalization (pynini.epsnormalize): https://www.openfst.org/twiki/bin/view/FST/EpsNormalizeDoc
synchronization (pynini.synchronize): https://www.openfst.org/twiki/bin/view/FST/SynchronizeDoc

sharvil commented 2 years ago

Thanks for the quick response!

The equivalent machine that I'd ideally get out would deviate slightly in its representation from a formal WFST. Specifically, if the output label could be a string of symbols instead of just one symbol, the equivalent machine would be a single final state with a self-arc having labels a:bbbb....

Mohri also describes this form of epsilon-removal in https://cs.nyu.edu/~mohri/pub/ijfcs.pdf (section 5, on epsilon-input normalization). In particular, figures 4a and 5a show the before/after of input-epsilon removal on a transducer assuming the output label is a string.

kylebgorman commented 2 years ago

I see. Sadly, we don't have any way to map onto that at present, though that kind of representation is sometimes used internally for various algorithms.