Open Mirmu opened 4 days ago
So it's interesting to think about what a determinization algorithm that "knows" about epsilons would look like, but I don't think we have a vision of that; I suspect the problem is insoluble in the case where there are eps/output arcs, just as it is in many other cases involving transducers.
What you can do is to use other means to move around the epsilons, then determinize or disambiguate afterwards. A few pointers:
A third possibility is to use label encoding to "hide" epsilons, then determinize, and then decode. This is heuristic, but it works pretty well. For an instance of this, see the implementation of `optimize` here or in chapter 4 (?) of the Pynini book.
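To make the encode / determinize / decode idea concrete, here is a minimal pure-Python sketch on a toy transducer. It is not Pynini's `optimize` implementation: the arc list, the label numbers, and the helpers `encode`, `determinize` and `decode` are all hypothetical names made up for illustration, and the determinizer assumes an acyclic machine with integer tropical weights. In Pynini itself, the corresponding steps would, I believe, go through `pynini.EncodeMapper`, `Fst.encode`, `pynini.determinize` and `Fst.decode`.

```python
from collections import defaultdict

EPS = 0  # epsilon label (0, following the OpenFst convention)

# Toy transducer (illustrative assumption). Arcs are
# (src, ilabel, olabel, weight, dst); weights are tropical costs.
ARCS = [
    (0, 1, 5, 1, 1),    # a:x / 1
    (0, 1, 5, 3, 2),    # a:x / 3, same pair as above, so nondeterministic
    (1, EPS, 6, 1, 3),  # eps:y / 1
    (2, EPS, 6, 0, 3),  # eps:y / 0
]
START, FINALS = 0, {3: 0}  # final state -> final cost


def encode(arcs):
    """Replace each (ilabel, olabel) pair with a single fresh nonzero label."""
    table = {}
    encoded = []
    for src, ilabel, olabel, weight, dst in arcs:
        label = table.setdefault((ilabel, olabel), len(table) + 1)
        encoded.append((src, label, weight, dst))
    return encoded, {label: pair for pair, label in table.items()}


def determinize(start, arcs, finals):
    """Weighted subset construction (tropical semiring, acyclic acceptor)."""
    out = defaultdict(list)
    for src, label, weight, dst in arcs:
        out[src].append((label, weight, dst))
    init = frozenset({(start, 0)})  # subsets hold (state, residual cost)
    ids = {init: 0}
    queue = [init]
    det_arcs, det_finals = [], {}
    while queue:
        subset = queue.pop()
        final_costs = [r + finals[q] for q, r in subset if q in finals]
        if final_costs:
            det_finals[ids[subset]] = min(final_costs)
        best = defaultdict(dict)  # label -> {dst: cheapest cost seen}
        for q, r in subset:
            for label, weight, dst in out[q]:
                cost = r + weight
                if cost < best[label].get(dst, float("inf")):
                    best[label][dst] = cost
        for label in sorted(best):
            w = min(best[label].values())  # weight pushed onto the new arc
            nxt = frozenset((d, c - w) for d, c in best[label].items())
            if nxt not in ids:
                ids[nxt] = len(ids)
                queue.append(nxt)
            det_arcs.append((ids[subset], label, w, ids[nxt]))
    return det_arcs, det_finals


def decode(arcs, table):
    """Map each encoded label back to its (ilabel, olabel) pair."""
    return [(src, *table[label], w, dst) for src, label, w, dst in arcs]


encoded, table = encode(ARCS)
det_arcs, det_finals = determinize(START, encoded, FINALS)
print(decode(det_arcs, table))  # [(0, 1, 5, 1, 1), (1, 0, 6, 1, 2)]
```

After encoding, the eps:y arcs carry an ordinary nonzero label, so the determinizer never has to reason about epsilons. This also shows why the approach is heuristic: only paths that share the same sequence of input:output pairs get merged, so two paths that read the same input but emit different outputs both survive decoding.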
Hi!
This is not really an issue but rather a question / feature request.
I have a WFST that associates each input string with multiple (weighted) output strings, and from it I'd like to build an FST that maps each unique accepted input string to its lowest-cost output string.
I feel that something like `pn.disambiguate` or `pn.determinize(*, det_type="disambiguate")` would fit the bill. But the original FST contains arcs such as "eps:output_symbol", and those two functions treat epsilon as a standard symbol. Would you know if something is available in Pynini / OpenFst that ignores input epsilon arcs (or is it achievable by other means)?
Any help / pointers would be super helpful, thanks a lot 🙏
PS: thanks for the Pynini library, it's a life saver.