kylebgorman / pynini

Read-only mirror of Pynini
http://pynini.opengrm.org
Apache License 2.0

What is the @ expression? #42

Closed: anderleich closed this 3 years ago

anderleich commented 3 years ago

Hi, the title is self-explanatory. What is the @ expression used in the examples? Does it have an equivalent function?

For example: _phi = (_pad_zeros @ _raw_factorizer @ _del_zeros @ _fix_teens).optimize()

kylebgorman commented 3 years ago

Yes, @ is the composition operator, chosen to resemble \circ. If you prefer, you can use pynini.compose, which takes two FST arguments. It also has optional arguments: you can disable connecting (i.e., trimming) the automaton after composition with connect=False, which can be useful for debugging a failed composition, or you can select a non-default composition filter, which in certain circumstances can accelerate composition (though this is a very advanced feature and may be dangerous).

K

anderleich commented 3 years ago

Perfect! Thanks!

kylebgorman commented 3 years ago

You need to embed both of them in a cdrewrite, then compose the two rewrite transducers together.

anderleich commented 3 years ago

I have a transducer to recognize numbers. I've embedded it in a cdrewrite. It seems to work just fine, but I've noticed some strange behaviour:

23 -> twenty three
2003 -> two thousand and three

but with two numbers

23 2003 --> twenty three
23 asjkdhasj 2003 --> twenty three asjkdhasj

I'm using rewrite.one_top_rewrite to obtain the best-scoring string. I've tried rewrite.top_rewrites with the 10 best strings, and none of them is just twenty three.

With more numbers:

2003 24 21 24,4 21 --> two thousand and three twenty four twenty one twenty four ,four twenty one
2003 24 21 24,4 21 2003 --> empty

More examples:

21 21 21 21 21 -> twenty one twenty one twenty one twenty one twenty one
21 21 21 21 21 21 --> twenty one

kylebgorman commented 3 years ago

You've written the rule in such a way that it can map 2003 to zero, or more generally does not have to verbalize the full number. I don't have the context to determine why that is the case, but I assume standard debugging procedure will answer that question for you.

(This sort of application-specific debugging should probably be taken off this forum, which is really just for reporting bugs or requesting new features.)