kylebgorman / pynini

Read-only mirror of Pynini
http://pynini.opengrm.org
Apache License 2.0

Question about optimization #15

Closed Ulitochka closed 5 years ago

Ulitochka commented 5 years ago

Hello.

I have some code from the tutorial:

chars = ([chr(i) for i in range(1, 91)] +
         ["\[", "\]", "\\"] +
         [chr(i) for i in range(94, 256)])
sigma_star = pynini.union(*chars).closure()

singular_map = pynini.union(
    pynini.transducer("feet", "foot"),
    pynini.transducer("pence", "penny"),
    # Any sequence of bytes ending in "ches" strips the "es";
    # the last argument -1 is a "weight" that gives this analysis a higher
    # priority, if it matches the input.
    sigma_star + pynini.transducer("ches", "ch", -1),
    # Any sequence of bytes ending in "s" strips the "s".
    sigma_star + pynini.transducer("s", ""))

rc = pynini.union(".", ",", "!", ";", "?", " ", "[EOS]")
singularize = pynini.cdrewrite(singular_map, " 1 ", rc, sigma_star)
singularize.optimize()

I call the optimize() method, but the statistics for this FST are the same with or without it:

fstinfo test_fst.fst

fst type                          vector
arc type                          standard
input symbol table                Byte symbols
output symbol table               Byte symbols
# of states                       56
# of arcs                         5480
initial state                     0
# of final states                 37
# of input/output epsilons        0
# of input epsilons               0
# of output epsilons              28
input label multiplicity          1.01204
output label multiplicity         1.00292
# of accessible states            56
# of coaccessible states          56
# of connected states             56
# of connected components         1
# of strongly conn components     2
input matcher                     y
output matcher                    n
input lookahead                   n
output lookahead                  n
expanded                          y
mutable                           y
error                             n
acceptor                          n
input deterministic               n
output deterministic              n
input/output epsilons             n
input epsilons                    n
output epsilons                   y
input label sorted                y
output label sorted               n
weighted                          y
cyclic                            y
cyclic at initial state           y
top sorted                        n
accessible                        y
coaccessible                      y
string                            n
weighted cycles                   y

If I use several methods to optimize the FST, like:

singularize = pynini.epsnormalize(singularize, eps_norm_output=True)
singularize = pynini.disambiguate(singularize)
determ_singularize = pynini.determinize(singularize, det_type="nonfunctional")
determ_singularize.minimize(allow_nondet=False)

the optimization takes much longer than it does with the optimize() method.

Could you help me find out the reason for this behavior?
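One way to check whether optimize() changed anything is to save the FST before and after the call, run fstinfo on both files, and compare the two reports property by property. The following is a minimal sketch of such a comparison in plain Python. The helper names `parse_fstinfo` and `diff_fstinfo` are hypothetical, the parser assumes that fstinfo separates each property name from its value with at least two spaces, and the "before" numbers are invented purely for illustration (only the "after" numbers come from the report above).

```python
import re


def parse_fstinfo(text):
    """Parse 'property   value' lines, as printed by fstinfo, into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: name and value are separated by two or more spaces.
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        if len(parts) == 2:
            info[parts[0]] = parts[1]
    return info


def diff_fstinfo(before, after):
    """Return {property: (before_value, after_value)} for properties that differ."""
    a, b = parse_fstinfo(before), parse_fstinfo(after)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Invented "before" excerpt, for illustration only.
BEFORE = """
# of states          80
# of arcs            7000
input label sorted   n
"""

# Excerpt of the report quoted in this issue.
AFTER = """
# of states          56
# of arcs            5480
input label sorted   y
"""

for name, (old, new) in diff_fstinfo(BEFORE, AFTER).items():
    print(f"{name}: {old} -> {new}")
```

An empty result from `diff_fstinfo` would mean that the two reports are identical, which is the situation described in this post.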

kylebgorman commented 5 years ago

Your optimization method is, simply put, computationally more complex than the one performed by optimize, and will produce different results.

See the documentation here: http://www.opengrm.org/twiki/bin/view/GRM/PyniniOptimizeDoc
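To give some intuition for why a pipeline built around determinization can be much more expensive, here is a small self-contained sketch of the classic subset construction on an unweighted acceptor. This is not Pynini's or OpenFst's implementation, and weighted transducer determinization and disambiguation are different algorithms; the function names are hypothetical and the example only illustrates that determinizing a small nondeterministic machine can produce exponentially many states.

```python
def nfa_kth_from_end(k):
    """NFA over {a, b} accepting strings whose k-th symbol from the end is 'a'.

    States are 0..k; state 0 is initial and state k is final.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for q in range(1, k):
        delta[(q, "a")] = {q + 1}
        delta[(q, "b")] = {q + 1}
    return delta, 0, {k}


def determinize(delta, start, alphabet=("a", "b")):
    """Subset construction; returns the set of reachable DFA states."""
    start_set = frozenset([start])
    seen = {start_set}
    stack = [start_set]
    while stack:
        current = stack.pop()
        for sym in alphabet:
            # The successor DFA state is the union of all NFA successors.
            nxt = frozenset(t for q in current for t in delta.get((q, sym), ()))
            if nxt and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


for k in (2, 4, 8):
    delta, start, _ = nfa_kth_from_end(k)
    dfa_states = determinize(delta, start)
    print(f"{k + 1} NFA states -> {len(dfa_states)} DFA states")
```

For this family of machines the deterministic result has 2^k states for an input with k + 1 states, so the cost grows quickly even when the input is small.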

On Tue, Jul 16, 2019 at 2:30 AM Ulitochka wrote: [original post quoted in full; see above]

Ulitochka commented 5 years ago

Thanks