kylebgorman / pynini

Read-only mirror of Pynini
http://pynini.opengrm.org
Apache License 2.0

How to set token type for results of union operation? #8

Closed wrznr closed 5 years ago

wrznr commented 5 years ago

Both pynini.acceptor and pynini.transducer allow the symbol type to be set explicitly via the token_type keyword argument. How can I achieve the same configuration for pynini.union (the default seems to be byte)?

Using token_type is not possible:

TypeError: union() got an unexpected keyword argument 'token_type'

Assigning a previously constructed symbol table afterwards,

card_stem = pynini.union("drei", "vier", "fünf")
card_stem.set_input_symbols(symbol_table)
card_stem.set_output_symbols(symbol_table)

results in incorrect symbol mappings. Any suggestions are very much appreciated!
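A minimal sketch of why the labels stop lining up, assuming (as the question suggests) that the union was compiled in the default byte mode. It does not call Pynini; the helper names byte_labels and utf8_labels are hypothetical and only mimic the difference between one label per UTF-8 byte and one label per Unicode code point. A symbol table attached afterwards is consulted for whatever integers are already on the arcs, so it cannot repair labels that were produced under a different scheme.

```python
def byte_labels(text):
    """Arc labels in byte mode: one integer per UTF-8 byte."""
    return list(text.encode("utf-8"))


def utf8_labels(text):
    """Arc labels in utf8 mode: one integer per Unicode code point."""
    return [ord(ch) for ch in text]


for word in ("drei", "vier", "fünf"):
    # ASCII-only words yield identical labels in both modes;
    # "ü" becomes two byte labels but a single code point label.
    print(word, byte_labels(word), utf8_labels(word))
```

For pure ASCII input the two schemes agree, which is why the mismatch only shows up on strings such as "fünf".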

kylebgorman commented 5 years ago

On Oct 31, 2018, at 12:59 PM, Kay-Michael Würzner notifications@github.com wrote:

Both pynini.acceptor and pynini.transducer allow the symbol type to be set explicitly via the token_type keyword argument. How can I achieve the same configuration for pynini.union (the default seems to be byte)?

Using token_type is not possible:

TypeError: union() got an unexpected keyword argument 'token_type'

Assigning a previously constructed symbol table afterwards,

card_stem = pynini.union("drei", "vier", "fünf")
card_stem.set_input_symbols(symbol_table)
card_stem.set_output_symbols(symbol_table)

results in incorrect symbol mappings.

Hi there,

Every operation that accepts an FST argument will also accept a string in its place. Only the operations that take strings exclusively support “token_type” (acceptor) or “input_token_type”/“output_token_type” (transducer, string_map, and string_file).

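You could always compile your FSTs first using acceptor. This is not tested but should be approximately right:

import pynini
def union2(*args, token_type):
    # Compile each string with the requested token type, then union the resulting FSTs.
    return pynini.union(*(pynini.acceptor(arg, token_type=token_type) for arg in args))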

The same wrapper pattern works for other operations.

In the meantime, I am working on a patch for Pynini 2.0.2 which will allow you to set global defaults for string coercion to FSTs using a singleton.

wrznr commented 5 years ago

Works great! Many thanks.

kylebgorman commented 3 years ago

This is late, but there is now a pynini.default_token_type context manager. It puts a whole block of code into a non-default token type mode, so you can rely on string coercion to FSTs inside that block.
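A hedged sketch of how that context manager might be applied to the union from the original question. Only pynini.default_token_type and pynini.union come from the thread; the wrapper name build_card_stem and the choice of "utf8" are assumptions for illustration. The import is deferred so the snippet loads even where Pynini is not installed, and calling the function needs a Pynini release that ships default_token_type.

```python
def build_card_stem(token_type="utf8"):
    """Union three strings, coercing each to an FST under token_type."""
    # Deferred import: requires a Pynini release providing default_token_type.
    import pynini

    # Strings coerced to FSTs inside this block use token_type instead of
    # the default byte mode.
    with pynini.default_token_type(token_type):
        return pynini.union("drei", "vier", "fünf")
```

Typical use would be card_stem = build_card_stem("utf8"). Passing a symbol table as token_type should also be possible, as it is for acceptor, but the strings must then be written in the form that symbol table expects.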