kylebgorman / pynini

Read-only mirror of Pynini
http://pynini.opengrm.org
Apache License 2.0

[input|output]_token_type -> token_type #22

Closed lxkain closed 4 years ago

lxkain commented 4 years ago

The following code used to work in version 2.0.7 under OS X to construct a pronunciation FST:

ni.t(p, w, 0, input_token_type=phoneme_table, output_token_type=word_table)

However, in version 2.1.0 it appears that the keywords [input|output]_token_type were replaced by a single token_type. How can the input and output token types be set differently? (Maybe construct acceptors first?)

kylebgorman commented 4 years ago

Yeah, that's what I'd recommend. You don't really incur any cost for that extra code since it was doing that behind the scenes anyways.


lxkain commented 4 years ago

OK, that seems to work, but symbol strings no longer show up properly when the FSTs are drawn in Jupyter notebooks.

lxkain commented 4 years ago

Unfortunately, drawing arc label strings of transducers doesn't seem to work, even with

fst.draw('tmp.dot', portrait=True, isymbols=fst.input_symbols(), osymbols=fst.output_symbols())

kylebgorman commented 4 years ago

What are you getting instead?


lxkain commented 4 years ago

The integer indices.

kylebgorman commented 4 years ago

Oh, that's expected with the new revision. We are no longer automatically attaching symbol tables as of 2.1.0 (see NEWS).
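
To make the symptom concrete, here is a toy sketch in plain Python. It is not Pynini code: `render_arcs`, the dict-based tables, and the arc tuples are all hypothetical stand-ins, assumed only for illustration. The idea is that arcs store integer labels, so a drawing routine can only show strings if a symbol table is available to look them up.

```python
# Toy model only: these names are hypothetical and are not part of Pynini.
from typing import Dict, List, Optional, Tuple

Arc = Tuple[int, int]  # (input label, output label)


def render_arcs(
    arcs: List[Arc],
    isymbols: Optional[Dict[int, str]] = None,
    osymbols: Optional[Dict[int, str]] = None,
) -> List[str]:
    """Renders each arc as "input:output", using symbols when available."""

    def name(label: int, table: Optional[Dict[int, str]]) -> str:
        # With no table attached, the only thing left to show is the integer.
        if table is None:
            return str(label)
        return table.get(label, str(label))

    return [f"{name(i, isymbols)}:{name(o, osymbols)}" for i, o in arcs]


# Hypothetical tables; label 0 plays the role of epsilon.
word_table = {0: "<eps>", 1: "you"}
phoneme_table = {0: "<eps>", 1: "j", 2: "u"}
arcs = [(1, 1), (0, 2)]  # one path mapping "you" to "j u"

bare = render_arcs(arcs)
labeled = render_arcs(arcs, isymbols=word_table, osymbols=phoneme_table)
print(bare)     # ['1:1', '0:2']
print(labeled)  # ['you:j', '<eps>:u']
```

In this toy, the unlabeled rendering corresponds to the integer indices reported above, and passing the tables plays the role of attaching them to the FST before drawing.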


lxkain commented 4 years ago

I see. I'm assuming that's because the table was copied and not linked to. It does mean that the shortest way to get labeled output in a Jupyter notebook/lab is now

a = ni.acceptor("you say potato", token_type=word_table)  # space delimits tokens
a.set_input_symbols(word_table)
a

which is unfortunately a little wordy. I'll bypass this by creating my own FST that does attach symbol tables. Thank you for the clarification!

lxkain commented 4 years ago

For other folk who may wonder, for transducers it would now be:

t = ni.transducer(ni.acceptor('you', token_type=word_table),
                  ni.acceptor('j u', token_type=phoneme_table))
t.set_input_symbols(word_table)
t.set_output_symbols(phoneme_table)

lxkain commented 4 years ago

If tables were copied previously, then I can see that operations like:

tmp = (ni.t(ni.a(p, token_type=self.phoneme_table),
            ni.a(w, token_type=self.word_table),
            0
            ) for w, p in self.pronunciation.items())  # generator
fst = ni.union(*tmp)
fst.set_input_symbols(self.phoneme_table)
fst.set_output_symbols(self.word_table)

are definitely benefiting from this decision, even if simple situations become more wordy. That said, a flag controlling copy vs. link could be immensely useful.

kylebgorman commented 4 years ago

Yes, I think the generalization is:

K
