kylebgorman / pynini

Read-only mirror of Pynini
http://pynini.opengrm.org
Apache License 2.0

Fst Rendering of arc labels #55

Closed david-waterworth closed 1 year ago

david-waterworth commented 2 years ago

In all the examples, FSTs are rendered with text arc labels, for example:

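[image] https://user-images.githubusercontent.com/5028974/179335424-8f97d89f-b05a-422d-bf0a-f0d5497c9717.png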

From String processing with Pynini edit transducers (https://gist.github.com/kylebgorman/7d406f577ef1922b2dd3a5ac52752dea)

However, when I try to replicate it (updated for an API change), I get ordinals for the labels and the rendering is all messed up:

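import pynini
from pynini.lib import pynutil

match = pynini.union("a", "b")  # hypothetical stand-in; the gist defines its own `match`
insert = pynutil.add_weight(pynini.cross("", match), 1).optimize(True)
insert  # evaluating the FST in a notebook cell renders it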

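[image] https://user-images.githubusercontent.com/5028974/179335497-f89864d1-f837-4f54-8166-64ba7866a16c.png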

I get better rendering if I write a dot file and render it with dot, e.g.:

insert.draw("tmp.dot", portrait=True)

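[image] https://user-images.githubusercontent.com/5028974/179335538-18b41020-1988-408b-a6bf-c303694bb3be.png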

But the labels are still displayed using their ordinal value rather than text.

I tried

insert.draw("tmp.dot", portrait=True, isymbols=insert.input_symbols(), osymbols=insert.output_symbols())

But this didn't change the results; in fact, input_symbols() and output_symbols() always seem to return None:

type(insert.input_symbols())

NoneType

I'm using OpenFst 1.8.2 with pynini 2.1.5 and seeing the same behavior on Ubuntu (installed via pip) and macOS (installed via conda-forge).

I've tried running the tests from this repo and they all succeed, so I'm assuming I've installed OpenFst correctly. Is there something else I should be doing here?

kylebgorman commented 2 years ago

This is all expected behavior. .input_symbols and .output_symbols return None because no symbol table is attached. You have to make and/or attach your own symbol tables if you really want them. I never bother except for publication purposes, since the library ignores symbol tables at nearly all levels of abstraction. Those graphics were made with additional effort (which in this case involves creating appropriate symbol tables, setting the image sizes and orientations etc. as arguments to .draw).


david-waterworth commented 2 years ago

Do you have a simple example of that you can point me to? I'm not 100% sure how to create a symbol table, but I think this may be correct:

import pynini

syms = pynini.SymbolTable()
syms.add_symbol("<epsilon>")  # no key given, so it takes the next free key (0)
syms.add_symbol("a", ord("a"))
syms.add_symbol("b", ord("b"))

But then how do I attach it to an Fst? I found comments in the README for release 2.0.9, but I cannot find attach_symbols in the library.

This does work nicely, though, once I found the acceptor parameter. It's easy enough to wrap this in a draw function if it's the only way:

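import graphviz
from IPython.display import Image

# `fst` is the machine to draw; `syms` is the symbol table built above.
fst.draw("tmp.dot", portrait=True, isymbols=syms, osymbols=syms, acceptor=True)
# graphviz.render returns the path of the image file it wrote.
png_path = graphviz.render("dot", "png", "tmp.dot", renderer="cairo")
Image(filename=png_path)
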
kylebgorman commented 2 years ago

That's a perfectly good way to create a symbol table. If I'm targeting all printable ASCII, or all bytes, I'd probably just have made one ahead of time (as a two-column TSV file) and I'd load it in with pynini.SymbolTable.read_text instead.
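A minimal sketch of preparing such a file ahead of time, using only the Python standard library. It assumes OpenFst's text symbol-table format of one symbol<TAB>key pair per line, reserves key 0 for <epsilon>, and skips the space character, since whitespace may be read as a field separator. The function names are hypothetical.

```python
def ascii_symbol_table_lines() -> list[str]:
    """Return 'symbol<TAB>key' lines covering printable, non-space ASCII."""
    lines = ["<epsilon>\t0"]  # key 0 is the epsilon label in OpenFst
    for code in range(33, 127):  # '!' (33) through '~' (126); space (32) is skipped
        lines.append(f"{chr(code)}\t{code}")
    return lines


def write_ascii_symbol_table(path: str) -> None:
    """Write the two-column table to `path`, one entry per line."""
    with open(path, "w", encoding="utf-8") as sink:
        sink.write("\n".join(ascii_symbol_table_lines()) + "\n")
```

After calling write_ascii_symbol_table("ascii.syms") once, the table could be loaded in later sessions with pynini.SymbolTable.read_text("ascii.syms") and passed to .draw as isymbols= and osymbols=.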

You don't have to attach it to an FST if you don't want to---you can just pass the table to .draw using the kwargs isymbols= and osymbols=. If you want to attach it you call the setter methods .set_input_symbols and .set_output_symbols. That is really just a mechanism for metadata storage, though the Jupyter drawing "magic" should respect it too.

When teaching, I encourage students to just learn to read the ASCII byte numbers for toy examples. As I say, this is probably not the only time they'll be asked to understand that characters are to the computer just integers, and you can use ord and chr to convert between them.
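A small sketch of reading labels by hand, assuming pynini's default byte token type, in which each arc label is one UTF-8 byte (so for ASCII characters it equals ord of the character). The helper names are hypothetical, and label 0 is treated as epsilon and skipped when decoding.

```python
def string_to_labels(text: str) -> list[int]:
    """Encode text as the byte labels used under the byte token type."""
    return list(text.encode("utf-8"))


def labels_to_string(labels: list[int]) -> str:
    """Decode byte labels read off a drawing back into text."""
    # Epsilon (label 0) consumes or emits nothing, so it is dropped.
    return bytes(label for label in labels if label != 0).decode("utf-8")


# For ASCII this is just ord/chr: ord("b") is 98 and chr(97) is "a".
print(string_to_labels("baa"))
print(labels_to_string([98, 97, 97]))
```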

david-waterworth commented 2 years ago

Thanks, but I couldn't get .set_input_symbols and .set_output_symbols to work, at least not with Jupyter in VS Code.

Without symbols, it plots as expected:

sheep: pynini.Fst = pynini.accep("b") + pynini.accep("a").plus
sheep.optimize(True)

[image]

import pywrapfst

alphabet = list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz')
syms = pywrapfst.SymbolTable()
syms.add_symbol('<epsilon>', 0)
for symb in alphabet:
    syms.add_symbol(symb, ord(symb))

sheep.set_input_symbols(syms)

[image]

It's changed 98 (b) to 9, 97 (a) to 2, and relabelled the final state?

Just to be sure, I used token_type, but that didn't help, i.e.:

sheep: pynini.Fst = pynini.accep("b", token_type=syms) + pynini.accep("a", token_type=syms).plus
sheep.optimize(True)

I'm happy to use fst.draw though as that seems to work fine.

kylebgorman commented 2 years ago

This displays a string FSA labeled "f o o":

import string

import pynini

x = pynini.accep("foo")
sym = pynini.SymbolTable()
sym.add_symbol("<epsilon>")
for char in string.ascii_lowercase:
    sym.add_symbol(char, ord(char))
x.set_input_symbols(sym)
x.set_output_symbols(sym)
x  # evaluated in a notebook cell, this should draw using the attached table

I think you need to set the output symbols. The convention is that, in the case of an FST viewed as an acceptor, the output symbol table is taken to be "the" symbol table.

Using a symbol table to parse the string as in token_type= does not auto-attach the symbol table. Symbol tables can get quite large and so we don't want to go around making arbitrary numbers of copies of them without explicit user consent.

david-waterworth commented 2 years ago

It seems to work only intermittently. I've attached a screenshot where the first example works correctly but the next two don't.

It seems to be something weird with Jupyter, or maybe VS Code: if I run Clear Output of All Cells, it starts to display correctly, even though I'd already tried a kernel restart. I suspect this isn't a pynini issue.

[screenshot]

kylebgorman commented 1 year ago

I am marking this closed as I don't intend to debug the Jupyter behavior.