Closed david-waterworth closed 2 years ago
This is all expected behavior. .input_symbols and .output_symbols return None because no symbol table is attached. You have to make and/or attach your own symbol tables if you really want them. I never bother except for publication purposes, since the library ignores symbol tables at nearly all levels of abstraction. Those graphics were made with additional effort (which in this case involves creating appropriate symbol tables, setting the image sizes and orientations, etc., as arguments to .draw).
On Fri, Jul 15, 2022 at 10:37 PM David Waterworth @.***> wrote:
In all the examples, FSTs are rendered with text arc labels, for example:
[image: image] https://user-images.githubusercontent.com/5028974/179335424-8f97d89f-b05a-422d-bf0a-f0d5497c9717.png
From "String processing with Pynini edit transducers": https://gist.github.com/kylebgorman/7d406f577ef1922b2dd3a5ac52752dea
However, when I try to replicate this (updated for the API change), I get ordinals for the labels and the rendering is all messed up:
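```python
import pynini
from pynini.lib import pynutil

# `match` is a string defined earlier (not shown here).
insert = pynutil.add_weight(pynini.cross("", match), 1).optimize(True)
insert  # evaluated last in a Jupyter cell, so it is drawn
```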
[image: image] https://user-images.githubusercontent.com/5028974/179335497-f89864d1-f837-4f54-8166-64ba7866a16c.png
I get better rendering if I write a dot file and render it with dot, i.e.:
```python
insert.draw("tmp.dot", portrait=True)
```
[image: image] https://user-images.githubusercontent.com/5028974/179335538-18b41020-1988-408b-a6bf-c303694bb3be.png
But the labels are still displayed as their ordinal values rather than as text.
I tried
```python
insert.draw("tmp.dot", portrait=True, isymbols=insert.input_symbols(), osymbols=insert.output_symbols())
```
but this didn't change the result; in fact, input_symbols() and output_symbols() always seem to return None:
```python
type(insert.input_symbols())  # <class 'NoneType'>
```
I'm using OpenFst 1.8.2 with pynini 2.1.5 and see the same behavior on Ubuntu (installed via pip) and macOS (installed via conda-forge).
I've run the tests from this repo and they all succeed, so I'm assuming I've installed OpenFst correctly. Is there something else I should be doing here?
View it on GitHub: https://github.com/kylebgorman/pynini/issues/55
Do you have a simple example you can point me to? I'm not 100% sure how to create a symbol table, but I think this may be correct:
```python
import pynini

syms = pynini.SymbolTable()
syms.add_symbol("<epsilon>")  # the first symbol added gets key 0
syms.add_symbol("a", ord("a"))
syms.add_symbol("b", ord("b"))
```
But then how do I attach it to an Fst? I found a note in the README for release 2.0.9 saying that input_token_type and output_token_type became token_type, and attach_input_symbols and attach_output_symbols became attach_symbols, in transducer. But I can't find attach_symbols anywhere in the library.
This does work nicely, though, once I found the acceptor parameter. It's easy enough to wrap this in a draw function if it's the only way:
```python
import graphviz
from IPython.display import Image

# `fst` is the FST to draw; `syms` is the symbol table built above.
fst.draw("tmp.dot", portrait=True, isymbols=syms, osymbols=syms, acceptor=True)
# Writes tmp.dot.cairo.png next to tmp.dot (needs the Graphviz binaries).
graphviz.render("dot", "png", "tmp.dot", renderer="cairo")
Image(filename="tmp.dot.cairo.png")
```
That's a perfectly good way to create a symbol table. If I'm targeting all printable ASCII, or all bytes, I'd probably have made one ahead of time (as a two-column TSV file) and loaded it with pynini.SymbolTable.read_text instead.
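As a sketch of the ahead-of-time approach, the following builds a two-column, tab-separated symbol table for printable ASCII, one symbol/key row per line, with <epsilon> at key 0. One assumption to flag: OpenFst's text symbol-table reader splits fields on whitespace by default, so a literal space can't be listed as a symbol. The <space> name used for code 32 is therefore a hypothetical convention, not one the library defines. The file name ascii.syms is also just an example.

```python
def ascii_symbol_table_tsv() -> str:
    """Returns a two-column (symbol, key) table for printable ASCII."""
    # Key 0 is reserved for epsilon in OpenFst.
    rows = ["<epsilon>\t0"]
    for code in range(32, 127):  # printable ASCII: space (32) through "~" (126)
        # Hypothetical name for space, since a literal space would be read
        # as a field separator.
        symbol = "<space>" if code == 32 else chr(code)
        rows.append(f"{symbol}\t{code}")
    return "\n".join(rows) + "\n"


def write_ascii_symbol_table(path: str = "ascii.syms") -> None:
    """Writes the table to `path` as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as sink:
        sink.write(ascii_symbol_table_tsv())
```

Once written, the file can be loaded with pynini.SymbolTable.read_text("ascii.syms") rather than building the table symbol by symbol.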
You don't have to attach it to an FST if you don't want to; you can just pass the table to .draw using the kwargs isymbols= and osymbols=. If you want to attach it, call the setter methods .set_input_symbols and .set_output_symbols. That is really just a mechanism for metadata storage, though the Jupyter drawing "magic" should respect it too.
When teaching, I encourage students to just learn to read the ASCII byte numbers for toy examples. As I say, this is probably not the only time they'll be asked to understand that, to the computer, characters are just integers, and you can use ord and chr to convert between the two.
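For toy examples in the default byte mode, reading labels by hand amounts to applying chr to each arc label (and ord to go the other way). This only holds for ASCII text, where the byte value and the code point coincide. The helper name label_to_text is hypothetical, and the labels are just the "baa" string from the sheep example.

```python
def label_to_text(label: int) -> str:
    """Hypothetical helper: renders an arc label as text."""
    # Label 0 is reserved for epsilon in OpenFst.
    return "<epsilon>" if label == 0 else chr(label)


labels = [ord(char) for char in "baa"]  # [98, 97, 97]
text = " ".join(label_to_text(label) for label in labels)  # "b a a"
print(labels, text)
```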
Thanks. I couldn't get .set_input_symbols and .set_output_symbols to work, though, at least not with Jupyter in VS Code.
Without symbols, it plots as expected:
```python
import pynini

sheep: pynini.Fst = pynini.accep("b") + pynini.accep("a").plus
sheep.optimize(True)
```
With an input symbol table attached:
```python
import pywrapfst

alphabet = list("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
syms = pywrapfst.SymbolTable()
syms.add_symbol("<epsilon>", 0)
for symb in alphabet:
    syms.add_symbol(symb, ord(symb))
sheep.set_input_symbols(syms)
```
It changed 98 (b) to 9 and 97 (a) to 2, and relabeled the final state?
Just to be sure, I also tried token_type, but that didn't help:
```python
sheep: pynini.Fst = pynini.accep("b", token_type=syms) + pynini.accep("a", token_type=syms).plus
sheep.optimize(True)
```
I'm happy to use fst.draw, though, as that seems to work fine.
This displays a string FSA labeled "f o o":
```python
import string

import pynini

x = pynini.accep("foo")
sym = pynini.SymbolTable()
sym.add_symbol("<epsilon>")
for char in string.ascii_lowercase:
    sym.add_symbol(char, ord(char))
x.set_input_symbols(sym)
x.set_output_symbols(sym)
x  # evaluated last in a Jupyter cell, so it is drawn with the attached tables
```
I think you need to set the output symbols too. By convention, when an FST is viewed as an acceptor, the output symbol table is taken to be "the" symbol table.
Using a symbol table to parse the string, as with token_type=, does not auto-attach the symbol table. Symbol tables can get quite large, so we don't want to make arbitrary numbers of copies of them without explicit user consent.
It seems to work intermittently. I've attached a screenshot where the first example displays correctly but the next two don't. It seems to be something weird with Jupyter, or maybe VS Code: if I run Clear Output of All Cells, it starts to display correctly. I'd already tried a kernel restart. I suspect this isn't a pynini issue.
I am marking this closed as I don't intend to debug the Jupyter behavior.