kylebgorman / pynini

Read-only mirror of Pynini
http://pynini.opengrm.org
Apache License 2.0
118 stars 27 forks source link

Minimization introduces more states and epsilon transition #67

Closed namednil closed 11 months ago

namednil commented 1 year ago

Hi! I'm thoroughly confused about this one. I'm trying to minimize a small FST. However, the FST actually gets bigger in the number of states and includes a useless epsilon transition (first insert, then delete).

import pynini

fst_str = b'\xd6\xfd\xb2~\x06\x00\x00\x00vector\x08\x00\x00\x00standard\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x80\x02\x02\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x7f\x01\x00\x00\x00\x00\x00\x00\x00i\x00\x00\x00i\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00i\x00\x00\x00q\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00q\x00\x00\x00x\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00x\x00\x00\x00x\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00m\x00\x00\x00q\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

fst = pynini.Fst.read_from_string(fst_str)

print("# states", fst.num_states())
fst.draw("fst_before.dot", portrait=True)

fst.minimize()

print("# states", fst.num_states())
fst.draw("fst_after.dot", portrait=True)

With output

# states 3
# states 5

These are the graphs of the two FSTs.

Before: fst_before

After: fst_after

I'm using Pynini 2.1.5 on Ubuntu.

Any idea what might be going on?

kylebgorman commented 1 year ago

That is mysterious to me…in the sense that I don’t see anything wrong with the original FST and it’s not obvious to me why it got bigger. I suspect the issue has to be in the minimization itself: perhaps it doesn’t detect in this case that the machine is already minimal (as it looks to be). I’ll kick this back to our experts and may need some time.

K

On Jun 20, 2023, at 9:35 AM, Matthias Lindemann @.***> wrote:

Hi! I'm thoroughly confused about this one. I'm trying to minimize a small FST. However, the FST actually gets bigger in the number of states and includes a useless epsilon transition (first insert, then delete).

import pynini

fst_str = b'\xd6\xfd\xb2~\x06\x00\x00\x00vector\x08\x00\x00\x00standard\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x80\x02\x02\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x7f\x01\x00\x00\x00\x00\x00\x00\x00i\x00\x00\x00i\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00i\x00\x00\x00q\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00q\x00\x00\x00x\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00x\x00\x00\x00x\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00m\x00\x00\x00q\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

fst = pynini.Fst.read_from_string(fst_str)

print("# states", fst.num_states()) fst.draw("fst_before.dot", portrait=True)

fst.minimize()

print("# states", fst.num_states()) fst.draw("fst_after.dot", portrait=True) With output

states 3

states 5

These are the graphs of the two FSTs.

Before: https://user-images.githubusercontent.com/19576708/247130849-f78a8711-3cdf-4375-bab5-69884c657656.png After: https://user-images.githubusercontent.com/19576708/247130977-51336425-0d5a-41ad-a10c-160c5a26a435.png I'm using Pynini 2.1.5 on Ubuntu.

Any idea what might be going on?

— Reply to this email directly, view it on GitHub https://github.com/kylebgorman/pynini/issues/67, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABG4OOQXLECUGIYG67XW73XMGRJ7ANCNFSM6AAAAAAZNJV3MA. You are receiving this because you are subscribed to this thread.

kylebgorman commented 1 year ago

Since this issue is true of the lower-level C++ interfaces too it is not strictly a Pynini bug. However, we (= OpenFst developers) are curently stumped about the insertion of epsilons ourselves and will have to keep looking into it.

namednil commented 1 year ago

Thank you for the update! I found this by generating FSTs automatically by the way. Maybe it's worth looking into whether fuzzing can uncover any more issues?

kylebgorman commented 1 year ago

I agree---fuzz-testing is a great idea for the library. We do a form of this for OpenFst itself (internally to Google), though I don't know if the test also targets minimization.

kylebgorman commented 11 months ago

Closing as #wontfix even though we're looking into this upstream.