Optimize duplicate subgraphs in FST

m0r0zk01 commented 1 year ago

Hello!

I'm working on a project and use Pynini/OpenFST to build (in general, non-deterministic) WFSTs. Then I apply input strings to them by using composition operation and then get top-n matches.

Everything works fine, but I'd like to optimize the response time. The query response time is proportional to the number of states and arcs in the FST which is very big.

My automatons have a structure similar to the following:

first

In this picture all letters 'A' replace some same FST. Summing up: my graph has a lot of equal subgraphs. In my case 'A' has around 5000 states. Thus, if I use 'A' 10 times then my final graph has 50000+ states. Of course I use .optimize() method but it doesn't help much.

A nice solution would be to have only one 'A' in the graph and use that one instance everywhere:

second

The problem is: if I traverse an arc(edge) leading to 'A', I don't know where to return after 'A' is finished. In the picture above: if I enter 'A' using the blue arc, I then want to also use a blue arc to return from 'A' to where I was.

Unfortunately, it's very difficult for me to modify the input strings, so I would like to only change FSTs. My thoughts on the solution:

FST can have some kind of internal memory to store a flag(that changes during graph traversal) that defines what 'colored' arc I should choose next
I've read that I can create custom arc types/arc filters/arc matchers with OpenFST. Maybe I could use them somehow?

Do you think I can implement it using Pynini & OpenFST? I would appreciate if you could help me to come up with a possible solution. Thanks!

kylebgorman commented 1 year ago

I would first recommend reading a bit about composition filters, in case that helps:FstAdvancedUsage < FST < TWikiopenfst.orgFilters for Efficient Composition of Weighted Finite-State Transducerslink.springer.comMost if not all of these are exposed by pynini.compose using a keyword argument. However it’s not really possible to make a new one and “register” it so that Pynini can see it.Some advanced features are going to require you to go to C++.Your “equal subgraph” concept sounds like you’re describing a recursive transition network (RTN). OpenFst has two relevant pieces for this: fst::Replace (and its Pynini equivalent) can compile an RTN into a finite automaton (assuming the RTN is finite: not all are). But more useful is fst::ReplaceFst (there is no Pynini equivalent nor can I even imagine one), in which you provide an RTN specification (which looks a bit like a CFG specification, except the right hand sides of each rule are FSTs) and it builds the equivalent FST on the fly (you can control the caching, too). This will allow you (maybe) to represent your rule more compactly. I have no idea if that will help keep your lattices small though. Your idea of a colored arc seems like you’re describing pushdown transducers. Pynini has good support for compiling pushdown transducers and can compute PDT-FST composition and PDT shortest path. See the functions named pynini.pdt_* and read the following:PdtExtension < FST < TWikiopenfst.orghttps://www.openfst.org/twiki/pub/FST/PdtExtension/ciaa12.pdfHopefully this helps!KOn Dec 21, 2022, at 11:35 AM, Ivan Morozov @.***> wrote: Hello! I'm working on a project and use Pynini/OpenFST to build (in general, non-deterministic) WFSTs. Then I apply input strings to them by using composition operation and then get top-n matches. Everything works fine, but I'd like to optimize the response time. The query response time is proportional to the number of states and arcs in the FST which is very big. My automatons have a structure similar to the following:

In this picture all letters 'A' replace some same FST. Summing up: my graph has a lot of equal subgraphs. In my case 'A' has around 5000 states. Thus, if I use 'A' 10 times then my final graph has 50000+ states. Of course I use .optimize() method but it doesn't help much. A nice solution would be to have only one 'A' in the graph and use that one instance everywhere:

The problem is: if I traverse an arc(edge) leading to 'A', I don't know where to return after 'A' is finished. In the picture above: if I enter 'A' using the blue arc, I then want to also use a blue arc to return from 'A' to where I was. Unfortunately, it's very difficult for me to modify the input strings, so I would like to only change FSTs. My thoughts on the solution:

FST can have some kind of internal memory to store a flag(that changes during graph traversal) that defines what 'colored' arc I should choose next I've read that I can create custom arc types/arc filters/arc matchers with OpenFST. Maybe I could use them somehow?

Do you think I can implement it using Pynini & OpenFST? I would appreciate if you could help me to come up with a possible solution. Thanks!

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>

m0r0zk01 commented 1 year ago

Thanks! I'll check it out.

kylebgorman commented 1 year ago

Going to close this for no activity. If you find a bug, reopen.

kylebgorman / pynini

Optimize duplicate subgraphs in FST #63