Grammar-guided recognition

LoyVanBeek commented 4 years ago

Going from open-world to closed-world speech should improve the rate at which sentences can be parsed.

LoyVanBeek commented 4 years ago

My plan:

Set up a basic Grammar base class in Yapykaldi that provides hooks and interfaces for any kind of grammar. The only method Yapykaldi seems to call is:
- traverse(recognised_word: str) -> bool, indicating whether the word matches the grammar (and if so, adding the word to the sentence gathered so far)
Do a full implementation in yapykaldi_ros that works with grammar_parser.CFGrammar:
- Extends traverse to record the words heard so far
- get_results() -> sentence: str, semantics: dict, for getting the result of the completed sentence
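
A minimal sketch of the interface described in this plan. The class names `Grammar` and `SentenceListGrammar` are hypothetical (neither exists in yapykaldi or yapykaldi_ros), and a fixed dict of sentences stands in for grammar_parser.CFGrammar, whose API is not shown in this thread:

```python
class Grammar(object):
    """Hypothetical base class that yapykaldi could expose."""

    def traverse(self, recognised_word):
        """Return True if the word is a valid continuation of the sentence so far."""
        raise NotImplementedError

    def get_results(self):
        """Return (sentence, semantics) for the words accepted so far."""
        raise NotImplementedError


class SentenceListGrammar(Grammar):
    """Toy stand-in for a CFGrammar-backed implementation in yapykaldi_ros."""

    def __init__(self, sentences):
        # Map each allowed sentence (as a tuple of words) to its semantics dict.
        self._sentences = {tuple(s.split()): sem for s, sem in sentences.items()}
        self._heard = []

    def traverse(self, recognised_word):
        candidate = self._heard + [recognised_word]
        n = len(candidate)
        # Accept the word only if some sentence starts with the words heard so far.
        if any(list(words[:n]) == candidate for words in self._sentences):
            self._heard = candidate
            return True
        return False

    def get_results(self):
        sentence = " ".join(self._heard)
        # Semantics are only available once a complete sentence has been heard.
        semantics = self._sentences.get(tuple(self._heard), {})
        return sentence, semantics


grammar = SentenceListGrammar({
    "bring me a coke": {"action": "bring", "object": "coke"},
    "go to the kitchen": {"action": "navigate", "location": "kitchen"},
})
accepted = [grammar.traverse(w) for w in ["bring", "kitchen", "me", "a", "coke"]]
print(accepted)
print(grammar.get_results())
```

In this sketch a rejected word ("kitchen" after "bring") is simply dropped, so the sentence gathered so far stays valid.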

LoyVanBeek commented 4 years ago

Some info links:

ar13pit commented 4 years ago

@LoyVanBeek Where is the traverse method defined in Yapykaldi?

I didn't clearly understand your plan.

ar13pit commented 4 years ago

I did check out the kaldi-active-grammar repo a while back, but the project was a huge mess with no obvious place to start. It still is, since it was created to support Dragonfly.

We will have to merge the functionality provided by zamia speech JSGF and the kaldi-active-grammar repo. I already began my dive into the source code of the former, so I will continue with that to get at least one sample grammar file working, and only then move to kaldi-active-grammar.

LoyVanBeek commented 4 years ago

> @LoyVanBeek Where is the traverse method defined in Yapykaldi?

https://github.com/tue-robotics/speech_recognition/blob/fix/autocomplete/src/speech_recognition/kaldi_gstreamer_app.py#L82 So, not in yapykaldi yet, but I was figuring out how to set this all up.

But: I fully expect Kaldi to have some sort of internal language model representation, e.g. an HMM of words or an FST that we should provide it with, right? Possibly with equally distributed weights etc.

Otherwise, I have no idea what Kaldi or Yapykaldi expect and how this functionality is supposed to work. I really need a rough sketch of how you expect this all to work.

ar13pit commented 4 years ago

So the traverse method you see in that script traverses a graph of the grammar that is constructed in the same script. This is already very close to Kaldi's FST representation. Kaldi uses OpenFST for its internal model representation, so to bring our grammar graph into Kaldi's format we have to adapt our script to the Python API of OpenFST.

I'll have to enable building the Python wrappers of OpenFST in the build process of our fork of Kaldi.

After that we will be able to manipulate the weights of FSTs, compose them, etc.

ar13pit commented 4 years ago

The main question is how tightly we want to couple Yapykaldi and yapykaldi_ros. IMO we should keep the ROS wrapper as light as possible: anything related to OpenFST or the Kaldi internal API stays inside Yapykaldi, and the ROS wrapper only adds a thin layer around it where needed.

LoyVanBeek commented 4 years ago

Yes, I think Yapykaldi should define a class (e.g. FST or Grammar or LanguageModel) and yapykaldi_ros instantiates a (sub)class.

Then the questions are:

What is that class and its interface? My guess was that Grammar has a method traverse(word: str) -> X and wraps around an FST class exposed by yapykaldi. Should traverse return a float indicating the probability of that word following the preceding words of the sentence?

ar13pit commented 4 years ago

We can simplify it further. Make the Grammar object an argument of the Asr class. Subclass Grammar in yapykaldi_ros, add a method like construct_fst(grammar_str: str), and pass an instance of that subclass to the Asr class in yapykaldi_ros.

This function could either be an independent interface, in which case it returns an instance of the Grammar class, or an abstract method of the Grammar class that we define in yapykaldi_ros.

The traversal during recognition will happen internally in the decoding process, we don't have to do that ourselves. The output of the recognition, which will just be a sentence, needs to be parsed back for semantics.

For assigning probabilities to the transitions in the Grammar object, for now we can make them equal for all arcs in the graph.
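
A minimal sketch of the equal-weights idea, assuming the grammar graph is held as a plain dict from each state to its outgoing (word, next_state) pairs. That representation and the function name are hypothetical, not what kaldi_gstreamer_app.py uses. Each of the n arcs leaving a state gets probability 1/n, stored as a negative log probability, which is the cost convention of OpenFst's tropical and log semirings. Final-state weights are ignored here:

```python
import math


def uniform_arc_costs(graph):
    """Give every arc leaving a state the same probability.

    graph: dict mapping state -> list of (word, next_state) pairs.
    Returns a dict mapping state -> list of (word, next_state, cost),
    where cost = -log(1 / number_of_outgoing_arcs) = log(number_of_outgoing_arcs).
    """
    weighted = {}
    for state, arcs in graph.items():
        cost = math.log(len(arcs)) if arcs else 0.0
        weighted[state] = [(word, nxt, cost) for word, nxt in arcs]
    return weighted


# Hypothetical toy grammar: "bring (coke | water)" or "stop".
graph = {
    0: [("bring", 1), ("stop", 2)],
    1: [("coke", 2), ("water", 2)],
    2: [],  # final state, no outgoing arcs
}
weighted = uniform_arc_costs(graph)
for state in sorted(weighted):
    for word, nxt, cost in weighted[state]:
        print(state, nxt, word, round(cost, 4))
```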

LoyVanBeek commented 4 years ago

I think we agree :-)

ar13pit commented 4 years ago

Great. I can finally make some time for this tonight :)

On Fri, May 15, 2020 at 3:53 PM Loy notifications@github.com wrote:

I think we agree :-)


LoyVanBeek commented 4 years ago

> The traversal during recognition will happen internally in the decoding process, we don't have to do that ourselves. The output of the recognition, which will just be a sentence, needs to be parsed back for semantics.

How does this traverse method get called?

ar13pit commented 4 years ago

How all this will come together:

1. We use the graph construction mechanism behind the traverse method to construct a probabilistic FST of our grammar in yapykaldi_ros, using an FST/Grammar class exposed from yapykaldi.
2. We pass this constructed FST to the start method of Asr.
3. In the Asr class, we add some code to take a Grammar FST as input (if available) and compose it with the underlying nnet3/gmm model before recognition actually begins.
4. After composition, recognition happens within the closed grammar, and the output is a sentence from our grammar that needs to be parsed back for semantics.
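
As a rough sketch of the first step above, the snippet below turns a list of sentences into a word-level acceptor written in OpenFst's text format, together with the word symbol table that format needs. The function name and the sentence-list input are assumptions for illustration. Passing the FST to Asr and composing it with the acoustic model (the remaining steps) is not shown, since that code does not exist yet:

```python
def sentences_to_fst_text(sentences):
    """Build a prefix-tree acceptor over words.

    Returns (fst_text, symbols_text):
      fst_text uses the OpenFst text format: one 'src dst ilabel olabel' line
      per arc, and a line holding only a state id for each final state.
      symbols_text maps each word to an integer id; id 0 is reserved for <eps>.
    """
    symbols = {"<eps>": 0}
    transitions = {}          # (state, word) -> next state
    final_states = set()
    arc_lines = []
    next_state = 1            # state 0 is the start state

    for sentence in sentences:
        state = 0
        for word in sentence.split():
            symbols.setdefault(word, len(symbols))
            if (state, word) not in transitions:
                transitions[(state, word)] = next_state
                arc_lines.append("%d %d %s %s" % (state, next_state, word, word))
                next_state += 1
            state = transitions[(state, word)]
        final_states.add(state)

    fst_text = "\n".join(arc_lines + [str(s) for s in sorted(final_states)]) + "\n"
    symbols_text = "".join(
        "%s %d\n" % (w, i) for w, i in sorted(symbols.items(), key=lambda kv: kv[1])
    )
    return fst_text, symbols_text


fst_text, symbols_text = sentences_to_fst_text(["bring me a coke", "bring me water"])
print(fst_text)
print(symbols_text)
```

The two strings can be written to files and compiled with OpenFst's `fstcompile`, passing the symbol file via `--isymbols` and `--osymbols`. For use with Kaldi, the word ids would have to come from the model's own words.txt rather than being assigned fresh as they are here.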

@LoyVanBeek feel free to edit this comment if you feel something is missing.

LoyVanBeek commented 4 years ago

Great, this is something I can work with, I think. I'll read up on FSTs and how we can create one. If you expose a class for FSTs from yapykaldi, I'll (try to) set up the machinery to instantiate an FST in yapykaldi_ros.

ar13pit commented 4 years ago

Kaldi uses OpenFst (http://www.openfst.org/twiki/bin/view/FST/WebHome), so all I'll be doing is exposing this library's Python interface as a thin wrapper. You can probably already start by looking into its Python API.

LoyVanBeek commented 4 years ago

For that, there is:

ar13pit commented 4 years ago

Yup, you could use either of them. I'll add a patch to our Kaldi build process: Kaldi clones a version of OpenFst that is older than the latest one, so the wrapper has to be compiled on our side.

LoyVanBeek commented 4 years ago

Made some progress, but it needs more work. I can't figure out yet how to put string labels on fst.Arcs. GallicArc, as in http://openfst.org/twiki/bin/view/FST/FstAdvancedUsage#FstArcs, seems to be it, but I don't know yet why Asterix and Obelix have to come into the picture there...

Edit: I've made some changes to https://github.com/tue-robotics/speech_recognition/pull/30. This is very much not the right place, but for experimenting it works. The necessary bits will have to be moved to the yapykaldi_ros repo eventually.

ar13pit commented 4 years ago

I'm not sure where they are giving you trouble. Once you push what you did, I can look into what should go where.

LoyVanBeek commented 4 years ago

For constructing a graph with strings on the edges, I tried:


>>> import openfst_python as fst
>>> fst.Fst(arc_type='string')
ERROR: GenericRegister::GetEntry: string-arc.so: cannot open shared object file: No such file or directory
ERROR: CreateFstClass: Unknown arc type: string
---------------------------------------------------------------------------
FstOpError                                Traceback (most recent call last)
<ipython-input-34-692f6669f752> in <module>()
----> 1 f2 = fst.Fst(arc_type='string')
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst._init_MutableFst()
FstOpError: Operation failed

>>> fst.Fst(arc_type=fst.STRING)
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst.tostring()
FstArgError: Cannot encode as string: 17592186044416L

>>> fst.Fst(arc_type=str)
---------------------------------------------------------------------------
FstArgError                               Traceback (most recent call last)
<ipython-input-2-d13e058189ef> in <module>()
----> 1 fst.Fst(arc_type=str)
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst.tostring()
FstArgError: Cannot encode as string: <type 'str'>

>>> fst.Fst(arc_type='str')
ERROR: GenericRegister::GetEntry: str-arc.so: cannot open shared object file: No such file or directory
ERROR: CreateFstClass: Unknown arc type: str
---------------------------------------------------------------------------
FstOpError                                Traceback (most recent call last)
<ipython-input-3-e5a467fc678f> in <module>()
----> 1 fst.Fst(arc_type='str')
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst._init_MutableFst()
FstOpError: Operation failed
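
The errors above are consistent with OpenFst arcs carrying integer labels rather than strings: the stock arc types are named 'standard', 'log' and 'log64', and fst.STRING appears to be an FST property flag rather than an arc type. Words are normally attached through a symbol table that maps each word to an integer id. The sketch below shows only that idea in plain Python. `WordTable` is a hypothetical stand-in and not the pywrapfst API, which ships its own SymbolTable class for this purpose:

```python
class WordTable(object):
    """Hypothetical word <-> integer id mapping, mimicking a symbol table."""

    def __init__(self):
        self._ids = {"<eps>": 0}   # id 0 is conventionally epsilon
        self._words = ["<eps>"]

    def add(self, word):
        """Return the id of the word, assigning the next free id if it is new."""
        if word not in self._ids:
            self._ids[word] = len(self._words)
            self._words.append(word)
        return self._ids[word]

    def word(self, label):
        """Return the word stored under an integer label."""
        return self._words[label]


table = WordTable()
# Arcs store integer labels: (source, destination, input label, output label).
arcs = [
    (0, 1, table.add("hello"), table.add("hello")),
    (1, 2, table.add("world"), table.add("world")),
]
# The strings are recovered from the table only when they are needed.
decoded = [table.word(ilabel) for _, _, ilabel, _ in arcs]
print(arcs)
print(decoded)
```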

tue-robotics-graveyard / yapykaldi

Grammar-guided recognition #6