Open LoyVanBeek opened 4 years ago
My plan:
- Grammar base-class in Yapykaldi that provides hooks and interfaces for any sort of grammar. The only method called by Yapykaldi seems to be traverse(recognised_word: str) -> bool, indicating if the word matches the grammar (and if so, adds the word to the sentence gathered so far).
- An implementation in yapykaldi_ros that works with the grammar_parser.CFGrammar, with:
  - traverse to record the words heard so far
  - get_results() -> sentence: str, semantics: dict for getting the result of the completed sentence
@LoyVanBeek Where is the traverse method defined in Yapykaldi?
I didn't clearly understand your plan.
I did check out the kaldi-active-grammar repo a while back, but the project was a huge mess with no clear place to start. It still is, since it was created to support Dragonfly.
We will have to merge the functionality provided by the zamia speech JSGF and kaldi-active-grammar repos. I already began my dive into the source code of the first, so I will continue with that to get at least 1 sample grammar file working, and only then move to kaldi-active-grammar.
> @LoyVanBeek Where is the traverse method defined in Yapykaldi?
https://github.com/tue-robotics/speech_recognition/blob/fix/autocomplete/src/speech_recognition/kaldi_gstreamer_app.py#L82 So, not in yapykaldi yet, but I was figuring out how to set this all up.
But: I fully expect Kaldi to have some sort of internal language model representation, eg. as a HMM of words or an FST that we should provide it, right? Possibly with equally distributed weights etc.
Otherwise, I have no idea what Kaldi or Yapykaldi expect and how this functionality is supposed to work. I really need a rough sketch of how you expect this all to work.
So the traverse method you see in that script traverses a graph of the grammar constructed in the same script. This is already very close to Kaldi's FST representation. Kaldi uses OpenFST for the internal model representation so to bring our grammar graph into Kaldi's format we have to patch our script to the python API of OpenFST.
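As a rough illustration of what traversing a grammar graph means, here is a minimal sketch. It is not the code from the linked script: the WordGraph class and its sentence-list input are assumptions made up for this example. It does show why such a graph is close to an FST, since it already consists of states and word-labelled arcs.

```python
from collections import defaultdict


class WordGraph:
    """Toy grammar graph (hypothetical): int states, edges labelled with words."""

    def __init__(self, sentences):
        self.edges = defaultdict(dict)  # state -> {word: next_state}
        self.final = set()              # states where a sentence may end
        self._n_states = 1              # state 0 is the start state
        for sentence in sentences:
            state = 0
            for word in sentence.split():
                if word not in self.edges[state]:
                    self.edges[state][word] = self._n_states
                    self._n_states += 1
                state = self.edges[state][word]
            self.final.add(state)
        self.reset()

    def reset(self):
        """Go back to the start state and forget the words heard so far."""
        self.state = 0
        self.heard = []

    def traverse(self, recognised_word):
        """Follow the edge for this word if one exists; report whether it did."""
        next_state = self.edges[self.state].get(recognised_word)
        if next_state is None:
            return False
        self.state = next_state
        self.heard.append(recognised_word)
        return True

    def is_complete(self):
        """True if the words heard so far form a full sentence of the grammar."""
        return self.state in self.final
```

With the grammar "bring me a coke" / "bring me a beer", traverse("bring") and traverse("me") succeed, while a following traverse("coke") is rejected because "a" is expected first.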
I'll have to enable the compilation of the python wrappers of OpenFST in the compilation process of our fork of Kaldi.
After that we will be able to manipulate the weights of FSTs, compose them, etc.
The main question is how tightly we want to couple Yapykaldi and Yapykaldi_ros. IMO we should keep the ROS wrapper as light as possible, so anything related to OpenFST or Kaldi's internal API stays within Yapykaldi, and we provide a thin wrapper around it in the ROS wrapper if needed.
Yes, I think Yapykaldi should define a class (eg. FST or Grammar or LanguageModel) and yapykaldi_ros instantiates a (sub)class.
Then the questions are:
- Grammar has a method traverse(word: str) -> X and wraps around an FST class exposed by yapykaldi. Should traverse return a float indicating the probability of that word following the preceding words of the sentence?
We can simplify it more. Have the Grammar object as an arg to the Asr class. Subclass the Grammar class in yapykaldi_ros, add a method like construct_fst(grammar_str: str) and pass this object as an arg to the Asr class in yapykaldi_ros.
We could either have this function as an independent interface, in which case it will return an instance of the Grammar class, or as an abstract method of the Grammar class which we define in yapykaldi_ros.
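The abstract-method variant could look roughly like the sketch below. Everything in it is an assumption for illustration: the Grammar base class does not exist in yapykaldi yet, SentenceListGrammar is a made-up subclass, the "FST" is just a list of (src, dst, word) tuples, and Asr is a stand-in that only shows the extra argument.

```python
from abc import ABC, abstractmethod


class Grammar(ABC):
    """Hypothetical base class that yapykaldi could expose."""

    def __init__(self, grammar_str):
        self.fst = self.construct_fst(grammar_str)

    @abstractmethod
    def construct_fst(self, grammar_str):
        """Return an FST-like object built from the grammar description."""


class SentenceListGrammar(Grammar):
    """Hypothetical subclass that would live in yapykaldi_ros."""

    def construct_fst(self, grammar_str):
        # Stand-in for a real FST: one linear chain of arcs per sentence,
        # all chains leaving from start state 0.
        arcs, next_state = [], 1
        for sentence in grammar_str.strip().splitlines():
            src = 0
            for word in sentence.split():
                arcs.append((src, next_state, word))
                src = next_state
                next_state += 1
        return arcs


class Asr:
    """Stand-in for yapykaldi's Asr class; only the grammar argument is shown."""

    def __init__(self, grammar=None):
        self.grammar = grammar  # None means: decode without a grammar


grammar = SentenceListGrammar("hello robot\nstop")
asr = Asr(grammar=grammar)
```

The base class stays free of any grammar format, and the ROS side only has to implement construct_fst for the format it uses.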
The traversal during recognition will happen internally in the decoding process, we don't have to do that ourselves. The output of the recognition, which will just be a sentence, needs to be parsed back for semantics.
For assigning probabilities to the transitions in the Grammar object, for now we can program them to be equal for all arcs in the graph.
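A minimal sketch of such equal weighting, assuming arcs are plain (src, dst, word) tuples (a representation invented for this example) and that weights are stored as negative natural-log probabilities, the cost convention Kaldi uses for its graphs:

```python
import math
from collections import Counter


def uniform_arc_weights(arcs):
    """Give every arc leaving a state the same probability.

    arcs: list of (src, dst, word) tuples.
    Returns (src, dst, word, cost) tuples with cost = -ln(1 / n) = ln(n),
    where n is the number of arcs leaving src.
    """
    fan_out = Counter(src for src, _, _ in arcs)
    return [(src, dst, word, math.log(fan_out[src])) for src, dst, word in arcs]


weighted = uniform_arc_weights([(0, 1, "yes"), (0, 2, "no"), (1, 3, "please")])
```

Here the two arcs leaving state 0 each get cost ln(2), and the single arc leaving state 1 gets cost 0, i.e. probability 1.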
I think we agree :-)
Great. I can finally make some time for this tonight :)
On Fri, May 15, 2020 at 3:53 PM Loy wrote:
> I think we agree :-)
> The traversal during recognition will happen internally in the decoding process, we don't have to do that ourselves. The output of the recognition, which will just be a sentence, needs to be parsed back for semantics.
How does this traverse method get called?
How all this will come together:
- Use the traverse method to construct a probabilistic FST of our grammar in yapykaldi_ros using an FST/Grammar class exposed from yapykaldi.
- [...] start method of Asr.
- In the Asr class we add some code to take in a Grammar FST as input (if available) and compose it with the underlying nnet3/gmm model before the recognition actually begins.
@LoyVanBeek feel free to edit this comment if you feel something is missing.
Great, this is something I can work with I think.
I'll read up on FSTs and how we can create one. If you expose a class for FSTs from yapykaldi, I'll (try to) set up the machinery to instantiate an FST in yapykaldi_ros.
Kaldi uses openfst (http://www.openfst.org/twiki/bin/view/FST/WebHome), so all I'll be doing is exposing this library's python interface as a thin wrapper. You probably can already start by looking into its python API.
Yup. You could use either of them. I'll add a patch to our Kaldi build process, since Kaldi clones an OpenFst version older than the latest, so compilation of the wrapper needs to happen on our side.
Made some progress, but it needs more work.
I can't figure out yet how to put string labels on fst.Arcs.
GallicArc, like in http://openfst.org/twiki/bin/view/FST/FstAdvancedUsage#FstArcs, seems to be it, but I don't know yet why Asterix and Obelix have to come into the picture there...
Edit: I've made some changes to https://github.com/tue-robotics/speech_recognition/pull/30. This is very much not the right place, but for experimenting it works. The necessary bits will have to be moved to the yapykaldi_ros repo eventually.
I'm not sure where they are troubling you. Maybe once you push what you did, I can look into what needs to go where.
For constructing a graph with strings on the edges, I tried:
>>> import openfst_python as fst
>>> fst.Fst(arc_type='string')
ERROR: GenericRegister::GetEntry: string-arc.so: cannot open shared object file: No such file or directory
ERROR: CreateFstClass: Unknown arc type: string
---------------------------------------------------------------------------
FstOpError Traceback (most recent call last)
<ipython-input-34-692f6669f752> in <module>()
----> 1 f2 = fst.Fst(arc_type='string')
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst._init_MutableFst()
FstOpError: Operation failed
>>> fst.Fst(arc_type=fst.STRING)
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst.tostring()
FstArgError: Cannot encode as string: 17592186044416L
>>> fst.Fst(arc_type=str)
---------------------------------------------------------------------------
FstArgError Traceback (most recent call last)
<ipython-input-2-d13e058189ef> in <module>()
----> 1 fst.Fst(arc_type=str)
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst.tostring()
FstArgError: Cannot encode as string: <type 'str'>
>>> fst.Fst(arc_type='str')
ERROR: GenericRegister::GetEntry: str-arc.so: cannot open shared object file: No such file or directory
ERROR: CreateFstClass: Unknown arc type: str
---------------------------------------------------------------------------
FstOpError Traceback (most recent call last)
<ipython-input-3-e5a467fc678f> in <module>()
----> 1 fst.Fst(arc_type='str')
pywrapfst.pyx in pywrapfst.Fst.__new__()
pywrapfst.pyx in pywrapfst._create_Fst()
pywrapfst.pyx in pywrapfst._init_MutableFst()
FstOpError: Operation failed
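The "string-arc.so" and "str-arc.so" messages suggest that arc_type selects the arc/weight type (such as "standard" or "log") rather than the label type. In OpenFst, arc labels are integers, and words are attached through a symbol table that maps each word to an id. One way to experiment without the Python wrapper is to write the grammar in OpenFst's text format together with a symbol table file. The sketch below does that in plain Python; the function name and the (src, dst, word, cost) tuple layout are assumptions for this example.

```python
def to_openfst_text(arcs, final_states):
    """Render word-labelled arcs as OpenFst text format plus a symbol table.

    arcs: list of (src, dst, word, cost); final_states: iterable of state ids.
    The arcs of the start state must come first, because fstcompile takes
    the source state of the first line as the start state.
    """
    symbols = {"<eps>": 0}  # id 0 is reserved for epsilon by convention
    fst_lines = []
    for src, dst, word, cost in arcs:
        symbols.setdefault(word, len(symbols))  # assign the next free id
        # Same word on the input and output side of the arc.
        fst_lines.append(f"{src} {dst} {word} {word} {cost:.4f}")
    # A line holding only a state id marks that state as final.
    fst_lines.extend(str(state) for state in sorted(final_states))
    symbol_lines = [f"{word} {idx}" for word, idx in symbols.items()]
    return "\n".join(fst_lines) + "\n", "\n".join(symbol_lines) + "\n"


fst_txt, syms_txt = to_openfst_text(
    [(0, 1, "yes", 0.6931), (0, 2, "no", 0.6931)], final_states=[1, 2]
)
```

Saved as grammar.fst.txt and words.txt (file names chosen here), the two strings can be compiled with `fstcompile --isymbols=words.txt --osymbols=words.txt grammar.fst.txt grammar.fst`. For composition with an existing Kaldi model, the word ids would need to come from that model's words.txt instead of being assigned fresh as done here.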
Going from open-world to closed-world speech should improve the rate at which sentences can be parsed.