gooofy / zamia-speech

Open tools and data for cloudless automatic speech recognition

Model Adaptation #93

Closed jbaudisch closed 1 year ago

jbaudisch commented 4 years ago

Hi!

I used the pretrained model kaldi-generic-de-tdnnf-r20190328 together with the following JSGF file:

#JSGF V1.0;
grammar csra.nlp.isaac;

public <control> = <ai-name> [<polite-do>] <command> | [<polite-do>] <command> <ai-name>;

<ai-name> = (isaac | isaak);
<polite-do> = (bitte mach | mach bitte);
<german-article> = (der | die | das | den);
<command> = [<german-article>] <device> <operation>;

<device> = (lampe | fernseher | radio);
<operation> = (an | aus);

Then I used the kaldi_decode_live.py script to test it.
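For reference, decoding with py-kaldi-asr (which that script builds on) boils down to something like the following minimal offline sketch; the model path and WAV file name are hypothetical, and the live script additionally captures audio from the microphone:

from kaldiasr.nnet3 import KaldiNNet3OnlineModel, KaldiNNet3OnlineDecoder

# Hypothetical path to the adapted model directory.
MODELDIR = 'data/models/kaldi-generic-de-tdnnf-r20190328-adapted'

model = KaldiNNet3OnlineModel(MODELDIR)
decoder = KaldiNNet3OnlineDecoder(model)

# Decode a prerecorded 16 kHz mono WAV file instead of live audio.
if decoder.decode_wav_file('test.wav'):
    text, likelihood = decoder.get_decoded_string()
    print(text, likelihood)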

Now it tries to fit every sentence into the given grammar. For example, if I say the German sentence "Guten Morgen, wie geht es dir?" ("Good morning, how are you?"), it recognizes "isaac bitte mach die lampe an".

But my intention was just to adapt the model so that the recognizer could handle both kinds of sentences, not to overwrite the whole model.

besimali commented 4 years ago

Hey, I am not an expert but I think I can help you out.

That behavior makes sense: by adapting the pre-trained model you actually replace the complete language model (LM), and the new LM contains no out-of-domain examples. Your LM consists only of the strict word combinations you specified in the JSGF file. Which dictionary file are you using? If you use the general German dictionary, there may be some smoothing that includes words which are in the vocabulary but not in the LM, although those words would get low probabilities. You can try raising the acoustic scale in your script to get word combinations outside the JSGF grammar.
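If you are loading the model through py-kaldi-asr, the acoustic scale can usually be set at load time. A minimal sketch, assuming your installed py-kaldi-asr version exposes an acoustic_scale keyword on KaldiNNet3OnlineModel (check the constructor signature of your build; the value 2.0 is just a hypothetical starting point):

# Hypothetical: raise the acoustic scale so the acoustics can outweigh
# the very restrictive JSGF-derived language model.
model = KaldiNNet3OnlineModel(MODELDIR, acoustic_scale=2.0)
decoder = KaldiNNet3OnlineDecoder(model)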

However, this kind of LM is only good at recognizing the exact sentences (or, depending on your n-gram order, combinations of a few words) that are already in it. If you want to recognize general language, you may want to use a much larger, general LM. You can then add your smaller in-domain LM to it so that your system is better at the specific commands you want to use often. The easiest way for me was to simply combine the text data for the general and in-domain language and then train a new LM on this combined corpus, as sketched below.
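A minimal sketch of the corpus-combination approach, assuming you build ARPA LMs with KenLM's lmplz (other toolkits such as SRILM's ngram-count work similarly); all file names are hypothetical:

import subprocess

# Concatenate the general corpus and the in-domain command corpus
# (file names are hypothetical).
with open('combined.txt', 'w') as out:
    for path in ('general_corpus.txt', 'commands_corpus.txt'):
        with open(path) as f:
            out.write(f.read())

# Train a 4-gram ARPA LM on the combined corpus with KenLM's lmplz.
with open('combined.txt') as src, open('combined.arpa', 'w') as dst:
    subprocess.run(['lmplz', '-o', '4'], stdin=src, stdout=dst, check=True)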
Probably a better way to do it would be to interpolate the two LMs (in-domain and general) with different weights.
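SRILM can do this interpolation directly. A minimal sketch, assuming SRILM's ngram tool is installed; the file names and the weight 0.7 for the general LM are hypothetical, and the lambda is best tuned on held-out text:

import subprocess

# Interpolate the general LM with the in-domain LM using SRILM's ngram tool.
# -lambda is the weight given to the first (-lm) model; 0.7 is a
# hypothetical starting point to tune on held-out data.
subprocess.run([
    'ngram', '-order', '4',
    '-lm', 'general.arpa',
    '-mix-lm', 'commands.arpa',
    '-lambda', '0.7',
    '-write-lm', 'interpolated.arpa',
], check=True)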

Cheers, Besim