Open axchanda opened 5 years ago
Any updates on this, please?
Currently language models can only really be built through lmplz
or through the C++ API that program calls internally.
I don't think you want a Kneser-Ney language model here. You want a feature function that happens to be a language model to boost the score of specific words. You need to think about what score those words should have (i.e. the "boost") and create your own feature function. Kneser-Ney is not a magic black box that will solve your problem.
Hi @kpu, I have this exact same question. I'm wondering if you have a more concrete idea on how this could be implemented. It would be a great feature for kenlm + DeepSpeech if we could pass in a list of words in addition to the audio, and the LM would be biased towards those words during decoding.
@JRMeyer Is this increasing the probability of a unigram? You want a wrapper around the LM that adds a constant for elements of a set?
As an MT person, if it weren't OOV I'd call this run-time domain adaptation, whereby you retrieve relevant sentences and then upweight them in training.
@JRMeyer This is a typical speech recognition feature. If I understand you correctly, you basically want to up-weight or down-weight a list of "phrases", which may be a brand name, a particular street name in an address, terminology from a particular domain, etc. It serves as a "hot-fix" mechanism for when you don't have "sentence level" data and you don't want to retrain the whole LM. As a "hot-fix", you need to manually tune the imposed up-weight or down-weight to get your desired results.
Here is a paper from Google that deals with the very same problem you have: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43819.pdf It's done in the n-gram LM framework; the enhancement is not in the n-gram model itself (as Ken said), but in the search (decoding) process.
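The core of that on-the-fly rescoring idea is just a weighted combination of scores per beam-search hypothesis. A minimal sketch (the function name and the weights `alpha`/`beta` are hypothetical; in practice they are tuned on held-out data):

```python
def hypothesis_score(am_logp, lm_logp, bias_bonus, alpha=0.8, beta=1.0):
    """Combine acoustic, LM, and contextual-bias log-scores for one
    beam-search hypothesis. bias_bonus is nonzero only when the
    hypothesis matches a boosted phrase."""
    return am_logp + alpha * lm_logp + beta * bias_bonus

# A hypothesis that completes a boosted phrase gets a better score:
plain   = hypothesis_score(-2.0, -1.0, 0.0)
boosted = hypothesis_score(-2.0, -1.0, 0.5)
```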
Besides that, you can implement it in a non-n-gram way by turning the problem into a biasing graph, say by building the phrase list into a compact prefix tree.
During decoding, attach a prefix-tree state to each search hypothesis. Each time the hypothesis consumes a word, advance its prefix-tree state accordingly.
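To make that concrete, here is a minimal sketch of the biasing prefix tree (not DeepSpeech or OpenFST code; the class name, the dict-based trie, and the per-phrase `boost` value are all illustrative assumptions):

```python
class BiasTrie:
    """Prefix tree over boost phrases; each hypothesis carries a
    trie state that is advanced one word at a time."""
    def __init__(self, phrases, boost=2.0):
        self.root = {}
        self.boost = boost  # hypothetical log-score bonus, tuned by hand
        for phrase in phrases:
            node = self.root
            for word in phrase.split():
                node = node.setdefault(word, {})
            node["$"] = True  # end-of-phrase marker

    def advance(self, state, word):
        """Advance one hypothesis's trie state by one word.
        Returns (next_state, bonus); the bonus fires when a full
        phrase completes, and a mismatch falls back to the root."""
        node = state if state is not None else self.root
        nxt = node.get(word)
        if nxt is None:
            # the word may still start a new phrase from the root
            nxt = self.root.get(word)
            if nxt is None:
                return self.root, 0.0
        return nxt, (self.boost if "$" in nxt else 0.0)

trie = BiasTrie(["main street", "acme corp"], boost=2.0)
state, bonus = trie.advance(None, "main")
state, bonus = trie.advance(state, "street")  # completes "main street"
```

In a real decoder the returned bonus would be added to the hypothesis score at each step, alongside the acoustic and LM scores.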
I don't know DeepSpeech well, but if anyone plans such a feature, I would like to help.
@kpu -- this can definitely be seen as run-time domain adaptation, where the target-domain data is limited to a small set of keywords, not sentences. It would be more than just increasing unigrams; it would mean increasing the probabilities of all n-grams in which the keywords were observed in the LM.
@dophist -- that paper sounds like exactly what I'm thinking of... So in this specific case, would it involve changes to kenlm or changes to the DeepSpeech decoder (or something else)? I think this would be a welcome PR to DeepSpeech, right @reuben?
We already depend on OpenFST and use it as a trie structure for restricting the decoder vocabulary, so you could use it for the additional contextual biasing prefix tree and advance both structures simultaneously during decoding.
@reuben, I'm not sure what you mean by "prefix" tree. Are you suggesting making two trie structures (one only containing keywords and one containing the original LM) and decoding with both simultaneously? Is there any more detail on this approach you could point to (e.g. a paper or implementation)?
@JRMeyer I'm talking about the second implementation suggested by @dophist in the comment above. A prefix tree is the same as a trie. I'm suggesting you could extend the DeepSpeech decoder to load or build the contextual biasing trie, then use it as described above. The tooling (for handling states, arcs, weights) is already integrated in the form of the OpenFST library, which should help.
You can certainly run two language models as features. Or we can make a wrapper that looks like kenlm but does some runtime changes to the probabilities. Let's figure out what you want in terms of quality first, then talk data structures.
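The wrapper idea can be sketched in a few lines. This uses a toy dict of unigram log-probabilities in place of a real kenlm model (loading one needs a model file); the class name, the `bonus` value, and the OOV penalty are illustrative assumptions, not kenlm API:

```python
class BoostedLM:
    """Looks like an LM scorer, but adds a constant log-score bonus
    for words in a boost set. A real version would wrap the kenlm
    binding and delegate scoring to it."""
    def __init__(self, base_scores, boost_words, bonus=1.5):
        self.base = base_scores          # toy stand-in: word -> log10 prob
        self.boost = set(boost_words)
        self.bonus = bonus               # the hand-tuned "boost" Ken mentions
        self.unk = -5.0                  # hypothetical OOV penalty

    def score(self, word):
        s = self.base.get(word, self.unk)
        return s + (self.bonus if word in self.boost else 0.0)

lm = BoostedLM({"the": -1.0, "acme": -4.0}, boost_words={"acme"}, bonus=1.5)
```

The decoder sees one scorer and never needs to know the boost set exists, which is what makes this a drop-in "hot-fix".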
Hi @kpu ,
I would like to ask another question, on how to achieve LM modelling on the fly. For example, I have tried Google Speech-to-Text, where you pass certain key phrases as parameters during the call and Google gives priority to those words (OOV / priority words). I see that we need to have the priority words beforehand for training the LM. But is it possible, by some means, to pass the priority words at inference time and construct the LM instantly? This is in the context of DeepSpeech.
I know it may be beyond the scope of this repo, but I would appreciate your suggestions on it. It would be a great value add if you could let me know. Thanks a lot!