Closed murtuzamdahod closed 4 years ago
DeepSpeech appears to use it as a feature in a beam search decoder:
And I think there's some documentation here: https://github.com/mozilla/DeepSpeech/blob/becc3d9745b6b3b21bb2843922b4f0c0252ed7df/doc/Decoder.rst .
But isn't this really more of a question for deepspeech?
As I said, my end goal is to build a language model like kenlm, but not to use it with DeepSpeech. I want to use it to correct my own vocabulary.
I don't see any documentation on how to implement this.
When you say "build a language model like kenlm" does that mean:
- You want to write your own software to implement a language model, as an alternative to kenlm, in which case go read https://dash.harvard.edu/bitstream/handle/1/25104739/tr-10-98.pdf?sequence=1 or any of the more neural work these days
- You want to create a file with probabilities using kenlm, in which case go to https://neural.mt/code/kenlm/estimation/
I want to use kenlm on my dataset. I have 20 lakh (2 million) food item names, and I want to train a kenlm model on them so that I get the correct names in the output.
For example:
IN : "Cheeessseee Pijja"
OUT: "Cheese Pizza"
I believe it works in a similar manner even in DeepSpeech. I have a dataset containing the correct vocabulary, as it should appear in the output.
Will https://neural.mt/code/kenlm/estimation/ work for this case?
You are of course welcome to build a language model on your data using kenlm. I provide the probabilities of strings. It's up to you to find or write a tool to do the task you want, possibly using these probabilities. And it might use beam search. Have fun.
What does model.score() return? How do I find similar words using these scores?
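For context: in the kenlm Python module, `model.score(sentence)` returns the log10 probability of the whole sentence under the model (by default including begin- and end-of-sentence markers). The score on its own does not find similar words; something else has to propose candidates, and the score can then help rank them. Below is a minimal sketch of that idea, not kenlm's or DeepSpeech's method. It uses `difflib` string similarity to propose candidates from a list of known names and adds a weighted language-model score. The function `rank_corrections`, the `lm_weight` value, the `cutoff`, and the stand-in scorer are all assumptions made for illustration; with a trained model you could pass `kenlm.Model("food.arpa").score` as `score_fn`, where `food.arpa` is a hypothetical file name.

```python
import difflib
from typing import Callable, Iterable, List, Tuple


def rank_corrections(
    noisy: str,
    known_names: Iterable[str],
    score_fn: Callable[[str], float],
    lm_weight: float = 0.1,   # hypothetical weight, needs tuning on real data
    cutoff: float = 0.5,      # hypothetical minimum string similarity
) -> List[Tuple[str, float]]:
    """Rank known names as corrections for a noisy input string.

    Combined score = string similarity (0..1) + lm_weight * score_fn(name),
    where score_fn is expected to return a log probability (<= 0).
    """
    query = noisy.lower()
    ranked = []
    for name in known_names:
        similarity = difflib.SequenceMatcher(None, query, name.lower()).ratio()
        if similarity < cutoff:
            continue  # too different to be a plausible correction
        ranked.append((name, similarity + lm_weight * score_fn(name)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Stand-in scorer so the sketch runs without a trained model.
# With kenlm installed, replace it with: kenlm.Model("food.arpa").score
def stand_in_score(text: str) -> float:
    return -float(len(text.split()))


names = ["Cheese Pizza", "Cheese Pasta", "Chicken Tikka"]
print(rank_corrections("Cheeessseee Pijja", names, stand_in_score))
```

Comparing the input against every name is linear in the list size, so for millions of names an index (for example, character n-gram lookup) would probably be needed before ranking.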
Hello @kpu, thanks for the previous help. I was able to understand the concept of using a language model with a CTC beam search decoder. Are you aware of any good implementations where I can use a kenlm language model with CTC beam search in Python? I don't work much with C or Java.
I have read many theoretical blog posts that explain how a language model is used, but I see very few implementations other than plain text generation.
Are there any resources or snippets where I can look at actual code to understand how kenlm works with DeepSpeech?
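As a rough illustration of how the pieces can fit together, here is a minimal pure-Python sketch of CTC prefix beam search followed by n-best rescoring with a language model. This is a simplification and not DeepSpeech's decoder, which applies the language model during the search rather than afterwards. The function names, the `alpha` weight, and the toy input are assumptions for illustration; `lm_score` is any callable returning a log probability for a string, which could be `kenlm.Model(...).score` if a trained model is available.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf


def log_add(*values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def ctc_beam_search(log_probs, labels, beam_width=8, lm_score=None,
                    alpha=0.5, blank=0):
    """Return (text, score) pairs, best first.

    log_probs: one list of per-label log probabilities per time step.
    labels:    symbol for each label index; labels[blank] is the CTC blank.
    """
    # prefix (tuple of label indices) -> (log p ending in blank,
    #                                     log p ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for idx, lp in enumerate(frame):
                if idx == blank:
                    b, nb = nxt[prefix]
                    nxt[prefix] = (log_add(b, p_b + lp, p_nb + lp), nb)
                elif prefix and prefix[-1] == idx:
                    # A repeated label extends the prefix only after a blank...
                    ext = prefix + (idx,)
                    b, nb = nxt[ext]
                    nxt[ext] = (b, log_add(nb, p_b + lp))
                    # ...otherwise it collapses into the same prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, log_add(nb, p_nb + lp))
                else:
                    ext = prefix + (idx,)
                    b, nb = nxt[ext]
                    nxt[ext] = (b, log_add(nb, p_b + lp, p_nb + lp))
        # Keep only the most probable prefixes (acoustic score only).
        best = sorted(nxt.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(best[:beam_width])

    results = []
    for prefix, (p_b, p_nb) in beams.items():
        text = "".join(labels[i] for i in prefix)
        score = log_add(p_b, p_nb)
        if lm_score is not None:
            score += alpha * lm_score(text)  # n-best rescoring step
        results.append((text, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy input: two time steps over the labels [blank, "a"].
frames = [[math.log(0.4), math.log(0.6)], [math.log(0.4), math.log(0.6)]]
print(ctc_beam_search(frames, labels=["_", "a"]))
```

Rescoring only the final beams keeps the sketch short; applying the language model inside the loop, typically at word boundaries, lets it influence which prefixes survive pruning.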
My end goal is to build a language model like kenlm but not using it with deepspeech. I want to use it to correct my own vocabularies.