Closed terenzeyuen closed 7 years ago
That happens when I add it as a reference; if I copy the en-us folder as a group I get:
Reading plist: The data couldn't be read...
Am I doing anything wrong?
Hey @terenzeyuen. Check out this unit test. It's doing the same thing you're trying to do here. The problem seems to be that you did not install TLSphinx correctly. Make sure you followed the installation steps.
@maurodec OK, I deleted derived data and it seems to compile without errors now. But nothing is being printed out either – is there any way I can debug? I installed using Carthage and the headers are all set correctly.
I have tried both live recording and a file, but nothing is being printed...
I think it is reading something... this is what I get, but it doesn't print anything out.
Code:
let audioFile = (modelPath! as NSString).stringByAppendingPathComponent("test.m4a")
print(audioFile)
decoder.decodeSpeechAtPath(String(audioFile)) {
    if let hyp: Hypotesis = $0 {
        // Print the decoded text and score
        print("Text: \(hyp.text) - Score: \(hyp.score)")
    } else {
        // Couldn't decode any speech because of an error
        // print("Error:")
    }
}
Any help? Log below.
/var/containers/Bundle/Application/95F5E512-7037-4C26-9716-1F78CF241F0C/SphinxDWR.app/en-us/test.m4a
INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 81.40 -24.99 -3.53 -7.49 -17.06 4.45 -5.20 5.96 5.75 -2.35 1.22 -3.00 -1.89 >
INFO: ngram_search_fwdtree.c(1553): 460 words recognized (9/fr)
INFO: ngram_search_fwdtree.c(1555): 179701 senones evaluated (3328/fr)
INFO: ngram_search_fwdtree.c(1559): 341096 channels searched (6316/fr), 28650 1st, 20749 last
INFO: ngram_search_fwdtree.c(1562): 1992 words for which last channels evaluated (36/fr)
INFO: ngram_search_fwdtree.c(1564): 22861 candidate words for entering last phone (423/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 3.48 CPU 6.437 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 3.71 wall 6.863 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(945): 213 words recognized (4/fr)
INFO: ngram_search_fwdflat.c(947): 9036 senones evaluated (167/fr)
INFO: ngram_search_fwdflat.c(949): 5034 channels searched (93/fr)
INFO: ngram_search_fwdflat.c(951): 489 words searched (9/fr)
INFO: ngram_search_fwdflat.c(954): 232 word transitions (4/fr)
INFO: ngram_search_fwdflat.c(957): fwdflat 0.01 CPU 0.016 xRT
INFO: ngram_search_fwdflat.c(960): fwdflat 0.01 wall 0.021 xRT
INFO: ngram_search.c(1199): not found in last frame, using .52 instead
INFO: ngram_search.c(1252): lattice start node .0 end node .0
INFO: ngram_search.c(1278): Eliminated 97 nodes before end node
INFO: ngram_search.c(1383): Lattice has 98 nodes, 0 links
INFO: ps_lattice.c(1380): Bestpath score: -2147483648
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:0:52) = -536899456
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 3.48 CPU 6.558 xRT
INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 3.71 wall 6.992 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.01 CPU 0.016 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.01 wall 0.021 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
I did test with the Pocketsphinx audio files and they work as expected. I don't think it will work with any audio format other than raw PCM.
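If it helps, one way to produce a compatible file is macOS's built-in afconvert tool (a sketch, not part of this thread; the 16 kHz / 16-bit / mono target matches what the default en-us acoustic model expects):

```shell
# Convert an m4a to a 16 kHz, 16-bit little-endian, mono WAV,
# the format the default en-us acoustic model expects.
afconvert -f WAVE -d LEI16@16000 -c 1 test.m4a test.wav
```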
Keep in mind that if you try to detect arbitrary speech you will get bad results. Sphinx, and TLSphinx, are meant to be used for command-like interaction.
Having said that, you should get something when running live decoding with your mic. Did you try something like this?
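For example, a minimal live-decoding sketch in the thread's Swift 2 style, reusing the Config/Decoder setup shown elsewhere in this thread; the stopDecodingSpeech() call is an assumption about the counterpart API, not confirmed by this thread:

```swift
// Sketch only — reuses the Config/Decoder setup from this thread
// (Swift 2 era). stopDecodingSpeech() is assumed to be the
// counterpart call that ends the live session.
import TLSphinx

let modelPath = NSBundle.mainBundle().pathForResource("en-us", ofType: nil)!
let hmm  = (modelPath as NSString).stringByAppendingPathComponent("en-us")
let lm   = (modelPath as NSString).stringByAppendingPathComponent("en-us.lm.dmp")
let dict = (modelPath as NSString).stringByAppendingPathComponent("cmudict-en-us.dict")

if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)),
   decoder = Decoder(config: config) {
    // Keep `decoder` alive (e.g. in a property) while listening.
    decoder.startDecodingSpeech { (hyp) -> () in
        print("Utterance: \(hyp)")
    }
    // Later, to stop listening (assumed API):
    // decoder.stopDecodingSpeech()
}
```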
@BrunoBerisso yes, I've tried changing my code to live decoding, granted access to the microphone, and double-checked Settings, but nothing gets printed...
My code (including viewDidLoad):
override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.

    let modelPath = NSBundle.mainBundle().pathForResource("en-us", ofType: nil)
    let hmm = (modelPath! as NSString).stringByAppendingPathComponent("en-us")
    let lm = (modelPath! as NSString).stringByAppendingPathComponent("en-us.lm.dmp")
    let dict = (modelPath! as NSString).stringByAppendingPathComponent("cmudict-en-us.dict")

    if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
        if let decoder = Decoder(config: config) {
            decoder.startDecodingSpeech { (hyp) -> () in
                print("Utterance: \(hyp)")
            }
        } else {
            // Handle Decoder() fail
        }
    } else {
        // Handle Config() fail
    }
}
You need to store the decoder in some persistent location because it gets released at the end of viewDidLoad, and that cancels the operation.
Try creating a property in your controller for the decoder. Also, we have reports that the live decode process is not working properly; this pull request is meant to improve it.
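A minimal sketch of that fix (the class name and outlet names here are illustrative, not from this thread):

```swift
// Sketch: hold the decoder in a property so it isn't released
// when viewDidLoad returns (names here are illustrative).
import UIKit
import TLSphinx

class SpeechViewController: UIViewController {

    var decoder: Decoder?   // lives as long as the controller

    override func viewDidLoad() {
        super.viewDidLoad()
        // ... build `config` as shown earlier in this thread, then:
        // decoder = Decoder(config: config)
    }

    @IBAction func startListening(sender: AnyObject) {
        decoder?.startDecodingSpeech { (hyp) -> () in
            print("Utterance: \(hyp)")
        }
    }
}
```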
@BrunoBerisso thanks – I tried with an example wav file and it works, so it must be my m4a file. :) http://www.voiptroubleshooter.com/open_speech/american.html
I see, let me try live decoding in another location.
@BrunoBerisso Hmm, no luck on live decoding. Nothing gets printed. Am I doing anything wrong?
I'm triggering it from a button action instead.
viewDidLoad:
if let modelPath = getModelPath() {
    let hmm = (modelPath as NSString).stringByAppendingPathComponent("en-us")
    let lm = (modelPath as NSString).stringByAppendingPathComponent("en-us.lm.dmp")
    let dict = (modelPath as NSString).stringByAppendingPathComponent("cmudict-en-us.dict")

    if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
        decoder = Decoder(config: config)
    }
}
IBAction:
decoder.startDecodingSpeech { (hyp) -> () in
    print("Utterance: \(hyp)")
}
Hi.
I just updated the implementation of startDecodingSpeech
and it's working on OS X. This implementation uses AVAudioEngine,
like the PR I mentioned before. Try checking out the branch audioengine_streaming with Carthage and give it a try. The line in your Cartfile should look like:
github "Tryolabs/TLSphinx" "audioengine_streaming"
Another reason for your issue could be that utterances are only reported once they are detected. This means that you have to test in a silent environment so Sphinx can detect when an utterance starts and ends. If Sphinx doesn't detect any silence, it never reports anything.
Hope this new implementation works better than the old one. Let us know.
Hey, I just merged this to main,
so just run an update to get the latest version. I will close this issue now; if you find another problem, let me know.
Can't seem to read local file...