tryolabs / TLSphinx

Swift wrapper around Pocketsphinx
MIT License
155 stars 58 forks

Reading local file #18

Closed: terenzeyuen closed this issue 7 years ago

terenzeyuen commented 7 years ago

Can't seem to read a local file...

```swift
let audioFile = NSBundle.mainBundle().URLForResource("test", withExtension: "m4a")

decoder.decodeSpeechAtPath(String(audioFile)) {
    if let hyp: Hypotesis = $0 {
        // Print the decoded text and score
        print("Text: \(hyp.text) - Score: \(hyp.score)")
    } else {
        // Can't decode any speech because of an error
        print("Error:")
    }
}
```
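One thing worth checking in the snippet above: `URLForResource` returns an optional `NSURL`, so `String(audioFile)` produces something like `Optional(file:///...)` rather than a plain filesystem path, which `decodeSpeechAtPath` is unlikely to be able to open. A minimal sketch of fetching a path directly instead (assuming `test.m4a` is in the main bundle):

```swift
// Sketch: pathForResource returns a plain filesystem path (no file:// scheme
// and no Optional(...) wrapper), which is what a path-based API expects.
if let audioPath = NSBundle.mainBundle().pathForResource("test", ofType: "m4a") {
    decoder.decodeSpeechAtPath(audioPath) {
        if let hyp: Hypotesis = $0 {
            print("Text: \(hyp.text) - Score: \(hyp.score)")
        } else {
            print("Decode failed")
        }
    }
}
```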
terenzeyuen commented 7 years ago

That is what happens when I add the folder as a reference. If I copy the en-us folder as a group instead, I get:

Reading plist: The data couldn't be read...

screen shot 2016-07-14 at 19 04 07

Is there anything I'm doing wrong?

maurodec commented 7 years ago

Hey @terenzeyuen. Check out this unit test. It's doing the same thing you're trying to do here. The problem seems to be that you did not install TLSphinx correctly. Make sure you followed the installation steps.

terenzeyuen commented 7 years ago

@maurodec OK, I deleted derived data and it seems to compile without errors now, but nothing is being printed out either. Is there any way I can debug? I installed using Carthage and the headers are all set correctly.

I have tried both live recording and a file, but nothing is being printed...

terenzeyuen commented 7 years ago

I think it is reading something... this is what I get, but it doesn't print anything out.

Code:

```swift
let audioFile = (modelPath! as NSString).stringByAppendingPathComponent("test.m4a")
print(audioFile)

decoder.decodeSpeechAtPath(String(audioFile)) {
    if let hyp: Hypotesis = $0 {
        // Print the decoded text and score
        print("Text: \(hyp.text) - Score: \(hyp.score)")
    } else {
        // Can't decode any speech because of an error
        // print("Error:")
    }
}
```

Any help? Log below.

```
/var/containers/Bundle/Application/95F5E512-7037-4C26-9716-1F78CF241F0C/SphinxDWR.app/en-us/test.m4a
INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 81.40 -24.99 -3.53 -7.49 -17.06 4.45 -5.20 5.96 5.75 -2.35 1.22 -3.00 -1.89 >
INFO: ngram_search_fwdtree.c(1553): 460 words recognized (9/fr)
INFO: ngram_search_fwdtree.c(1555): 179701 senones evaluated (3328/fr)
INFO: ngram_search_fwdtree.c(1559): 341096 channels searched (6316/fr), 28650 1st, 20749 last
INFO: ngram_search_fwdtree.c(1562): 1992 words for which last channels evaluated (36/fr)
INFO: ngram_search_fwdtree.c(1564): 22861 candidate words for entering last phone (423/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 3.48 CPU 6.437 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 3.71 wall 6.863 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 7 words
INFO: ngram_search_fwdflat.c(945): 213 words recognized (4/fr)
INFO: ngram_search_fwdflat.c(947): 9036 senones evaluated (167/fr)
INFO: ngram_search_fwdflat.c(949): 5034 channels searched (93/fr)
INFO: ngram_search_fwdflat.c(951): 489 words searched (9/fr)
INFO: ngram_search_fwdflat.c(954): 232 word transitions (4/fr)
INFO: ngram_search_fwdflat.c(957): fwdflat 0.01 CPU 0.016 xRT
INFO: ngram_search_fwdflat.c(960): fwdflat 0.01 wall 0.021 xRT
INFO: ngram_search.c(1199): </s> not found in last frame, using </s>.52 instead
INFO: ngram_search.c(1252): lattice start node <s>.0 end node </s>.0
INFO: ngram_search.c(1278): Eliminated 97 nodes before end node
INFO: ngram_search.c(1383): Lattice has 98 nodes, 0 links
INFO: ps_lattice.c(1380): Bestpath score: -2147483648
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:0:52) = -536899456
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 3.48 CPU 6.558 xRT
INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 3.71 wall 6.992 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.01 CPU 0.016 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.01 wall 0.021 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
```

BrunoBerisso commented 7 years ago

I did test with the Pocketsphinx audio files and it works as expected. I don't think it will work with any audio format other than raw PCM.

Keep in mind that if you try to detect arbitrary speech you will get bad results. Sphinx, and TLSphinx, are meant to be used for command-like interaction.

Having said that, you should get something when running live decoding with your mic. Did you try something like this?
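On the raw-PCM point: one way to get a compatible file is Apple's `afconvert` command-line tool. A sketch (macOS only, Swift 2-era `NSTask`; the filenames are placeholders, and it assumes the standard 16 kHz mono models are in use):

```swift
import Foundation

// Sketch: shell out to afconvert to produce a 16 kHz, mono, 16-bit
// little-endian PCM WAV, the format the default PocketSphinx models expect.
let task = NSTask()
task.launchPath = "/usr/bin/afconvert"
task.arguments = ["-f", "WAVE",          // WAV container
                  "-d", "LEI16@16000",   // little-endian int16 at 16 kHz
                  "-c", "1",             // single (mono) channel
                  "test.m4a", "test.wav"]
task.launch()
task.waitUntilExit()
```

There is no `afconvert` on iOS itself; for an on-device conversion, AVFoundation's `AVAudioConverter` would be the route instead.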

terenzeyuen commented 7 years ago

@BrunoBerisso yes, I've tried changing my code to live decoding, granted access to the microphone, and double-checked Settings, but nothing gets printed...

My code (including viewDidLoad):

```swift
override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.
    let modelPath = NSBundle.mainBundle().pathForResource("en-us", ofType: nil)
    let hmm = (modelPath! as NSString).stringByAppendingPathComponent("en-us")
    let lm = (modelPath! as NSString).stringByAppendingPathComponent("en-us.lm.dmp")
    let dict = (modelPath! as NSString).stringByAppendingPathComponent("cmudict-en-us.dict")

    if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
        if let decoder = Decoder(config: config) {
            decoder.startDecodingSpeech { (hyp) -> () in
                print("Utterance: \(hyp)")
            }
        } else {
            // Handle Decoder() fail
        }
    } else {
        // Handle Config() fail
    }
}
```
BrunoBerisso commented 7 years ago

You need to store the decoder somewhere persistent, because it gets released at the end of viewDidLoad and that cancels the operation.

Try creating a property in your controller for the decoder. Also, we have reports that the live decode process is not working properly; this pull request is meant to improve it.
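A minimal sketch of what that could look like (names follow the earlier snippets; `Config` and `Decoder` are the TLSphinx types):

```swift
import UIKit

class ViewController: UIViewController {
    // A stored property keeps the decoder alive beyond viewDidLoad,
    // so the live decoding session isn't cancelled when it returns.
    var decoder: Decoder?

    override func viewDidLoad() {
        super.viewDidLoad()
        if let modelPath = NSBundle.mainBundle().pathForResource("en-us", ofType: nil) {
            let hmm = (modelPath as NSString).stringByAppendingPathComponent("en-us")
            let lm = (modelPath as NSString).stringByAppendingPathComponent("en-us.lm.dmp")
            let dict = (modelPath as NSString).stringByAppendingPathComponent("cmudict-en-us.dict")

            if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
                decoder = Decoder(config: config)
            }
        }
        decoder?.startDecodingSpeech { hyp in
            print("Utterance: \(hyp)")
        }
    }
}
```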

terenzeyuen commented 7 years ago

@BrunoBerisso thanks – I tried with an example wav file and it works, so it must be my m4a file. :) http://www.voiptroubleshooter.com/open_speech/american.html

screen shot 2016-07-14 at 21 04 44

I see, let me try live decoding in other location.

terenzeyuen commented 7 years ago

@BrunoBerisso Hmm, no luck with live decoding. Nothing gets printed. Is there anything I'm doing wrong?

I'm triggering it from a button action instead.

viewDidLoad:

```swift
if let modelPath = getModelPath() {
    let hmm = (modelPath as NSString).stringByAppendingPathComponent("en-us")
    let lm = (modelPath as NSString).stringByAppendingPathComponent("en-us.lm.dmp")
    let dict = (modelPath as NSString).stringByAppendingPathComponent("cmudict-en-us.dict")

    if let config = Config(args: ("-hmm", hmm), ("-lm", lm), ("-dict", dict)) {
        decoder = Decoder(config: config)
    }
}
```

IBAction:

```swift
decoder.startDecodingSpeech { (hyp) -> () in
    print("Utterance: \(hyp)")
}
```
BrunoBerisso commented 7 years ago

Hi.

I just updated the implementation of startDecodingSpeech and it's working on OS X. This implementation uses AVAudioEngine like the PR I mentioned before. Try checking out the _audioenginestreaming branch with Carthage and give it a try. The line in your Cartfile should look like:

github "Tryolabs/TLSphinx" == "audioengine_streaming"
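A note on Cartfile syntax, in case Carthage rejects the `==` form above: version operators like `==` apply to semver tags, while a branch is pinned by quoting its name alone, so the entry may need to be (branch name as given above):

```
github "Tryolabs/TLSphinx" "audioengine_streaming"
```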

Another possible reason for your issue is that utterances are reported only when they are detected. This means you have to test in a silent environment so Sphinx can detect when an utterance starts and ends. If Sphinx doesn't detect any silence, it never reports anything.

Hope this new implementation works better than the old one. Let us know.

BrunoBerisso commented 7 years ago

Hey, I just merged this into main, so just run an update to get the latest version. I'll close this issue now; if you find another problem, let me know.