jasperproject / jasper-client

Client code for Jasper voice computing platform
MIT License
4.53k stars, 1.01k forks

Speech-to-text output not found #698

Open akshat9425 opened 6 years ago

akshat9425 commented 6 years ago

I used your transcribe function for speech-to-text and replaced your _decoder with other objects provided by pocketsphinx. Here is my code:

def transcribe(fp):
    config = pocketsphinx.Decoder.default_config()
    config.set_string('-hmm', HMDIR)
    config.set_string('-lm', LMDIR)
    config.set_string('-dict', DICTD)
    decoder = Decoder(config)

    speech_rec = pocketsphinx.Decoder(config)
    opened_file = open(fp)
    print("\n""types results", type(opened_file), "\n\n")
    #exit(0)
    opened_file.seek(44)

    # FIXME: Can't use the Decoder.decode_raw() here, because
    # pocketsphinx segfaults with tempfile.SpooledTemporaryFile()
    data = opened_file.read()
    decoder.start_utt()
    decoder.process_raw(data, False, True)
    decoder.end_utt()

    result = decoder.hyp()

    result = speech_rec.get_hyp()

    exit(0)

    print("our results", result)
    transcribed = [result]
    logging.info('PocketSphinx ?????%r', transcribed)
    return transcribed

and I got this as output:

You just said: <pocketsphinx.pocketsphinx.Hypothesis; proxy of <Swig Object of type 'Hypothesis *' at 0x7f074c6033f0> >

but I was expecting the transcribed text.

Could someone please suggest what is wrong here?

And if I use

decoder.hyp().hypstr

then nothing is printed as output.

G10DRAS commented 6 years ago

Where did you get this code? Which version of pocketsphinx are you using?

akshat9425 commented 6 years ago

Thanks for the reply. Let me explain from scratch. I used https://github.com/VikParuchuri/scribe, but in that repo there is a file named recognizer.py containing a function recognize(), and inside it there is a call to decode_raw(), which is no longer supported.

As a replacement I used the approach from Example 46 at this link: https://www.programcreek.com/python/example/10479/tempfile.SpooledTemporaryFile

Here is my code for that function:

def transcribe(fp):
    config = pocketsphinx.Decoder.default_config()
    config.set_string('-hmm', HMDIR)
    config.set_string('-lm', LMDIR)
    config.set_string('-dict', DICTD)
    decoder = Decoder(config)

    speech_rec = pocketsphinx.Decoder(config)
    opened_file = open(fp)
    print("\n""types results", type(opened_file), "\n\n")

    opened_file.seek(44)
    data = opened_file.read()
    print("value of data", data)
    decoder.start_utt()
    decoder.process_raw(data, False, True)
    print("value of process_raw", decoder.process_raw(data, False, True))
    decoder.end_utt()

    result = decoder.hyp().hypstr

    print("our results", result)
    transcribed = [result]
    logging.info('PocketSphinx ?????%r', transcribed)
    return transcribed

The pocketsphinx version is 0.1.15 and the sphinxbase version is 0.8.

If I say nothing, I get 0 as output from process_raw(). If I say hello, I get a specific value such as 256 from process_raw(), but still a blank string from decoder.hyp().hypstr.
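A minimal sketch, assuming the input is a standard PCM WAV file: instead of opening the file in text mode and skipping a fixed 44-byte header with seek(44), Python's standard wave module can parse the header and return only the sample bytes. The function name read_pcm_frames is hypothetical and is not part of Jasper or pocketsphinx.

```python
import wave


def read_pcm_frames(wav_path):
    """Return the raw PCM sample bytes of a WAV file, without its header."""
    # wave.open parses the RIFF header itself, so no fixed 44-byte
    # offset is assumed and the file is always read in binary mode.
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.readframes(wav_file.getnframes())
```

The returned bytes could then be passed to decoder.process_raw(data, False, True) a single time between start_utt() and end_utt(). Calling process_raw() a second time inside a print statement feeds the same audio to the decoder twice.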

G10DRAS commented 6 years ago

Download the latest (5prealpha) Pocketsphinx and Sphinxbase code from GitHub and try the following example: https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/decoder_test.py

Or see similar code in jasper-dev: https://github.com/jasperproject/jasper-client/blob/jasper-dev/plugins/stt/pocketsphinx-stt/sphinxplugin.py

akshat9425 commented 6 years ago

I used https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/decoder_test.py and passed my own audio file with a .wav extension in place of the goforward.raw and goforward.mfc files. I also ran the code unmodified once.

In all three cases I got the same result: hyp().hypstr gives a blank string, while the model score and confidence give some values.

What should I do to get text from hyp().hypstr for a WAV file?

akshat9425 commented 6 years ago

I need to know one thing: when creating the decoder, besides the .dict and .bin files, you also pass an HMM model. Which file is that? I downloaded https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-8khz-5.2.tar.gz/download and found 6-7 files inside it.
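On the question of which file to pass: the -hmm option is generally given the directory that holds the acoustic model files, not a single file from it. The sketch below checks such a directory for the files a CMU Sphinx acoustic model typically contains. The file names listed are an assumption based on common model archives (some models name files differently), and missing_acoustic_model_files is a hypothetical helper, not a pocketsphinx function.

```python
import os

# Assumed file names, based on typical CMU Sphinx acoustic model archives.
REQUIRED_FILES = ("mdef", "means", "variances", "transition_matrices", "feat.params")
# A model usually ships one of these two files for the mixture weights.
WEIGHT_FILES = ("mixture_weights", "sendump")


def missing_acoustic_model_files(model_dir):
    """Return the expected model files that are absent from model_dir."""
    present = set(os.listdir(model_dir))
    missing = [name for name in REQUIRED_FILES if name not in present]
    if not any(name in present for name in WEIGHT_FILES):
        missing.append("mixture_weights or sendump")
    return missing
```

If the returned list is empty, the same directory path would be the value to try for config.set_string('-hmm', ...).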

G10DRAS commented 6 years ago

Make sure the WAV file is in 16 kHz, 16-bit mono format.

Try this model

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-ptm-5.2.tar.gz/download
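The 16 kHz, 16-bit, mono requirement can be checked before decoding. This is a minimal sketch using Python's standard wave module. The function name wav_format_problems and its default parameters are illustrative assumptions, and it only handles uncompressed PCM WAV files that the wave module can open.

```python
import wave


def wav_format_problems(wav_path, rate=16000, sample_width=2, channels=1):
    """Return a list of format mismatches; the list is empty if the file matches."""
    problems = []
    with wave.open(wav_path, "rb") as wav_file:
        if wav_file.getframerate() != rate:
            problems.append(
                "sample rate is %d Hz, expected %d" % (wav_file.getframerate(), rate))
        # Sample width is reported in bytes, so 2 means 16-bit audio.
        if wav_file.getsampwidth() != sample_width:
            problems.append(
                "sample width is %d bytes, expected %d" % (wav_file.getsampwidth(), sample_width))
        if wav_file.getnchannels() != channels:
            problems.append(
                "channel count is %d, expected %d" % (wav_file.getnchannels(), channels))
    return problems
```

A file that reports a mismatch would need to be resampled or converted with an audio tool before being passed to the decoder.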

akshat9425 commented 6 years ago

Thanks a lot, man, it's working now. I had been stuck on this for many days.

akshat9425 commented 6 years ago

Yeah, it's working well, but I need models for Indian English and Hindi; I think that one was for American English. I have already developed the speech-to-text application for American English. Could you please suggest or send links to download Indian English and Hindi models that would work with pocketsphinx?

G10DRAS commented 6 years ago

Did you search SourceForge for it?

akshat9425 commented 6 years ago

Yeah, I found their pretrained Indian English and Hindi models, but they are not working.

akshat9425 commented 6 years ago

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Indian%20English/cmusphinx-en-in-5.2.tar.gz/download

https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Indian%20English/cmusphinx-en-in-5.2.tar.gz/download

see above two

The 6th model from the link below, which you sent me, works for US English, but the first model at this link doesn't work: https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/

akshat9425 commented 6 years ago

While using the Hindi model I got the error below: AttributeError: 'NoneType' object has no attribute 'hypstr'

Here is my code:

from os import environ, path

from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

MODELDIR = "/home/user/scribe/model/hindi/"
DATADIR = "/home/user/scribe/pocketsphinx/test/data/"

config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'hindi_hmm'))
config.set_string('-lm', path.join(MODELDIR, 'hindi.lm'))
config.set_string('-dict', path.join(MODELDIR, 'hindi.dic'))
config.set_string('-logfn', '/dev/null')
decoder = Decoder(config)

stream = open(path.join(DATADIR, 'goforward.raw'), 'rb')

stream = open('/home/user/Downloads/Audio_Conversation-001.wav', 'rb')

in_speech_bf = False
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                decoder.end_utt()
                print 'Result:', decoder.hyp().hypstr
                decoder.start_utt()
    else:
        break
decoder.end_utt()
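The AttributeError reported above means decoder.hyp() returned None, which happens when the decoder has no hypothesis for the utterance. A minimal sketch of a guard follows. The helper name hypothesis_text is hypothetical, and it only assumes that a non-empty result exposes a hypstr attribute, as the code in this thread already relies on.

```python
def hypothesis_text(hyp):
    """Return the recognized text, or an empty string when there is no hypothesis."""
    # decoder.hyp() can return None, so check before reading .hypstr.
    if hyp is None:
        return ""
    return hyp.hypstr
```

In the loop above, the result line could then read hypothesis_text(decoder.hyp()) instead of decoder.hyp().hypstr. This avoids the crash but does not by itself explain why the Hindi model produces no hypothesis.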

G10DRAS commented 6 years ago

If it is not working, then train your own model.

akshat9425 commented 6 years ago

How? I am a beginner. Please send me any suggestions for training my own model.

G10DRAS commented 6 years ago

A good starting point: https://cmusphinx.github.io/wiki/