Raspberry - STT, Assistant, TTS response time

pnkram commented 6 years ago

Hi,

I made an assistant that uses Watson's STT, Assistant and TTS.

In my code I use pyaudio (as in the microphone-speech-to-text.py example) to record 2-3 seconds of audio, then pass that audio to Watson STT with speech_to_text.recognize, which returns text that I pass to Watson Assistant. I then pass the Assistant's reply to Watson TTS (again using the code from the examples), which gives me a .wav file that I play with vlc.MediaPlayer.
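To see where the time goes in a pipeline like this, each stage can be wrapped in a small timer. This is a minimal sketch, assuming Python 3: `fake_stt`, `fake_assistant` and `fake_tts` are hypothetical placeholders standing in for the Watson calls, and `timed` is a hypothetical helper. It uses `time.perf_counter`, which is intended for measuring intervals, rather than `time.time`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, timings):
    """Store the elapsed wall-clock time of the with-block in timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


# Hypothetical placeholders: swap in speech_to_text.recognize,
# assistant.message and text_to_speech.synthesize in the real script.
def fake_stt(audio):
    time.sleep(0.01)  # pretend the service takes a moment
    return "turn on the light"


def fake_assistant(text):
    return "Turning on the light"


def fake_tts(text):
    return b"RIFF"  # stand-in for WAV bytes


timings = {}
with timed("STT", timings):
    transcript = fake_stt(b"")
with timed("Assistant", timings):
    reply = fake_assistant(transcript)
with timed("TTS", timings):
    audio = fake_tts(reply)

for stage, secs in timings.items():
    print("{}: {:.3f} s".format(stage, secs))
```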

I tested it on a Raspberry Pi, and the problem is that the overall response time is too long. My assistant has several intents; I tested all of them and recorded the response time of each service in an Excel file. The average response times are: STT 3.48 seconds, Assistant 0.62 seconds, TTS 2 seconds. On average the audio response starts playing after about 6 seconds, which is a very long time. My question is: is it possible to reduce the response time? STT is the slowest service, and in some cases it takes 5 seconds. Here is the Excel file with the response times: tempi.xlsx
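Adding up the averages above shows how the roughly 6 seconds break down. A small worked calculation using only the reported figures (recording and playback time are not included):

```python
# Average per-service latencies from the measurements above, in seconds.
stage_avg = {"STT": 3.48, "Assistant": 0.62, "TTS": 2.00}

total = sum(stage_avg.values())
for name, secs in stage_avg.items():
    print("{:<10} {:5.2f} s  ({:5.1%} of total)".format(name, secs, secs / total))
print("{:<10} {:5.2f} s".format("Total", total))

slowest = max(stage_avg, key=stage_avg.get)
print("slowest stage:", slowest)
```

STT accounts for roughly 57% of the total, TTS about 33% and Assistant about 10%, so STT is the first place to look.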

STT

import time

# Excerpt: speech_to_text, recording_name and doneRecording are defined earlier in the script
transcript = ''
# set the model here
try:
    with open(recording_name, 'rb') as f:
        dictResponse = speech_to_text.recognize(audio=f, content_type='audio/l16; rate=44100')
        #print(json.dumps(dictResponse, indent=2))
        if dictResponse["results"]:
            transcript = dictResponse["results"][0]["alternatives"][0]["transcript"]
            print('transcript: {}'.format(transcript))
        else:
            print("no results from stt")
            transcript = ""
except Exception as e:
    print('speech2text error: {}'.format(e))

doneSTT = time.time()
print("--> STT time: {}".format(doneSTT-doneRecording))
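The excerpt above keeps only `results[0]`. The STT service can split an utterance into several entries in `results` (for example at pauses), so reading only the first one may cut a longer command short. A minimal sketch of a helper that joins the top alternative of every result; `extract_transcript` and the sample dictionary are hypothetical, with the field names taken from the code above.

```python
def extract_transcript(stt_result):
    """Join the top alternative of every result in an STT recognize response."""
    parts = []
    for result in stt_result.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    # skip empty pieces so we never produce double spaces
    return " ".join(p for p in parts if p)


# Hypothetical response with two results, shaped like the one used above.
sample = {
    "results": [
        {"alternatives": [{"transcript": "turn on ", "confidence": 0.93}], "final": True},
        {"alternatives": [{"transcript": "the kitchen light ", "confidence": 0.88}], "final": True},
    ],
    "result_index": 0,
}
print(extract_transcript(sample))  # turn on the kitchen light
```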

Assistant

# Excerpt: assistant, workspace_id and transcript are defined earlier in the script
# input to Watson Assistant
response = {}  # a dict, so the lookups below still work if the call fails
try:
    response = assistant.message(
        workspace_id=workspace_id,
        input={
            'text': transcript
        },
        context={
            'metadata': {
                'deployment': 'myDeployment'
            }
        })
except Exception as e:
    print("Assistant error: {}".format(e))

# Watson response
#print(json.dumps(response, indent=2))
if response.get("intents"):
    intents = response["intents"][0]
    intent = intents["intent"]
    intentConfidence = intents["confidence"]
    print('intent name: {}, confidence: {}'.format(intent, intentConfidence))
else:
    print("intent not found")

# read the reply text (risposta) whether or not an intent matched; TTS needs it
risposta = ''
if response.get("output") and response["output"].get("text"):
    risposta = response["output"]["text"][0]
    print('risposta: {}'.format(risposta))
else:
    print("no output from assistant")
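The same lookups can be collected into one defensive function, so that a missing intent or an empty `output.text` list does not raise and stop the pipeline. A minimal sketch; `parse_assistant_reply` is a hypothetical helper, and the field names (`intents`, `intent`, `confidence`, `output.text`) are the ones used in the code above.

```python
def parse_assistant_reply(response):
    """Return (intent, confidence, reply_text); missing pieces come back as None."""
    intents = response.get("intents") or []
    intent = intents[0].get("intent") if intents else None
    confidence = intents[0].get("confidence") if intents else None
    texts = (response.get("output") or {}).get("text") or []
    reply = texts[0] if texts else None
    return intent, confidence, reply


# Hypothetical response shaped like the one printed above.
sample = {
    "intents": [{"intent": "turn_on", "confidence": 0.97}],
    "output": {"text": ["Turning on the light"]},
}
print(parse_assistant_reply(sample))  # ('turn_on', 0.97, 'Turning on the light')
```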

TTS and response playback

import time
import vlc  # python-vlc

# Excerpt: text_to_speech, risposta, outputName, doneAssistant and doneRecording are defined earlier
try:
    with open(outputName, 'wb') as audio_file:
        audio_file.write(text_to_speech.synthesize(risposta, accept='audio/wav', voice="en-US_AllisonVoice").content)
        #audio_file.write(text_to_speech.synthesize(risposta, accept='audio/wav', voice="it-IT_FrancescaVoice").content)
except Exception as e:
    print("Text2speech error: {}".format(e))

doneTTS = time.time()
print("--> TTS time: {}".format(doneTTS-doneAssistant))

# play response
player = vlc.MediaPlayer("/home/pi/Downloads/stt-assistant-rasp-backup/response.wav")
player.play()  # non-blocking: returns once playback has started

endResponse = time.time()

print("--> time elapsed: {}".format(endResponse-doneRecording))
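python-vlc's `MediaPlayer.play()` returns as soon as playback has started, so `endResponse` above measures the time until the reply begins playing, not until it finishes. If the script has to wait for the audio to end, one option is to sleep for the length of the WAV file. A minimal sketch using the standard `wave` module; `wav_duration` is a hypothetical helper, and it assumes the file's header reports the correct frame count.

```python
import os
import tempfile
import wave


def wav_duration(path):
    """Return the length of a PCM WAV file in seconds, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


# Demo: write one second of 16-bit mono silence at 22050 Hz, then measure it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(22050)
    wav.writeframes(b"\x00\x00" * 22050)

print(wav_duration(demo_path))  # 1.0
```

In the script this would go right after `player.play()`, for example `time.sleep(wav_duration(path_to_response_wav))`, where `path_to_response_wav` is whichever file VLC is playing.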

Any help would be appreciated, thanks.

python sdk version

1.5

python version

default python3 on the Raspberry Pi

ehdsouza commented 6 years ago

Hi @pnkram, we have seen STT slow down intermittently. Did you check the response time using curl?

pnkram commented 6 years ago

Hi @ehdsouza,

I tried curl from Python using this code:

import json
import subprocess

# the model is a query parameter (?model=...), not part of the Content-Type header
stringJson = subprocess.check_output(['curl', '-X', 'POST', '-u', 'myUsername:myPass', '--header', 'Content-Type: audio/wav', '--data-binary', '@/home/pi/Downloads/Watson/stt-assistant-rasp-backup/' + WAVE_INPUT_FILENAME, 'https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=en-US_BroadbandModel'])
res = stringJson.decode()
response = json.loads(res)
transcript = response["results"][0]["alternatives"][0]["transcript"]
print("transcript: {}".format(transcript))
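To separate network time from service time, curl can report its own timers through `--write-out` (`-w`). A minimal sketch, assuming the same URL and credentials as above; `build_curl_cmd`, `parse_curl_output` and `run_timed_request` are hypothetical helpers. `time_pretransfer` covers DNS, TCP and TLS setup, and the gap from there to `time_starttransfer` covers uploading the audio plus the service's processing.

```python
import json
import subprocess

# curl --write-out format: a newline, then three timers (seconds since start):
# time_pretransfer   = connection setup (DNS + TCP + TLS) finished
# time_starttransfer = first response byte received (upload + server processing)
# time_total         = whole request
WRITE_OUT = "\n%{time_pretransfer} %{time_starttransfer} %{time_total}"


def build_curl_cmd(wav_path, url, user, password):
    """Hypothetical helper: the curl call above, plus -s and -w timing output."""
    return [
        "curl", "-s", "-X", "POST",
        "-u", "{}:{}".format(user, password),
        "--header", "Content-Type: audio/wav",
        "--data-binary", "@" + wav_path,
        "-w", WRITE_OUT,
        url,
    ]


def parse_curl_output(output):
    """Split curl's stdout into the JSON body and a dict of timings."""
    body, timing_line = output.rsplit("\n", 1)
    # tolerate a decimal comma, in case the numbers are locale-formatted
    pretransfer, starttransfer, total = (
        float(x.replace(",", ".")) for x in timing_line.split()
    )
    timings = {
        "setup_s": pretransfer,
        "upload_and_processing_s": starttransfer - pretransfer,
        "total_s": total,
    }
    return json.loads(body), timings


def run_timed_request(wav_path, url, user, password):
    """Run curl and return (stt_result, timings); needs network access."""
    output = subprocess.check_output(build_curl_cmd(wav_path, url, user, password))
    return parse_curl_output(output.decode())
```

If `upload_and_processing_s` accounts for nearly all of `total_s` while the audio file is small, the time is most likely being spent on the service side rather than in connecting.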

I then measured the STT response time again, both with speech_to_text.recognize and with curl. The average response times are 3.41 s and 3.51 s respectively (time.xlsx). So is the problem in the STT service itself? (My upload speed is 33.19 Mbps.)
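A back-of-the-envelope check suggests upload bandwidth is not the bottleneck. A minimal sketch, assuming a mono 3-second recording at the 44.1 kHz, 16-bit rate from the `content_type` in the recognize call above, and ignoring connection setup and protocol overhead:

```python
# Rough upload-time estimate for one recording, using the figures in this thread.
seconds_of_audio = 3     # "2-3 seconds" recorded per request
sample_rate_hz = 44100   # from content_type='audio/l16; rate=44100'
bytes_per_sample = 2     # l16 = 16-bit linear PCM
channels = 1             # assumption: mono recording
upload_mbps = 33.19      # measured upload speed

payload_bytes = seconds_of_audio * sample_rate_hz * bytes_per_sample * channels
upload_s = payload_bytes * 8 / (upload_mbps * 1e6)
print("payload: {} bytes, upload: {:.0f} ms".format(payload_bytes, upload_s * 1000))
# payload: 264600 bytes, upload: 64 ms
```

About 64 ms of upload against roughly 3.4 s of STT time suggests most of the delay is spent somewhere other than sending the audio.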

I would like to build an assistant with these three services, but the overall response time is unacceptable: about 6 seconds pass before the response starts playing. My question is: will the service's response time improve in the near future?

Thank you

ehdsouza commented 6 years ago

Hi @pnkram,

Could you also run the same test in your development environment and check the response times? The delay could be caused by the slower processing on the Raspberry Pi. I ran it on my MacBook and the response times were very short.

pnkram commented 6 years ago

Hmm, I just tried on my laptop with an i5-8250U, but the average STT response time is basically the same (about 3 seconds). In any case, I need to run it on a Raspberry Pi.

ehdsouza commented 6 years ago

@pnkram, so the response time was 3.41 s on the Raspberry Pi versus about 3 s on your laptop, right? If you think it is a service issue rather than an environment issue, feel free to open a ticket: https://ibm.biz/ibmcloudsupport

ehdsouza commented 6 years ago

Closing this, as it is not an SDK issue. Feel free to reopen.

watson-developer-cloud / python-sdk

Raspberry - STT, Assistant, TTS response time #519
