Speech_v1p1beta1: STT not returning the final result until another word is spoken

amirkargar commented 6 years ago

I recently added Google Speech-to-Text to my Nexmo Python application. I'm using the gRPC streaming method to perform real-time speech recognition. Below are the metadata and config I'm using:

from google.cloud.speech_v1p1beta1 import enums, types

# Recognition metadata describing the audio source.
metadata = types.RecognitionMetadata()
metadata.interaction_type = enums.RecognitionMetadata.InteractionType.DISCUSSION
metadata.microphone_distance = enums.RecognitionMetadata.MicrophoneDistance.NEARFIELD
metadata.recording_device_type = enums.RecognitionMetadata.RecordingDeviceType.SMARTPHONE
metadata.original_media_type = enums.RecognitionMetadata.OriginalMediaType.AUDIO

# `config` is the application's own settings module.
this_config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=config.GOOGLE_AUDIO_RATE,
    language_code=config.GOOGLE_LANGUAGE_CODE,
    use_enhanced=False,  # switching this to True triggers the problem described below
    metadata=metadata,
    model='phone_call',
    enable_word_time_offsets=True,
    enable_automatic_punctuation=True,
)

streaming_config = types.StreamingRecognitionConfig(
    config=this_config,
    interim_results=True,
    single_utterance=False,
)

The issue is that with use_enhanced=True, the final recognition result is not returned until a new word is spoken. The same code with the same config works when use_enhanced is not set to True. Note that the interim results are returned, but the final result is not. It is not a matter of time: no matter how long I waited, the final result was never output.

Python version: 3.6.5

google cloud speech version: 0.35.0

tseaver commented 6 years ago

For reference, see the API docs for the use_enhanced flag, which is a new feature in v1p1beta1.

tseaver commented 6 years ago

/cc @beccasaurus

sduskis commented 5 years ago

This seems like a question for the speech team. You can reach out to them here: https://cloud.google.com/speech-to-text/docs/support

googleapis / google-cloud-python, issue #5785