nrkno / transcriber-frontend

MIT License
Multi channel #54

Open schjonhaug opened 5 years ago

schjonhaug commented 5 years ago

https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig

audioChannelCount

The number of channels in the input audio data. ONLY set this for MULTI-CHANNEL recognition. Valid values for LINEAR16 and FLAC are 1-8. Valid values for OGG_OPUS are 1-254. The only valid value for MULAW, AMR, AMR_WB and SPEEX_WITH_HEADER_BYTE is 1. If 0 or omitted, defaults to one channel (mono). NOTE: Only the first channel is recognized by default. To perform independent recognition on each channel, set enableSeparateRecognitionPerChannel to 'true'.
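The limits quoted above could be checked before a request is sent. The following is only a sketch: `MAX_CHANNELS` and `effective_channel_count` are hypothetical names made up for illustration, not part of the Speech-to-text API or of this repository, and the table simply restates the ranges from the quoted description.

```python
# Maximum channel count per encoding, restating the ranges quoted above.
# Hypothetical helper table, not part of any Google client library.
MAX_CHANNELS = {
    "LINEAR16": 8,
    "FLAC": 8,
    "OGG_OPUS": 254,
    "MULAW": 1,
    "AMR": 1,
    "AMR_WB": 1,
    "SPEEX_WITH_HEADER_BYTE": 1,
}


def effective_channel_count(encoding: str, audio_channel_count: int = 0) -> int:
    """Return the channel count the service would assume, or raise ValueError.

    A value of 0 (or an omitted value) falls back to one channel (mono),
    as the quoted description states.
    """
    if encoding not in MAX_CHANNELS:
        raise ValueError(f"no channel limit known for encoding {encoding!r}")
    if audio_channel_count == 0:
        return 1
    limit = MAX_CHANNELS[encoding]
    if not 1 <= audio_channel_count <= limit:
        raise ValueError(
            f"{encoding} accepts 1-{limit} channels, got {audio_channel_count}"
        )
    return audio_channel_count
```

For example, `effective_channel_count("FLAC", 2)` returns 2, while `effective_channel_count("MULAW", 2)` raises `ValueError`.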

enableSeparateRecognitionPerChannel

This must be set to ‘true’ explicitly, and audioChannelCount must be greater than 1, for each channel to be recognized separately. The recognition result will contain a channelTag field stating which channel that result belongs to. If this is not ‘true’, only the first channel is recognized. NOTE: The request is billed cumulatively for all channels recognized: audioChannelCount times the audio length.
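As a rough sketch of what this could mean in practice, the snippet below builds a RecognitionConfig as a plain dict using the REST field names, estimates the billed duration for separate per-channel recognition, and groups results by `channelTag`. The helper names (`build_multichannel_config`, `billed_seconds`, `transcripts_by_channel`) are hypothetical, and the response passed to the last one is assumed to follow the REST shape (`results[].channelTag`, `results[].alternatives[].transcript`). No request is actually sent.

```python
from collections import defaultdict


def build_multichannel_config(encoding: str, language_code: str, channels: int) -> dict:
    """Build a RecognitionConfig dict for separate per-channel recognition."""
    if channels < 2:
        # Per the quoted text, audioChannelCount must be greater than 1.
        raise ValueError("separate recognition needs at least 2 channels")
    return {
        "encoding": encoding,
        "languageCode": language_code,
        "audioChannelCount": channels,
        "enableSeparateRecognitionPerChannel": True,
    }


def billed_seconds(audio_seconds: float, channels: int) -> float:
    """Billed duration with separate recognition: channels times audio length."""
    return audio_seconds * channels


def transcripts_by_channel(response: dict) -> dict:
    """Join the top transcript of every result, keyed by its channelTag."""
    parts = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts[result.get("channelTag")].append(alternatives[0]["transcript"])
    return {tag: " ".join(texts) for tag, texts in parts.items()}
```

With these assumptions, a 60-second stereo file would be billed as `billed_seconds(60, 2)`, i.e. 120 seconds, and the transcripts for channel 1 and channel 2 come back as separate entries that a frontend could show as separate speakers.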