googleapis / google-cloud-node

Google Cloud Client Library for Node.js
https://cloud.google.com/nodejs
Apache License 2.0

Surprising errors in production code using v2 of @google-cloud/speech #5609

Closed: sorokinvj closed this issue 1 week ago

sorokinvj commented 1 month ago

Hey guys, today our system started producing a surprising number of errors; our PROD server and all of our users are affected.

Error on "error" in recognizeStream {"code":3,"details":"Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.","metadata":{}}

We did not change anything in our implementation, and I hope the recent Google Chrome update did not touch the audio interfaces either. We are using the MediaRecorder API, and up until today all users were happy and got their streams recognized successfully.

Here is our main service:

type StreamingRecognitionConfig =
  protos.google.cloud.speech.v2.IStreamingRecognitionConfig;

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizer = findRecognizerByLanguageCode(language).name;

      const streamingConfig: StreamingRecognitionConfig = {
        config: {
          autoDecodingConfig: {},
        },
        streamingFeatures: {
          interimResults: false,
          enableVoiceActivityEvents: true, // enable voice activity events
          voiceActivityTimeout: {
            speechStartTimeout: { seconds: 60 },
            speechEndTimeout: { seconds: 60 },
          },
        },
      };
      const configRequest = {
        recognizer,
        streamingConfig,
      };

      logger.info('Creating Google service with recognizer:', recognizer);

      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error(
            'Error on "error" in recognizeStream',
            JSON.stringify(error)
          );
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });

      let configSent = false;
      let headersSent = false;
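      // v2 streaming protocol: the first write must carry the recognizer and
      // streamingConfig; every later write carries only audio bytes, with the
      // WebM header chunk sent first so the service can parse the container.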
      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
          return;
        }
        if (configSent && !headersSent) {
          recognizeStream.write({ audio: headers });
          headersSent = true;
          return;
        }
        recognizeStream.write({ audio });
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };
      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};

danielbankhead commented 1 month ago

Hey @sorokinvj, which file types are affected?

sorokinvj commented 1 month ago

> Hey @sorokinvj, which file types are affected?

Hey @danielbankhead, we are using real-time transcriptions. Surprisingly, until yesterday we were able to use real-time with v2 and the WEBM_OPUS encoding, although I now see that in v2 there is no such encoding, only:

AUDIO_ENCODING_UNSPECIFIED = 0,
LINEAR16 = 1,
MULAW = 2,
ALAW = 3

Our setup used autoDecodingConfig: {}, though. Do you guys support 'audio/webm;codecs=opus' in v2?
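
For reference, a minimal sketch of what "explicitly specifying the decoding parameters" (as the error suggests) would look like in v2, reusing the StreamingRecognitionConfig type from the snippet above; the encoding, sample rate, and channel count are placeholder values, and note that the explicit enum has no WebM/Opus entry, which is why autoDecodingConfig: {} seemed to be the only option for MediaRecorder streams:

const explicitStreamingConfig: StreamingRecognitionConfig = {
  config: {
    // instead of autoDecodingConfig: {}, spell the format out:
    explicitDecodingConfig: {
      encoding: 'LINEAR16',    // only LINEAR16, MULAW, and ALAW are available
      sampleRateHertz: 16000,  // placeholder; must match the actual stream
      audioChannelCount: 1,    // placeholder; must match the actual stream
    },
  },
  streamingFeatures: {
    interimResults: false,
  },
};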

For now we have rolled back to v1 with the following code, and everything went back to normal:

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizeStream = client
        .streamingRecognize({
          config: {
            encoding: 'WEBM_OPUS',
            sampleRateHertz: 48000,
            languageCode: language,
            enableAutomaticPunctuation: true,
            enableSpokenPunctuation: {
              value: true,
            },
          },
          interimResults: false,
          enableVoiceActivityEvents: true,
        })
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
          reject(error);
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          send({
            type: 'TRANSCRIPTION_SERVICE_CLOSED',
            data: 'TRANSCRIPTION_SERVICE_CLOSED',
          });
        });

      let headersSent = false;

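      // Unlike v2, v1 takes the config in the streamingRecognize() call above,
      // so every write here is raw audio; the WebM header chunk goes first so
      // the decoder can parse the container.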
      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!headersSent) {
          recognizeStream.write(headers);
          headersSent = true;
          return;
        }
        recognizeStream.write(audio);
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };

      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};

on the frontend we are using basic new MediaRecorder api to send the data:

    navigator.mediaDevices
      .getUserMedia(constraints)
      .then((media) => {
        // Continue to play the captured audio to the user.
        const output = new AudioContext();
        const source = output.createMediaStreamSource(media);
        source.connect(output.destination);

        const audioStream = new MediaStream(media.getAudioTracks());
        const silenceDetector = new SilenceDetector(audioStream);
        const mediaRecorder = new MediaRecorder(audioStream, {
          mimeType: MIME_TYPE,
        });

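        // With a timeslice, only the first chunk from MediaRecorder contains
        // the WebM initialization segment (EBML header); it is kept and sent
        // along with every subsequent audio chunk.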
        let audioHeaders: BlobEvent;
        mediaRecorder.ondataavailable = (event: BlobEvent) => {
          if (!audioHeaders) {
            audioHeaders = event;
          }

          const isSilent = silenceDetector?.getIsSilent();
          if (!isSilent) {
            if (!audioHeaders) {
              logger.error('No audio headers found');
              return;
            }
            sendAudioChunk(event, audioHeaders);
          }
        };

        mediaRecorder.start(TIMESLICE_INTERVAL);
      })
      .catch((error) => {
        logger.error('Error in getUserMedia:', error);
      });
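
One sanity check worth adding before constructing the recorder (a sketch; MIME_TYPE is the constant from above, assumed to be 'audio/webm;codecs=opus'):

    // Guard against browsers that cannot produce the expected container/codec.
    if (!MediaRecorder.isTypeSupported(MIME_TYPE)) {
      logger.error(`MediaRecorder does not support ${MIME_TYPE}`);
      return;
    }
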
meitarbe commented 1 month ago

@danielbankhead We have the same use case and issue, and it is very difficult for us to move to v1. Any update on this? It is critical to our system.

danielbankhead commented 1 month ago

WEBM OPUS should be supported:

I will see what’s going on.

meitarbe commented 1 month ago

> WEBM OPUS should be supported:
>
> I will see what’s going on.

@danielbankhead
Thanks for the quick reply! If you need more info, it seems to have started around Aug 6 (we started seeing tons of these errors in our GCP logs). I also tested webm files that I am 100% sure worked before (we save a pair of the audio and the produced text), and they do not work now, even though nothing has changed on our side.

paullombardcartello commented 1 month ago

Also experiencing this issue, also with WebM; it seems to have broken a few days ago.

asafda commented 4 weeks ago

Also experiencing this issue

danielbankhead commented 4 weeks ago

Update: the service team is aware of this issue; I should have another update soon.

paullombardcartello commented 3 weeks ago

Any updates?

danielbankhead commented 3 weeks ago

A fix is rolling out and should be available shortly.

sorokinvj commented 2 weeks ago

@danielbankhead any news on the fix? Is it available already? Do you know the release version I should be looking for?

danielbankhead commented 2 weeks ago

The issue is on the service side; no update is required on the client side. The rollback should be rolled out by now; however, I'm waiting for the service team to confirm.

danielbankhead commented 1 week ago

The fix should be widely available now.

felabrecque commented 1 week ago

I can confirm that the problem is fixed. I sent an audio/webm file to the google-cloud-speech v2 recognize functionality and it worked (it didn't tell me the file format was invalid).