vivekuppal / transcribe

Transcribe is a real-time transcription, conversation, and language-learning platform. It provides live transcripts from the microphone and speaker. It generates suggested conversation responses using OpenAI's GPT API and can read those responses aloud, simulating a live conversation in English or another language.
https://abhinavuppal1.github.io/
MIT License
194 stars, 46 forks

How do I change the recognition language of the Deepgram API? I want it to recognize Chinese instead of English. I tried changing the language in DeepgramSTTModel in transcriber_models.py, but it still only recognizes English #189

willt0 closed this issue 7 months ago

willt0 commented 7 months ago
    from deepgram import DeepgramClient


    class DeepgramSTTModel:
        def __init__(self, stt_model_config: dict):
            # Check for api_key
            if stt_model_config["api_key"] is None:
                raise Exception("Attempt to create Deepgram STT Model without an api key.")  # pylint: disable=W0719
            # self.lang = 'en-US'
            # Note: the PrerecordedOptions built in get_transcription do not reference self.lang
            self.lang = 'zh-CN'

            print('[INFO] Using Deepgram API for transcription.')
            self.audio_model = DeepgramClient(stt_model_config["api_key"])
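Setting self.lang in __init__ only stores a value on the object; it can change the transcription only if that value reaches the request options. The sketch below uses a hypothetical helper, build_deepgram_options (not part of the transcribe codebase), that returns the options as a plain dict: it pins Deepgram's language parameter when a language code is given and otherwise sets detect_language. The other keys mirror the PrerecordedOptions arguments used in get_transcription later in this thread.

```python
from typing import Optional


def build_deepgram_options(lang: Optional[str] = None) -> dict:
    """Build Deepgram pre-recorded request options (hypothetical helper).

    If `lang` is given (e.g. "zh-CN"), request that language explicitly;
    otherwise ask Deepgram to detect the spoken language.
    """
    options = {
        "model": "nova",
        "smart_format": True,
        "utterances": True,
        "punctuate": True,
        "paragraphs": True,
    }
    if lang:
        # Pin the transcription language instead of relying on the English default.
        options["language"] = lang
    else:
        # Let Deepgram identify the language from the audio itself.
        options["detect_language"] = True
    return options
```

Assuming deepgram-sdk v3, the dict could be passed as PrerecordedOptions(**build_deepgram_options(self.lang)). Whether a specific code such as zh-CN is accepted depends on the chosen model, so check Deepgram's language-support list before pinning one; detect_language avoids that choice at the cost of relying on automatic detection.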
abhinavuppal1 commented 7 months ago

It is not clear from the issue description how Deepgram is configured. Are you using command-line parameters or override.yaml to select Deepgram?

You are correct that Deepgram, as currently configured, does not recognize any language other than English.

I believe the following change will resolve the issue.

Add detect_language=True to the PrerecordedOptions call here:

https://github.com/vivekuppal/transcribe/blob/f25f0874eaa298079e2bff0fd2e58ddec389cc08/sdk/transcriber_models.py#L311

With the language-detection option added, the method will look like this:

    # Requires at module level: from deepgram import FileSource, PrerecordedOptions
    def get_transcription(self, wav_file_path: str):
        """Transcribe a WAV file with Deepgram.

        Returns the Deepgram response, or None if the request fails.
        """
        try:
            with open(wav_file_path, "rb") as audio_file:
                buffer_data = audio_file.read()

            payload: FileSource = {
                "buffer": buffer_data
            }

            options = PrerecordedOptions(
                model="nova",
                smart_format=True,
                utterances=True,
                punctuate=True,
                paragraphs=True,
                # New: let Deepgram identify the spoken language instead of assuming English
                detect_language=True)

            response = self.audio_model.listen.prerecorded.v("1").transcribe_file(payload, options)
            # This is not necessary and just a debugging aid
            with open('logs/deep.json', mode='a', encoding='utf-8') as deep_log:
                deep_log.write(response.to_json(indent=4))

            return response
        except Exception as exception:
            print(exception)

        return None

This should resolve the issue.
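Because get_transcription opens logs/deep.json in append mode, each successful call adds another pretty-printed JSON document to the end of the file, so the file as a whole is not one valid JSON value. The sketch below reads such a log with json.JSONDecoder.raw_decode and pulls out the detected language and transcript. The helper names are hypothetical, and the field paths assume Deepgram's pre-recorded response layout, where results.channels[0].detected_language is present when detect_language=True was requested.

```python
import json
from typing import Iterator, List, Optional, Tuple


def iter_json_documents(text: str) -> Iterator[dict]:
    """Yield each JSON document from text holding several concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip any whitespace between (or after) documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        # raw_decode parses one document starting at pos and returns where it ended.
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def language_and_transcript(response: dict) -> Tuple[Optional[str], str]:
    """Return (detected_language, transcript) for the first channel of a response."""
    channel = response["results"]["channels"][0]
    transcript = channel["alternatives"][0]["transcript"]
    # detected_language is only expected when detect_language=True was requested.
    return channel.get("detected_language"), transcript


def summarize_log(path: str = "logs/deep.json") -> List[Tuple[Optional[str], str]]:
    """Read an appended Deepgram log and summarize every response in it."""
    with open(path, encoding="utf-8") as log:
        return [language_and_transcript(doc) for doc in iter_json_documents(log.read())]
```

Calling summarize_log() after a few transcriptions should make it easy to confirm whether Deepgram is detecting Chinese or still falling back to English.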

willt0 commented 7 months ago

Thank you! The problem has been resolved.

abhinavuppal1 commented 7 months ago

Resolved in #190.