ruizguille / voice-assistant

AI Voice Assistant built with Groq, Llama 3 and Deepgram.
https://voice-assistant.codeawake.com
MIT License

Not receiving response from deepgram for transcription #1

Open devsalman247 opened 3 weeks ago

devsalman247 commented 3 weeks ago

I am not receiving any results from Deepgram when running the app. I tested the local Python script first, and it still works perfectly fine, but when audio is streamed from the frontend over the WebSocket I see no output in the console:

async def transcribe_audio(self):
        async def on_message(self_handler, result, **kwargs):
            print("Transcript: ", result.channel.alternatives[0].transcript)
            sentence = result.channel.alternatives[0].transcript
            if len(sentence) == 0:
                return
            if result.is_final:
                self.transcript_parts.append(sentence)
                await self.transcript_queue.put({'type': 'transcript_final', 'content': sentence})
                if result.speech_final:
                    full_transcript = ' '.join(self.transcript_parts)
                    self.transcript_parts = []
                    await self.transcript_queue.put({'type': 'speech_final', 'content': full_transcript})
            else:
                await self.transcript_queue.put({'type': 'transcript_interim', 'content': sentence})

        async def on_utterance_end(self_handler, utterance_end, **kwargs):
            if len(self.transcript_parts) > 0:
                full_transcript = ' '.join(self.transcript_parts)
                self.transcript_parts = []
                await self.transcript_queue.put({'type': 'speech_final', 'content': full_transcript})

        dg_connection = deepgram.listen.asynclive.v('1')
        dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
        dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)
        if await dg_connection.start(dg_connection_options) is False:
            raise Exception('Failed to connect to Deepgram')

        try:
            while not self.finish_event.is_set():
                # Receive audio stream from the client and send it to Deepgram to transcribe it
                data = await self.websocket.receive_bytes()
                await dg_connection.send(data)
        finally:
            await dg_connection.finish()

When I tried to change the Deepgram config by adding the sample_rate and encoding params, I do receive output from Deepgram, but the transcript is an empty string:

deepgram_config = DeepgramClientOptions(options={'keepalive': 'true'})
deepgram = DeepgramClient(settings.DEEPGRAM_API_KEY, config=deepgram_config)
dg_connection_options = LiveOptions(
    model='nova-2',
    language='en',
    encoding='linear16',
    channels=1,
    sample_rate=16000,
    # Apply smart formatting to the output
    smart_format=True,
    # To get UtteranceEnd, the following must be set:
    interim_results=True,
    utterance_end_ms='1000',
    vad_events=True,
    # Time in milliseconds of silence to wait for before finalizing speech
    endpointing=500,
)
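For context on why those options produce empty transcripts: encoding='linear16' tells Deepgram to expect raw, uncompressed 16-bit little-endian PCM, while (as discussed below) the browser's MediaRecorder sends compressed WebM/Opus, so the decoder finds no intelligible speech. A minimal sketch of what the linear16 byte layout actually is (an illustrative helper, not part of the repo):

```python
import struct

def floats_to_linear16(samples):
    """Pack float samples in [-1.0, 1.0] as raw 16-bit little-endian PCM.
    This byte layout is what Deepgram expects when encoding='linear16' is set;
    WebM/Opus chunks from MediaRecorder look nothing like it."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack('<%dh' % len(clipped), *(int(s * 32767) for s in clipped))

# One second of silence at sample_rate=16000 is 32000 bytes (2 bytes per sample).
silence = floats_to_linear16([0.0] * 16000)
```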

OS: Windows 11
IDE: VS Code
Poetry version: 1.8.3
Poetry config:

[tool.poetry]
name = "voice-assistant"
version = "0.1.0"
description = ""
authors = ["ruizguille <guillermo@codeawake.com>"]
packages = [{include = "app"}]

[tool.poetry.dependencies]
python = "^3.11"
python-dotenv = "^1.0.1"
groq = "^0.7.0"
deepgram-sdk = "^3.2.7"
requests = "^2.32.2"
fastapi = "^0.111.0"
uvicorn = {extras = ["standard"], version = "^0.29.0"}
pydantic = "^2.7.1"
pydantic-settings = "^2.2.1"
httpx = "^0.27.0"

numpy = "^2.0.1"
ffmpeg-python = "^0.2.0"
pydub = "^0.25.1"

[tool.poetry.group.local.dependencies]
rich = "^13.7.1"
pyaudio = "^0.2.14"

[tool.poetry.group.dev.dependencies]
ipykernel = "^6.29.4"

[tool.poetry.scripts]
local-assistant = "app.local_assistant:main"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

![image](https://github.com/user-attachments/assets/5950d647-dd5c-4b7e-ac9a-80225760bdc6)

NOTE: I've made no changes to the frontend or backend other than the Deepgram config change mentioned above.

devsalman247 commented 3 weeks ago

[screenshot]

ruizguille commented 3 weeks ago

Hi @devsalman247,

Thank you for your message. Is it possible that you forgot to create the .env file in the frontend by copying the provided env.example? This file should include the NEXT_PUBLIC_WEBSOCKET_URL environment variable, which specifies the WebSocket URL used to connect to the backend; by default it should be "ws://localhost:8000/listen" if you are running it locally.
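For reference, the frontend .env would then contain a single line (assuming the variable name from env.example and the default local backend URL):

```
NEXT_PUBLIC_WEBSOCKET_URL=ws://localhost:8000/listen
```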

I've updated the code anyway to use that URL as the default, so it's no longer necessary to set the env variable if you are running it locally.

Please let me know if this doesn't solve it or if you find any other issues.

devsalman247 commented 3 weeks ago

> Hi @devsalman247,
>
> Thank you for your message. Is it possible that you forgot to create the .env file in the frontend by copying the provided env.example? This file should include the NEXT_PUBLIC_WEBSOCKET_URL environment variable, which specifies the WebSocket URL used to connect to the backend; by default it should be "ws://localhost:8000/listen" if you are running it locally.
>
> I've updated the code anyway to use that URL as the default, so it's no longer necessary to set the env variable if you are running it locally.
>
> Please let me know if this doesn't solve it or if you find any other issues.

Yes, I have created the .env file on the frontend and it is connecting: [screenshot]

Moreover, I receive the audio bytes on the backend but get no response from Deepgram. And when I specify the sample_rate and encoding, Deepgram returns an empty string.

ruizguille commented 3 weeks ago

That's strange. The frontend uses the MediaRecorder Web API to stream the microphone audio, and Deepgram should automatically detect the format and encoding, so you shouldn't specify them in the connection options. You can learn more here.

Can you check the format/encoding of the microphone stream in the frontend? Maybe it's not supported by Deepgram. Add this at the end of the startMicrophone function to check it:

mediaRecorderRef.current.addEventListener('start', () => {
  console.log(mediaRecorderRef.current.mimeType);
});

ruizguille commented 3 weeks ago

You can also try to set a specific MIME type when creating the MediaRecorder instance and see if it works:

mediaRecorderRef.current = new MediaRecorder(stream, { mimeType: 'audio/webm;codecs=opus' });

This is the format/encoding that my browser is using, and it works well with Deepgram.

devsalman247 commented 3 weeks ago

I've changed the MIME type as you suggested in your previous reply, but there's still no response from Deepgram:

async function startMicrophone() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    mediaRecorderRef.current = new MediaRecorder(stream, {
      mimeType: "audio/webm;codecs=opus",
    });
    console.log(mediaRecorderRef.current.mimeType);
    mediaRecorderRef.current.addEventListener("dataavailable", (e) => {
      if (e.data.size > 0 && wsRef.current.readyState == WebSocket.OPEN) {
        wsRef.current.send(e.data);
      }
    });
    mediaRecorderRef.current.start(250);
  }

[screenshot]

ruizguille commented 3 weeks ago

And the Deepgram config options are as they were initially, without specifying the encoding/channels/sample rate?

devsalman247 commented 3 weeks ago

> And the Deepgram config options are as they were initially, without specifying the encoding/channels/sample rate?

Yes

ruizguille commented 3 weeks ago

Try this at the end of the transcribe_audio method in the Assistant class:

try:
    # Using a context manager so the file is closed even if the loop errors out
    with open('audio.webm', 'wb') as audio_file:
        while not self.finish_event.is_set():
            # Receive the audio stream from the client, keep a local copy
            # for debugging and forward it to Deepgram for transcription
            data = await self.websocket.receive_bytes()
            audio_file.write(data)
            await dg_connection.send(data)
finally:
    await dg_connection.finish()

Then start the app, talk into the microphone for a bit, and listen to the saved audio file on the backend. That way you can check whether the backend is receiving the audio correctly. If it is, the issue is probably with the Deepgram connection.
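If the saved file plays back fine, one more quick sanity check (an illustrative helper, not part of the repo): every WebM/Matroska file begins with the 4-byte EBML header magic, so the first chunk MediaRecorder emits should start with those bytes. You could check the first received chunk before forwarding it:

```python
# WebM/Matroska streams begin with the 4-byte EBML header magic.
EBML_MAGIC = b'\x1a\x45\xdf\xa3'

def looks_like_webm(first_chunk: bytes) -> bool:
    """Return True if the first received chunk starts like a WebM stream.
    Only the first chunk carries this header; later chunks are continuation
    data, so run this check once per connection."""
    return first_chunk.startswith(EBML_MAGIC)
```

If this returns False for the first chunk, the frontend isn't actually sending WebM and Deepgram's format auto-detection will fail.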