How to define the quality of the recording

gabrielpondc commented 3 months ago

Hi there, I am using this component for speech-to-text, but my model can only read fp16 audio. I found that streamlit-audiorecorder produces int8 audio, and after I converted it to fp16 the speech-to-text results were not good. How can I define the recording quality? Thanks for your response!

theevann commented 3 months ago

Hello,

Here is an example of how to convert a pydub AudioSegment to a NumPy float32 array. It uses SciPy's signal library to resample the audio if needed.

  audio_buffer = io.BytesIO()
  audio.export(audio_buffer, format="wav")
  audio_buffer.seek(0)  # Reset buffer position to the start
  audio_buffer.read(44)  # Skip the WAV header (44 bytes)

  audio_array = np.frombuffer(audio_buffer.read(), dtype=np.int16)
  audio_array = audio_array.astype(np.float32) / 32768.0  # Normalize to [-1.0, 1.0]

  # Resample audio to 16000 Hz if necessary
  if audio.frame_rate != 16000:
      num_samples = round(len(audio_array) * 16000 / audio.frame_rate)
      audio_array = signal.resample(audio_array, num_samples)
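
The fixed 44-byte skip and the int16 dtype in the snippet above assume a canonical WAV header, 16-bit samples, and a single channel. As a minimal sketch for cases where those assumptions might not hold, the standard-library wave module can read the header fields and the sample data instead. The helper name wav_bytes_to_float32 is hypothetical and is not part of streamlit-audiorecorder or pydub.

```python
import io
import wave

import numpy as np


def wav_bytes_to_float32(wav_bytes):
    """Decode 16-bit PCM WAV bytes into a mono float32 array and its frame rate.

    Hypothetical helper: the name and the mono downmix are illustrative choices.
    """
    # Let the wave module parse the header instead of skipping a fixed 44 bytes.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()
        frame_rate = wav_file.getframerate()
        frames = wav_file.readframes(wav_file.getnframes())

    if sample_width != 2:
        raise ValueError(f"expected 16-bit samples, got {8 * sample_width}-bit")

    # Little-endian int16 samples scaled to roughly [-1.0, 1.0).
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0

    # Interleaved multi-channel audio is averaged down to one channel.
    if channels > 1:
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, frame_rate
```

With the recorder, this could be called as wav_bytes_to_float32(audio.export(format="wav").read()), and the returned frame rate then decides whether resampling to 16000 Hz is needed.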

Here is a complete example of using OpenAI Whisper with this component:

import io
import numpy as np
import streamlit as st
import whisper
from audiorecorder import audiorecorder
from scipy import signal

st.title("Audio Recorder")
audio = audiorecorder('', '')

if len(audio) > 0:
    # To play audio in frontend:
    st.audio(audio.export().read())  

    # To get audio properties, use pydub AudioSegment properties:
    st.write(f"Frame rate: {audio.frame_rate}, Frame width: {audio.frame_width}, Duration: {audio.duration_seconds} seconds")

    # Prepare the audio data for Whisper model
    audio_buffer = io.BytesIO()
    audio.export(audio_buffer, format="wav")
    audio_buffer.seek(0)  # Reset buffer position to the start
    audio_buffer.read(44)  # Skip the WAV header (44 bytes)

    # Convert audio bytes to a NumPy array
    audio_array = np.frombuffer(audio_buffer.read(), dtype=np.int16)
    audio_array = audio_array.astype(np.float32) / 32768.0  # Normalize to [-1.0, 1.0]

    # Resample audio to 16000 Hz if necessary
    if audio.frame_rate != 16000:
        num_samples = round(len(audio_array) * 16000 / audio.frame_rate)
        audio_array = signal.resample(audio_array, num_samples)

    # Load the Whisper model and transcribe the audio
    model = whisper.load_model("small")
    result = model.transcribe(audio_array)

    # Display the transcription
    st.write(f"Transcription: {result['text']}")

I tested this successfully. Please close the issue if this works for you!

gabrielpondc commented 3 months ago

Thank you for your response. I tried using the following part to process the recorded audio:

    # Prepare the audio data for Whisper model
    audio_buffer = io.BytesIO()
    audio.export(audio_buffer, format="wav")
    audio_buffer.seek(0)  # Reset buffer position to the start
    audio_buffer.read(44)  # Skip the WAV header (44 bytes)

    # Convert audio bytes to a NumPy array
    audio_array = np.frombuffer(audio_buffer.read(), dtype=np.int16)
    audio_array = audio_array.astype(np.float32) / 32768.0  # Normalize to [-1.0, 1.0]

    # Resample audio to 16000 Hz if necessary
    if audio.frame_rate != 16000:
        num_samples = round(len(audio_array) * 16000 / audio.frame_rate)
        audio_array = signal.resample(audio_array, num_samples)

I saved both the original and the processed audio. The original plays fine, but the processed file is noisy, so speech-to-text does not work well on it. This is how I saved the processed file:

from scipy.io import wavfile
wavfile.write("output_revised.wav", 16000, audio_array)

Is there something I did wrong that keeps this from working for me? Thanks again.
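
One thing that may be worth checking when saving the processed array (an assumption, since the cause of the noise is not confirmed in this thread): scipy.io.wavfile.write picks the sample format from the array's dtype, so a float array is stored as a floating-point WAV, which some players and tools handle poorly. A minimal sketch that writes plain 16-bit PCM with the standard-library wave module instead is shown below; the helper name write_int16_wav is hypothetical.

```python
import wave

import numpy as np


def write_int16_wav(target, audio_array, frame_rate=16000):
    """Write a mono float array in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper; target may be a file path or a binary file-like object.
    """
    # Clip first so out-of-range values cannot wrap around when cast to int16.
    clipped = np.clip(np.asarray(audio_array, dtype=np.float64), -1.0, 1.0)
    pcm = np.round(clipped * 32767.0).astype("<i2")

    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(frame_rate)
        wav_file.writeframes(pcm.tobytes())
```

For example, write_int16_wav("output_revised.wav", audio_array) writes a file that can be compared by ear with the original recording. If the result is still noisy, the decoding step (header length, sample width, channel count) would be the next place to look.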

gabrielpondc commented 3 months ago

I used another way to generate the text and it works. Thanks for your response!

theevann / streamlit-audiorecorder

How to define the quality of the recording #23