Delayed Microphone Audio Capture

majweldon commented 6 months ago

Describe the bug

Version 3.48.0 (desired behaviour): As soon as I press stop on a microphone recording, I can press submit (for transcription) with no errors. That is, the file seems usable from the moment stop is pressed.

Version 4.21.0: Once I stop a recording, I have to wait some time for the audio to 'capture' before I can press submit. This delay is about 1 second for every 10 seconds of recording, so it can be substantial for 5+ minutes of audio. I don't mind if there is additional latency, but ideally I could press the submit button as soon as I am done recording and come back once everything is done.
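
For scale, the reported rate can be turned into a rough estimate. This is only a back-of-the-envelope sketch: it assumes the delay grows linearly with recording length at the rate described above, and `estimated_capture_delay` is a hypothetical helper name, not part of Gradio.

```python
def estimated_capture_delay(recording_seconds, delay_per_chunk=1.0, chunk_seconds=10.0):
    """Estimate the wait (in seconds) before the recording can be submitted.

    Assumes a linear relationship: `delay_per_chunk` seconds of waiting
    for every `chunk_seconds` seconds of recorded audio.
    """
    return recording_seconds / chunk_seconds * delay_per_chunk


# A 5-minute recording at roughly 1 s of delay per 10 s of audio
print(estimated_capture_delay(5 * 60))  # 30.0
```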

Thanks for building and supporting Gradio - it has changed my professional life for the better in a big way.

Mike :)

Have you searched existing issues? 🔎

[X] I have searched and found no existing issues

Reproduction

[Weldon_Full_Visit_Format.txt](https://github.com/gradio-app/gradio/files/14577976/Weldon_Full_Visit_Format.txt)
import os
import openai
import time
import gradio as gr
import soundfile as sf
from pydub import AudioSegment

from openai import OpenAI

# Load API key from an environment variable
OPENAI_SECRET_KEY = os.environ.get("OPENAI_SECRET_KEY")
client = OpenAI(api_key = OPENAI_SECRET_KEY)

note_transcript = ""

def transcribe(audio, history_type):
  global note_transcript
  print(f"Received audio file path: {audio}")

  history_type_map = {
      "History": "Weldon_History_Format.txt",
      "Physical": "Weldon_PE_Note_Format.txt",
      "H+P": "Weldon_History_Physical_Format.txt",
      "Impression/Plan": "Weldon_Impression_Note_Format.txt",
      "Handover": "Weldon_Handover_Note_Format.txt",
      "Meds Only": "Medications.txt",
      "EMS": "EMS_Handover_Note_Format.txt",
      "Triage": "Triage_Note_Format.txt",
      "Full Visit": "Weldon_Full_Visit_Format.txt",
      "Psych": "Weldon_Psych_Format.txt",
      "SBAR": "SBAR.txt"

   }
  file_name = history_type_map.get(history_type, "Weldon_Full_Visit_Format.txt")
  with open(f"Format_Library/{file_name}", "r") as f:
    role = f.read()
  messages = [{"role": "system", "content": role}]

  ######################## Read audio file, wait as necessary if not written
  max_attempts = 1
  attempt = 0
  audio_data = None
  samplerate = None
  while attempt < max_attempts:
      try:
          if audio is None:
              raise TypeError("Invalid file: None")
          audio_data, samplerate = sf.read(audio)
          break
      except (OSError, TypeError) as e:
          print(f"Attempt {attempt + 1} of {max_attempts} failed with error: {e}")
          attempt += 1
          time.sleep(3)
  else:
      print(f"###############Failed to open audio file after {max_attempts} attempts.##############")
      return  # Terminate the function or raise an exception if the file could not be opened

  ########## Cast as float 32, normalize
  #audio_data = audio_data.astype("float32")
  #audio_data = (audio_data * 32767).astype("int16")
  #audio_data = audio_data.mean(axis=1)

  ###################Code to convert .wav to .mp3 (if necessary)
  sf.write("Audio_Files/test.wav", audio_data, samplerate, subtype='PCM_16')
  sound = AudioSegment.from_wav("Audio_Files/test.wav")
  sound.export("Audio_Files/test.mp3", format="mp3")

  sf.write("Audio_Files/test.mp3", audio_data, samplerate)

  ################  Send file to Whisper for Transcription
  audio_file = open("Audio_Files/test.mp3", "rb")

  max_attempts = 3
  attempt = 0
  while attempt < max_attempts:
      try:
          audio_transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
          break
      except openai.APIConnectionError as e:
          print(f"Attempt {attempt + 1} failed with error: {e}")
          attempt += 1
          time.sleep(3) # wait for 3 seconds before retrying
  else:
      print("Failed to transcribe audio after multiple attempts")  

  print(audio_transcript.text)
  messages.append({"role": "user", "content": audio_transcript.text})

  #Create Sample Dialogue Transcript from File (for debugging)
  #with open('Audio_Files/Test_Elbow.txt', 'r') as file:
  #  audio_transcript = file.read()
  #messages.append({"role": "user", "content": audio_transcript})

  ### Word and MB Count
  file_size = os.path.getsize("Audio_Files/test.mp3")
  mp3_megabytes = file_size / (1024 * 1024)
  mp3_megabytes = round(mp3_megabytes, 2)

  audio_transcript_words = audio_transcript.text.split() # Use when using mic input
  #audio_transcript_words = audio_transcript.split() #Use when using file

  num_words = len(audio_transcript_words)

  #Ask OpenAI to create note transcript
  response = client.chat.completions.create(model="gpt-4-1106-preview", temperature=0, messages=messages)
  note_transcript = response.choices[0].message.content
  print(note_transcript) 
  return [note_transcript, num_words,mp3_megabytes]

#Define Gradio Interface
my_inputs = [
    gr.Audio(sources=["microphone"], type="filepath",format="mp3"),
    gr.Radio(["History","H+P","Impression/Plan","Full Visit","Handover","Psych","EMS","SBAR","Meds Only"], show_label=False),
]

ui = gr.Interface(fn=transcribe, 
                  inputs=my_inputs, 
                  outputs=[gr.Textbox(label="Your Note", show_copy_button=True),
                           gr.Number(label="Audio Word Count"),
                           gr.Number(label=".mp3 MB")]
                 )

ui.launch(share=False, debug=True)

Screenshot

No response

Logs

Attempt 1 of 1 failed with error: Invalid file: None
###############Failed to open audio file after 1 attempts.##############
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/user/.local/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1570, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1397, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/home/user/.local/lib/python3.9/site-packages/gradio/blocks.py", line 1371, in validate_outputs
    raise ValueError(
ValueError: An event handler (transcribe) didn't receive enough output values (needed: 3, received: 1).
Wanted outputs:
    [textbox, number, number]
Received outputs:
    [None]
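
The ValueError in this log appears to be a side effect rather than the underlying problem: the handler hits its bare `return` when `audio` is None, so Gradio receives one value instead of the three outputs the interface declares. Below is a minimal sketch of a guard that keeps the output count consistent, assuming a handler shaped like `transcribe` above. `guard_missing_audio` and the placeholder message are hypothetical names for illustration, not Gradio APIs.

```python
import functools

NOT_READY_MESSAGE = (
    "Audio is not available yet. Please wait for the recording "
    "to finish processing, then resubmit."
)


def guard_missing_audio(handler):
    """Wrap a three-output handler so a missing audio path yields placeholders."""

    @functools.wraps(handler)
    def wrapped(audio, *args, **kwargs):
        if audio is None:
            # One value per declared output: textbox, number, number
            return [NOT_READY_MESSAGE, 0, 0]
        return handler(audio, *args, **kwargs)

    return wrapped
```

It could be applied as `fn=guard_missing_audio(transcribe)` when building the interface. This only replaces the traceback with a readable message; it does not remove the delay before the audio path becomes available.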

System Info

Gradio Environment Information:
------------------------------
Operating System: Linux
gradio version: 4.14.0
gradio_client version: 0.8.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.110.0
ffmpy: 0.3.2
gradio-client==0.8.0 is not installed.
httpx: 0.27.0
huggingface-hub: 0.19.4
importlib-resources: 6.1.3
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.3
numpy: 1.26.2
orjson: 3.9.15
packaging: 23.2
pandas: 2.1.3
pillow: 10.2.0
pydantic: 2.6.4
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.8.0
uvicorn: 0.28.0
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.

gradio_client dependencies in your environment:

fsspec: 2023.10.0
httpx: 0.27.0
huggingface-hub: 0.19.4
packaging: 23.2
typing-extensions: 4.8.0
websockets: 11.0.3

Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Severity

I can work around it

abidlabs commented 6 months ago

Thanks @majweldon for the kind words! cc @hannahblair and @dawoodkhan82 as well as this relates to frontend validation

aliabid94 commented 5 months ago

taking a look

aliabid94 commented 5 months ago

Okay, so I haven't been able to reproduce the extent of lag that you describe. If I record a very long audio clip (~5 min), I do encounter two sources of lag:

1. Generating the waveform in the browser (this is new to gradio 4.x). However, on my MacBook Pro, this only takes ~2s.
2. Processing the file for saving in the backend. This can take 5-6 seconds. However, this is identical in gradio 4.x and 3.42, so I'm not sure why you wouldn't see this in 3.42.

I made a PR that improves the performance of (2), though it only works if the recorded audio format is "wav". Can you try installing gradio from this PR (https://github.com/gradio-app/gradio/pull/7917)? To do so:

1. pip install https://gradio-builds.s3.amazonaws.com/b35e3ae839d208520180299077f4ce57bb96fca4/gradio-4.25.0-py3-none-any.whl
2. Change your audio component to gr.Audio(sources=["microphone"], type="filepath",format="wav")

See if you notice a difference in performance and lmk.

majweldon commented 5 months ago

Thank you so much for your time and effort @abidlabs.

I have done as you said (with the url pasted into my requirements.txt), and am using the .wav format. It builds and runs with the PR library, but I still have a significant lag (about 12 seconds per minute of recorded audio) before I can process any audio data.

I have attached the error log for reference. Here, I press the submit button once before the audio captures and once afterwards. I can tell the audio captures because the waveform in the gradio interface will refresh, though it is visibly the same waveform. Post capture, there is a valid audio path passed to my transcribe function which is missing in the pre-capture.

Mike :)

===== Application Startup at 2024-04-05 15:49:34 =====

Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Received audio file path: None
Attempt 1 of 1 failed with error: Invalid file: None
###############Failed to open audio file after 1 attempts.##############
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/gradio/queueing.py", line 522, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.9/site-packages/gradio/route_utils.py", line 260, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1750, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1521, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/usr/local/lib/python3.9/site-packages/gradio/blocks.py", line 1495, in validate_outputs
    raise ValueError(
ValueError: An event handler (transcribe) didn't receive enough output values (needed: 3, received: 1).
Wanted outputs:
    [<gradio.components.textbox.Textbox object at 0x7f73f7c80220>, <gradio.components.number.Number object at 0x7f73f7c80340>, <gradio.components.number.Number object at 0x7f73f7c80490>]
Received outputs:
    [None]

**** After the lag for audio capture, I push re-submit and the error is gone

Received audio file path: /tmp/gradio/be60e81248568cc78a52a8bd6c9accaa3fdc6193/audio.wav
Dear fellow scholars, the medications are Tylenol, Metoprolol, and Aspirin. What a time to be alive!
Medications:

Acetaminophen
Metoprolol
Aspirin

aliabid94 commented 5 months ago

Did the PR make any difference at all? If you're still seeing that much lag when the processing time should have been cut, then perhaps it's a network issue? Are you running your demo locally or on a server?

majweldon commented 5 months ago

I didn't see any difference with the PR, unfortunately.

My demo is running on the hugging face server, and I see similar behaviour at work, at home, and on my mobile device.

Would network issues affect latency differently between the libraries?

I can see the waveform and playback the audio within 5-6 seconds in both versions, similar to what you report. I just can't pass the audio to my function (transcribe) for much longer using the 4.x versions - it seems to have to wait until audio.wav is written to disk and can be passed in as a filepath.

Thanks again, Mike :)

alexeygridnev commented 1 month ago

I'm facing the same problem in gradio 4.29.0
