yongjer commented 7 months ago

Describe the bug

ASR sometimes succeeds, but sometimes the error below occurs, so it is not stable enough. My current workaround is to submit several times until it works. When it succeeds, the terminal shows this warning: "Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`." The full terminal output is in the Logs section below.

Have you searched existing issues? 🔎

[X] I have searched and found no existing issues

Reproduction

import gradio as gr
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
import numpy as np
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float32  # Always use float32

model_id = "openai/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True
)

model = torch.compile(model, mode = "max-autotune", fullgraph=True)

model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

transcriber = pipeline("automatic-speech-recognition", model=model, tokenizer=processor.tokenizer, feature_extractor=processor.feature_extractor, torch_dtype=torch_dtype)

def transcribe(audio):
    sr, y = audio
    y = y.astype(np.float16)
    y /= np.max(np.abs(y))

    return transcriber({"sampling_rate": sr, "raw": y})["text"]

demo = gr.Interface(
    transcribe,
    gr.Audio(sources=["microphone"]),
    "text",
)

demo.launch(share=True, auth=("test", "nsysu2024"))

Screenshot

No response

Logs

Traceback (most recent call last):
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yongjer/miniforge3/envs/gradio/lib/python3.11/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/yongjer/程式/hf/gr.py", line 25, in transcribe
    sr, y = audio
    ^^^^^
TypeError: cannot unpack non-iterable NoneType object
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.

System Info

Gradio Environment Information:
------------------------------
Operating System: Linux
gradio version: 4.19.2
gradio_client version: 0.10.1

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.110.0
ffmpy: 0.3.2
gradio-client==0.10.1 is not installed.
httpx: 0.27.0
huggingface-hub: 0.21.3
importlib-resources: 6.1.2
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.3
numpy: 1.26.4
orjson: 3.9.15
packaging: 23.2
pandas: 2.2.1
pillow: 10.2.0
pydantic: 2.6.3
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
ruff: 0.3.0
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.10.0
uvicorn: 0.27.1
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.

gradio_client dependencies in your environment:

fsspec: 2024.2.0
httpx: 0.27.0
huggingface-hub: 0.21.3
packaging: 23.2
typing-extensions: 4.10.0
websockets: 11.0.3

Severity

I can work around it

abidlabs commented 7 months ago

Hi @yongjer, it sounds like no audio is being recorded in those cases. Are you sure that you're capturing audio? You could of course handle the None case, but that may not resolve the underlying issue, if there is one.

def transcribe(audio):
    if audio is None:
        return None
    sr, y = audio
    y = y.astype(np.float16)
    y /= np.max(np.abs(y))

    return transcriber({"sampling_rate": sr, "raw": y})["text"]
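
A slightly more defensive version of the guard above could also cover silent recordings, where `np.max(np.abs(y))` is 0 and the division would produce NaNs. The sketch below is only an illustration: `prepare_audio` is a hypothetical helper name, not part of Gradio or Transformers. It assumes the input is either None or a `(sampling_rate, ndarray)` tuple, as in the reproduction, and that multi-channel audio arrives with shape `(samples, channels)`. It also uses float32 rather than float16, which is a choice made here and not something confirmed as the fix for this issue.

```python
import numpy as np


def prepare_audio(audio):
    """Return (sampling_rate, samples) for the ASR pipeline, or None.

    Hypothetical helper: assumes `audio` is None (nothing recorded) or a
    (sampling_rate, ndarray) tuple, as in the reproduction above.
    """
    if audio is None:
        # Nothing was captured, so there is nothing to transcribe.
        return None

    sr, y = audio
    y = np.asarray(y, dtype=np.float32)

    # Assumption: multi-channel audio has shape (samples, channels).
    if y.ndim > 1:
        y = y.mean(axis=1)

    # Normalize by the peak only when it is non-zero, to avoid dividing by 0.
    peak = float(np.max(np.abs(y))) if y.size else 0.0
    if peak > 0:
        y = y / peak

    return sr, y
```

Inside `transcribe`, this would be used by returning early when `prepare_audio(audio)` is None and otherwise passing `{"sampling_rate": sr, "raw": y}` to `transcriber` as before. It only avoids the TypeError; it does not explain why the audio is None in the first place.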

abidlabs commented 7 months ago

Actually I tested this and can reproduce this issue here: https://huggingface.co/spaces/abidlabs/whisper. Not sure what's going on, will look into it

fusesid commented 6 months ago

@abidlabs any update on this? The same code works on my local machine, but I'm not sure why it doesn't work on an EC2 instance or in a Google Colab environment.

fusesid commented 6 months ago

#7674

abidlabs commented 6 months ago

Duplicate issue here: https://github.com/gradio-app/gradio/issues/7841. Let me close this one in favor of that one, which has a simpler repro and more details

gradio-app / gradio

TypeError: cannot unpack non-iterable NoneType object #7582