Gradio output is blank

yongjer commented 3 weeks ago

Describe the bug

The first tab, "Microphone", works, but the "Full Demo" tab does not: its outputs stay blank.

Have you searched existing issues? 🔎

[X] I have searched and found no existing issues

Reproduction

'''
Loaded as API: http://localhost:7861/ ✔
Client.predict() Usage Info
---------------------------
Named API endpoints: 1

 - predict(param_0, param_1, api_name="/predict") -> similarity_scores
    Parameters:
     - [Textbox] param_0: str (required)  
     - [Textbox] param_1: str (required)  
    Returns:
     - [Json] similarity_scores: Dict[Any, Any] (any valid json)
'''
'''
Loaded as API: http://localhost:7860/ ✔
Client.predict() Usage Info
---------------------------
Named API endpoints: 3

 - predict(inputs, task, api_name="/predict") -> output
    Parameters:
     - [Audio] inputs: filepath (required)  
     - [Radio] task: Literal['transcribe', 'translate'] (not required, defaults to:   transcribe)  
    Returns:
     - [Textbox] output: str 

 - predict(inputs, task, api_name="/predict_1") -> output
    Parameters:
     - [Audio] inputs: filepath (required)  
     - [Radio] task: Literal['transcribe', 'translate'] (not required, defaults to:   transcribe)  
    Returns:
     - [Textbox] output: str 

 - predict(yt_url, task, api_name="/predict_2") -> (output_0, output_1)
    Parameters:
     - [Textbox] yt_url: str (required)  
     - [Radio] task: Literal['transcribe', 'translate'] (not required, defaults to:   transcribe)  
    Returns:
     - [Html] output_0: str 
     - [Textbox] output_1: str
'''

import gradio as gr
from gradio_client import Client

MODEL_NAME = "openai/whisper-large-v3"
BATCH_SIZE = 8
FILE_LIMIT_MB = 1000
# Base URLs of the two backend Gradio apps (full URLs, despite the _PORT suffix)
WHISPER_SERVER_PORT = "http://localhost:7860"
TEXT_EMBEDDING_SERVER_PORT = "http://localhost:7861"
MOVEMENT = ["forward", "backward", "left", "right", "up", "down", "stop"]
TIME = ["do not move", "one second", "two seconds", "three seconds", "four seconds", "five seconds", "six seconds", "seven seconds", "eight seconds", "nine seconds", "ten seconds"]

def transcribe(inputs, task):
    if inputs is None:
        raise gr.Error(
            "No audio file submitted! Please upload or record an audio file before submitting your request."
        )
    try:
        asr_client = Client(WHISPER_SERVER_PORT)
        result = asr_client.predict(inputs=inputs, task=task, api_name="/predict")
        print(result)
        return str(result)
    except Exception as e:
        raise gr.Error(f"An error occurred during transcription: {str(e)}")

def movement_classification(text):
    if text is None:
        raise gr.Error("No text submitted! Please provide text for classification.")

    try:
        movement_client = Client(TEXT_EMBEDDING_SERVER_PORT)
        result = movement_client.predict(param_0=text, param_1="\n".join(MOVEMENT), api_name="/predict")
        index = result.index(max(result))  # index of the highest score; assumes `result` is a flat list ordered like MOVEMENT
        movement = MOVEMENT[index]
        return movement
    except Exception as e:
        raise gr.Error(f"An error occurred during movement classification: {str(e)}")

def time_classification(text):
    if text is None:
        raise gr.Error("No text submitted! Please provide text for classification.")

    try:
        time_client = Client(TEXT_EMBEDDING_SERVER_PORT)
        result = time_client.predict(param_0=text, param_1="\n".join(TIME), api_name="/predict")
        index = result.index(max(result))  # index of the highest score; assumes `result` is a flat list ordered like TIME
        time = TIME[index]
        return time
    except Exception as e:
        raise gr.Error(f"An error occurred during time classification: {str(e)}")

def determine_movement_and_time(inputs, task):
    text = transcribe(inputs, task)
    print(text)
    movement = movement_classification(text)
    time = time_classification(text)
    return text, movement, time

demo = gr.Blocks()

full_demo = gr.Interface(
    fn=determine_movement_and_time,
    inputs=[
        gr.Audio(sources=["microphone"], type="filepath", label="Audio Input"),
        gr.Radio(["transcribe", "translate"], label="Task", value="transcribe"),
    ],
    outputs=["text", "text", "text"],
    title="Whisper Large V3: Movement and Time Classification",
    description=f"Transcribe long-form microphone or audio inputs with the click of a button! Demo uses the checkpoint [{MODEL_NAME}](https://huggingface.co/{MODEL_NAME}) and 🤗 Transformers to transcribe audio files of arbitrary length.",
    allow_flagging="never",
)

mf_transcribe = gr.Interface(
    fn=transcribe,
    inputs=[
        gr.Audio(sources=["microphone"], type="filepath", label="Microphone Input"),
        gr.Radio(["transcribe", "translate"], label="Task", value="transcribe"),
    ],
    outputs="text",
    title="Whisper Large V3: Transcribe Audio",
    description=f"Transcribe long-form microphone or audio inputs with the click of a button! Demo uses the checkpoint [{MODEL_NAME}](https://huggingface.co/{MODEL_NAME}) and 🤗 Transformers to transcribe audio files of arbitrary length.",
    allow_flagging="never",
)

file_transcribe = gr.Interface(
    fn=transcribe,
    inputs=[
        gr.Audio(sources=["upload"], type="filepath", label="Audio File Upload"),
        gr.Radio(["transcribe", "translate"], label="Task", value="transcribe"),
    ],
    outputs="text",
    title="Whisper Large V3: Transcribe Audio",
    description=f"Transcribe long-form microphone or audio inputs with the click of a button! Demo uses the checkpoint [{MODEL_NAME}](https://huggingface.co/{MODEL_NAME}) and 🤗 Transformers to transcribe audio files of arbitrary length.",
    allow_flagging="never",
)

with demo:
    gr.TabbedInterface([mf_transcribe, file_transcribe, full_demo], ["Microphone", "Audio File", "Full Demo"])

demo.queue()
demo.launch(server_port=7862)
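
Both classification helpers above call `result.index(max(result))`, which only works if the `/predict` endpoint on port 7861 returns a flat list. The usage info declares the return value as `Dict[Any, Any] (any valid json)`, so the actual shape cannot be determined from this report. The following is a minimal sketch of a more defensive selection step. It assumes the scores arrive as a flat list, a single-row nested list, or a dict mapping label to score; none of these shapes is confirmed here, and `pick_best_label` is a hypothetical helper, not part of Gradio.

```python
from typing import Any, Mapping, Sequence


def pick_best_label(scores: Any, labels: Sequence[str]) -> str:
    """Return the label with the highest similarity score.

    Assumed (unconfirmed) shapes of `scores`:
      * flat list of numbers, one per label, in label order
      * nested list with a single row, e.g. [[0.1, 0.9]]
      * dict mapping label -> score
    """
    if isinstance(scores, Mapping):
        if not scores:
            raise ValueError("score mapping is empty")
        # The key with the largest value is taken as the best label.
        return str(max(scores, key=scores.get))

    if isinstance(scores, (list, tuple)):
        # Unwrap a single-row nested list such as [[0.1, 0.9]].
        if len(scores) == 1 and isinstance(scores[0], (list, tuple)):
            row = scores[0]
        else:
            row = scores
        if len(row) != len(labels):
            raise ValueError(
                f"expected {len(labels)} scores, got {len(row)}: {row!r}"
            )
        best = max(range(len(row)), key=row.__getitem__)
        return labels[best]

    raise TypeError(f"unsupported score payload: {type(scores).__name__}")
```

Inside `movement_classification`, this could replace the two lines that compute `index` and `movement`, for example `return pick_best_label(result, MOVEMENT)`. An unexpected payload would then raise an error naming the shape it received.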

Screenshot

No response

Logs

No response

System Info

Gradio Environment Information:
------------------------------
Operating System: Linux
gradio version: 4.32.1
gradio_client version: 0.17.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.3.0
fastapi: 0.111.0
ffmpy: 0.3.2
gradio-client==0.17.0 is not installed.
httpx: 0.27.0
huggingface-hub: 0.23.2
importlib-resources: 6.4.0
jinja2: 3.1.4
markupsafe: 2.1.5
matplotlib: 3.9.0
numpy: 1.26.4
orjson: 3.10.3
packaging: 24.0
pandas: 2.2.2
pillow: 10.3.0
pydantic: 2.7.2
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
ruff: 0.4.7
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.12.3
typing-extensions: 4.12.0
urllib3: 2.2.1
uvicorn: 0.30.0
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.

gradio_client dependencies in your environment:

fsspec: 2024.5.0
httpx: 0.27.0
huggingface-hub: 0.23.2
packaging: 24.0
typing-extensions: 4.12.0
websockets: 11.0.3

Severity

Blocking usage of gradio

abidlabs commented 3 weeks ago

Hi @yongjer, I can't repro this issue:

https://github.com/gradio-app/gradio/assets/1778297/2fa81d2f-776b-4ba9-a5f5-f597ac83b6fb

Could it be an issue that's specific to your environment? Have you tried running it on Hugging Face Spaces or Google Colab, for example?
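
One way to tell an environment problem from a failing stage is to call each stage of the reproduction directly, outside the UI, and record what comes back. The sketch below is only an illustration: `probe` is a hypothetical helper, not a Gradio API, and the stand-in functions in the usage section are placeholders for `transcribe`, `movement_classification`, and `time_classification` from the script above.

```python
from typing import Any, Callable, Dict


def probe(name: str, fn: Callable[..., Any], *args: Any) -> Dict[str, Any]:
    """Call fn(*args) and report the outcome instead of raising."""
    try:
        value = fn(*args)
    except Exception as exc:  # deliberately broad: this is a diagnostic
        return {"stage": name, "ok": False, "detail": f"{type(exc).__name__}: {exc}"}
    return {"stage": name, "ok": True, "detail": repr(value)}


if __name__ == "__main__":
    # Stand-ins so the sketch runs without the two backend servers.
    def fake_transcribe(path: str, task: str) -> str:
        return "move forward for two seconds"

    def fake_classifier(text: str) -> str:
        # Mimics calling .index() on a dict payload, which raises.
        return {"forward": 0.9}.index(0.9)

    text_report = probe("transcribe", fake_transcribe, "sample.wav", "transcribe")
    print(text_report)
    print(probe("movement", fake_classifier, "move forward for two seconds"))
```

With the real functions passed in place of the stand-ins, the first report whose `ok` is `False` points at the stage to investigate.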

abidlabs commented 1 week ago

Closing for lack of a suitable repro

gradio-app / gradio

Gradio output is blank #8441