gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

Microphone Audio Input bugs #976

Closed Rikorose closed 2 years ago

Rikorose commented 2 years ago

Describe the bug

I cannot get microphone audio input to work in Chrome on a Hugging Face Space.

Reproduction

  1. Gradio requires external programs that are neither documented nor installed on Hugging Face Spaces. When using:
    gradio.inputs.Audio(source="microphone", type="numpy")

    I get:

    
    Running on local URL:  http://localhost:7860/

    To create a public link, set share=True in launch().
    /home/user/.local/lib/python3.8/site-packages/pydub/utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
      warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
    Traceback (most recent call last):
      File "/home/user/.local/lib/python3.8/site-packages/gradio/routes.py", line 269, in predict
        output = await run_in_threadpool(app.launchable.process_api, body, username)
      File "/home/user/.local/lib/python3.8/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
        return await anyio.to_thread.run_sync(func, *args)
      File "/home/user/.local/lib/python3.8/site-packages/anyio/to_thread.py", line 28, in run_sync
        return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
      File "/home/user/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
        return await future
      File "/home/user/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 754, in run
        result = context.run(func, *args)
      File "/home/user/.local/lib/python3.8/site-packages/gradio/interface.py", line 573, in process_api
        prediction, durations = self.process(raw_input)
      File "/home/user/.local/lib/python3.8/site-packages/gradio/interface.py", line 611, in process
        processed_input = [
      File "/home/user/.local/lib/python3.8/site-packages/gradio/interface.py", line 612, in <listcomp>
        input_component.preprocess(raw_input[i])
      File "/home/user/.local/lib/python3.8/site-packages/gradio/inputs.py", line 1173, in preprocess
        return processing_utils.audio_from_file(file_obj.name)
      File "/home/user/.local/lib/python3.8/site-packages/gradio/processing_utils.py", line 122, in audio_from_file
        audio = AudioSegment.from_file(filename)
      File "/home/user/.local/lib/python3.8/site-packages/pydub/audio_segment.py", line 728, in from_file
        info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
      File "/home/user/.local/lib/python3.8/site-packages/pydub/utils.py", line 274, in mediainfo_json
        res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
      File "/usr/local/lib/python3.8/subprocess.py", line 858, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/usr/local/lib/python3.8/subprocess.py", line 1704, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe'

When testing on my local machine (where ffmpeg is installed), I found two more bugs:
2. Chrome records mono, but the documentation says the result will have shape (samples, 2).
3. Chrome saves the audio as WebM/Opus, but the file extension is still ".wav":
When using
```py
gradio.inputs.Audio(source="microphone", type="filepath")
```

Which results in the following input (/tmp/audioioc34ntl.wav):

$ ffprobe /tmp/audioioc34ntl.wav
Input #0, matroska,webm, from '/tmp/audioioc34ntl.wav':
  Metadata:
    encoder         : Chrome
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)

Unfortunately, without ffmpeg I am not able to decode this audio in Python. Torchaudio can only decode Ogg/Opus.
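The mismatch between extension and content can be confirmed without ffmpeg by looking at the first bytes of the file. The following is a minimal sketch, not part of Gradio: `sniff_audio_container` is a hypothetical helper name, and it only recognises the three containers mentioned in this report (RIFF/WAVE, Matroska/WebM via the EBML magic number, and Ogg).

```python
# Hypothetical helper (not a Gradio API): guess the container from the
# file header instead of trusting the ".wav" extension.

EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # first four bytes of Matroska / WebM files


def sniff_audio_container(path):
    """Return "wav", "webm", "ogg" or "unknown" based on the file header."""
    with open(path, "rb") as f:
        head = f.read(12)
    # A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == EBML_MAGIC:
        return "webm"
    if head[:4] == b"OggS":
        return "ogg"
    return "unknown"
```

For a Chrome recording like the one above, this would be expected to report "webm" even though the path ends in ".wav". It only identifies the container; decoding the Opus stream still needs a decoder such as ffmpeg.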

Screenshot

No response

Logs

No response

System Info

Chrome on a Hugging Face Space.
Firefox works.

>>> gradio.__version__
'2.9.1'


Severity

blocker

abidlabs commented 2 years ago

Hi @Rikorose, thanks for creating this issue. This usually just means that gradio wasn't installed successfully. Try pip uninstall gradio and pip install gradio to reinstall Gradio.

Closing as this has already been addressed in #613 and #195

Rikorose commented 2 years ago

Thanks for your reply. I already tested reinstalling gradio and just tested again with version 2.9.4. Issue is still present and thus I am asking to reopen.

Traceback (most recent call last):
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/gradio/routes.py", line 269, in predict
    output = await run_in_threadpool(app.launchable.process_api, body, username)
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/anyio/to_thread.py", line 28, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 818, in run_sync_in_worker_thread
    return await future
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 754, in run
    result = context.run(func, *args)
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/gradio/interface.py", line 573, in process_api
    prediction, durations = self.process(raw_input)
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/gradio/interface.py", line 615, in process
    predictions, durations = self.run_prediction(
  File "/home/hendrik/miniconda/envs/df/lib/python3.9/site-packages/gradio/interface.py", line 531, in run_prediction
    prediction = predict_fn(*processed_input)
  File "/home/hendrik/projects/DeepFilterNetSpace/app.py", line 110, in mix_and_denoise
    tmp = load_audio_gradio(speech_rec, sr)

Gradio creates the file: speech_rec = '/tmp/audio_hzu_gp_.wav'

$ ll /tmp/audio_hzu_gp_.wav
.rw------- hendrik hendrik 4.5 KB Tue Apr 19 08:38:49 2022  /tmp/audio_hzu_gp_.wav
[I][hendrik@T480s ~]$ soxi /tmp/audio_hzu_gp_.wav
soxi FAIL formats: can't open input file `/tmp/audio_hzu_gp_.wav': WAVE: RIFF header not found

[I][hendrik@T480s ~]$ ffprobe /tmp/audio_hzu_gp_.wav
ffprobe version 4.4.1 Copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 11 (GCC)
  configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg --docdir=/usr/share/doc/ffmpeg --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man --arch=x86_64 --optflags='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' --extra-ldflags='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 ' --extra-cflags=' -I/usr/include/rav1e' --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-version3 --enable-bzlib --enable-chromaprint --disable-crystalhd --enable-fontconfig --enable-frei0r --enable-gcrypt --enable-gnutls --enable-ladspa --enable-libaom --enable-libdav1d --enable-libass --enable-libbluray --enable-libbs2b --enable-libcdio --enable-libdrm --enable-libjack --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libilbc --enable-libmp3lame --enable-libmysofa --enable-nvenc --enable-openal --enable-opencl --enable-opengl --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librav1e --enable-librtmp --enable-librubberband --enable-libsmbclient --enable-version3 --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libv4l2 --enable-libvidstab --enable-libvmaf --enable-version3 --enable-vapoursynth --enable-libvpx --enable-vulkan --enable-libglslang --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libxml2 --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-avfilter 
--enable-avresample --enable-libmodplug --enable-postproc --enable-pthreads --disable-static --enable-shared --enable-gpl --disable-debug --disable-stripping --shlibdir=/usr/lib64 --enable-lto --enable-libmfx --enable-runtime-cpudetect
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, matroska,webm, from '/tmp/audio_hzu_gp_.wav':
  Metadata:
    encoder         : Chrome
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)

Edit: Tested with

chromium-browser --version
Chromium 99.0.4844.84 Fedora Project

abidlabs commented 2 years ago

Whoops I'm sorry I closed the wrong issue. My bad @Rikorose, this should definitely stay open and we will look into it!

omerXfaruq commented 2 years ago

@Rikorose

  1. Do you get a warning along the lines of "ffmpeg is not installed"?
  2. You can try conda install ffmpeg

Rikorose commented 2 years ago

  1. No
  2. I have ffmpeg on my local machine, but I cannot install it on a Hugging Face Space since it just uses a requirements.txt file.

Also, I'd prefer not to use a hacky workaround for different browsers. I'd rather receive an audio sample as specified in the Gradio documentation.

I only used ffmpeg to debug the issue and to figure out that Gradio gives me a .wav file that is actually a WebM container with Opus-encoded audio.

osanseviero commented 2 years ago

FYI in Spaces you can add a packages.txt file for dependencies that require apt-get. See https://huggingface.co/spaces/chrisjay/afro-speech/blob/main/packages.txt
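As an illustration, a packages.txt for this case could be as small as the following, placed next to requirements.txt in the Space repository. This is a sketch assuming ffmpeg is the only system package needed; each line names one Debian package, and the ffmpeg package also provides ffprobe.

```text
ffmpeg
```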

osanseviero commented 2 years ago

Notes from offline discussion

Note that ffmpeg cannot be installed via pip install. At least on Ubuntu you need to run apt-get install ffmpeg. Also consider that this means a very common feature does not work out of the box for users. Maybe the error message could be better than just saying there is no ffprobe.
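To illustrate what a clearer error could look like, here is a minimal sketch of a startup check that an app could run before launching. It is not Gradio's behaviour; `missing_audio_tools` and `require_audio_tools` are hypothetical names, and the only library call used is the standard `shutil.which`.

```python
import shutil


def missing_audio_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def require_audio_tools(tools=("ffmpeg", "ffprobe")):
    """Raise a descriptive error instead of a bare FileNotFoundError."""
    missing = missing_audio_tools(tools)
    if missing:
        raise RuntimeError(
            "Missing external program(s): " + ", ".join(missing) + ". "
            "ffmpeg cannot be installed via pip; install it with the system "
            "package manager (for example apt-get install ffmpeg on Ubuntu)."
        )
```

Calling `require_audio_tools()` once at import time would surface the missing dependency immediately, rather than on the first recording.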

Rikorose commented 2 years ago

FYI in Spaces you can add a packages.txt file for dependencies that require apt-get. See https://huggingface.co/spaces/chrisjay/afro-speech/blob/main/packages.txt

Ah good to know, thanks.

Edit: Works now when installing ffmpeg via packages.txt. I guess the easiest solution would be to document this requirement in the audio input section and maybe automatically transform the webm file to a wav file when using type="filepath".
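Until such a conversion exists upstream, an app can do it on its own side. The sketch below assumes the ffmpeg binary is on PATH (for example via packages.txt) and that the input comes from type="filepath"; `ensure_wav`, `is_riff_wave` and `ffmpeg_to_wav_cmd` are hypothetical helper names, not Gradio functions.

```python
import os
import subprocess


def is_riff_wave(path):
    """True if the file starts with a RIFF/WAVE header."""
    with open(path, "rb") as f:
        head = f.read(12)
    return head[:4] == b"RIFF" and head[8:12] == b"WAVE"


def ffmpeg_to_wav_cmd(src, dst):
    """Build the ffmpeg command line; -y overwrites dst if it exists."""
    # ffmpeg probes the input by content and picks the output
    # format from the ".wav" extension of dst.
    return ["ffmpeg", "-y", "-i", src, dst]


def ensure_wav(path):
    """Return a path to a real WAV file, converting with ffmpeg if needed."""
    if is_riff_wave(path):
        return path  # already a genuine WAV, nothing to do
    root, _ = os.path.splitext(path)
    dst = root + "_converted.wav"
    # check=True raises CalledProcessError if ffmpeg fails.
    subprocess.run(ffmpeg_to_wav_cmd(path, dst), check=True, capture_output=True)
    return dst
```

The prediction function would then call `ensure_wav(speech_rec)` before handing the path to a WAV-only loader. Writing to a separate file avoids reading and overwriting the same path in one ffmpeg invocation.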

Thanks for your help!