pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

mp3 load problem #2709

Open magicse opened 2 years ago

magicse commented 2 years ago

🐛 Describe the bug

My code

import os
os.add_dll_directory("C:/ffmpeg/bin")
import torchaudio

audio_path = './123.mp3'
wav, sr = torchaudio.load(audio_path)

I get the following error:

  File "<stdin>", line 1, in <module>
  File "C:\Python38\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 103, in info
    sinfo = soundfile.info(filepath)
  File "C:\Python38\lib\site-packages\soundfile.py", line 438, in info
    return _SoundFileInfo(file, verbose)
  File "C:\Python38\lib\site-packages\soundfile.py", line 383, in __init__
    with SoundFile(file) as f:
  File "C:\Python38\lib\site-packages\soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\Python38\lib\site-packages\soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "C:\Python38\lib\site-packages\soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening './123.mp3': File contains data in an unknown format.

WAV files load without any problem. Here are the formats that soundfile reports as available:

>>> import soundfile as sf
>>> sf.available_formats()

{'AIFF': 'AIFF (Apple/SGI)', 'AU': 'AU (Sun/NeXT)', 'AVR': 'AVR (Audio Visual Research)', 'CAF': 'CAF (Apple Core Audio File)', 'FLAC': 'FLAC (Free Lossless Audio Codec)', 'HTK': 'HTK (HMM Tool Kit)', 'SVX': 'IFF (Amiga IFF/SVX8/SV16)', 'MAT4': 'MAT4 (GNU Octave 2.0 / Matlab 4.2)', 'MAT5': 'MAT5 (GNU Octave 2.1 / Matlab 5.0)', 'MPC2K': 'MPC (Akai MPC 2k)', 'OGG': 'OGG (OGG Container format)', 'PAF': 'PAF (Ensoniq PARIS)', 'PVF': 'PVF (Portable Voice Format)', 'RAW': 'RAW (header-less)', 'RF64': 'RF64 (RIFF 64)', 'SD2': 'SD2 (Sound Designer II)', 'SDS': 'SDS (Midi Sample Dump Standard)', 'IRCAM': 'SF (Berkeley/IRCAM/CARL)', 'VOC': 'VOC (Creative Labs)', 'W64': 'W64 (SoundFoundry WAVE 64)', 'WAV': 'WAV (Microsoft)', 'NIST': 'WAV (NIST Sphere)', 'WAVEX': 'WAVEX (Microsoft)', 'WVE': 'WVE (Psion Series 3)', 'XI': 'XI (FastTracker 2)'}
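
The table above has no MP3 entry, which is consistent with the "unknown format" error. Below is a minimal sketch of a check for this. `lists_mp3` is a hypothetical helper, and it assumes that an MP3-capable libsndfile build reports either an `MP3` key or a description mentioning MPEG; the exact naming may differ between soundfile versions.

```python
def lists_mp3(formats):
    """Return True if a soundfile-style format table appears to list MP3.

    `formats` maps format names to descriptions, in the same shape as
    the dict returned by soundfile.available_formats().
    """
    for key, description in formats.items():
        # Assumption: MP3 support shows up as an "MP3" key or an
        # MPEG-related description.
        if key.upper() == "MP3" or "MPEG" in description.upper():
            return True
    return False


# A subset of the table reported above; it contains no MP3 entry.
reported = {
    "FLAC": "FLAC (Free Lossless Audio Codec)",
    "OGG": "OGG (OGG Container format)",
    "WAV": "WAV (Microsoft)",
}
print(lists_mp3(reported))  # False
```

With soundfile installed, the same check could be run as `lists_mp3(sf.available_formats())`.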

ffmpeg version

ffmpeg version n4.4.2-5-gaa28df74ab-20220923 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 12.1.0 (crosstool-NG 1.25.0.55_3defb7b)
configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --enable-shared --disable-static --disable-w32threads --enable-pthreads --enable-iconv --enable-libxml2 --enable-zlib --enable-libfreetype --enable-libfribidi --enable-gmp --enable-lzma --enable-fontconfig --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-libdav1d --enable-libdavs2 --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --disable-frei0r --enable-libgme --enable-libkvazaar --enable-libass --enable-libbluray --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-lv2 --enable-libmfx --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --disable-vaapi --enable-libvidstab --disable-vulkan --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-ldflags=-pthread --extra-ldexeflags= --extra-libs=-lgomp --extra-version=20220923
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100

Versions

Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.3
[pip3] pytorch2caffe==0.1.0
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
[pip3] torchviz==0.0.2
[conda] Could not collect

magicse commented 2 years ago

After replacing libsndfile64bit.dll in the folder C:\Python38\Lib\site-packages\_soundfile_data\ with the DLL from the libsndfile-1.1.0-win64.zip package, MP3 loading works. But I ran into another issue: this code raises an error.

torchaudio.save(fname, np.asfortranarray(wavs[i].squeeze().numpy()), sample_rate)

Error

Traceback (most recent call last):
  File "Z:\AI_SDK\CPP_GFPGAN\Vocal_Spletter\spleeter-pytorch-mnn-main\spleeter-pytorch-mnn-main\test_estimator.py", line 48, in <module>
    torchaudio.save(fname, np.asfortranarray(wavs[i].squeeze().numpy()), sample_rate, channels_first=True)  # save tensor to file, as usual
  File "C:\Python38\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 425, in save
    subtype = _get_subtype(src.dtype, ext, encoding, bits_per_sample)
  File "C:\Python38\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 282, in _get_subtype
    return _get_subtype_for_wav(dtype, encoding, bits_per_sample)
  File "C:\Python38\lib\site-packages\torchaudio\backend\soundfile_backend.py", line 234, in _get_subtype_for_wav
    raise ValueError(f"Unsupported dtype for wav: {dtype}")
ValueError: Unsupported dtype for wav: float32

If I save the WAV file like this, it works without any issues:

soundfile.write(fname, np.asfortranarray(wavs[i].squeeze().numpy()).transpose(), sample_rate)
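
The error message above prints the dtype as `float32` rather than `torch.float32`, which suggests the object reaching `torchaudio.save` was a NumPy array rather than a tensor. Here is a minimal sketch of a possible fix, assuming `torchaudio.save` expects a 2-D `torch.Tensor` shaped (channels, time) when `channels_first=True`. `as_saveable_tensor` is a hypothetical helper, not part of torchaudio.

```python
def as_saveable_tensor(wav):
    """Hypothetical helper: coerce audio data into a 2-D torch.Tensor.

    Assumes `wav` is a tensor, NumPy array, or nested list holding
    either a 1-D mono signal or 2-D (channels, time) data.
    """
    # Imported here so the helper can be defined without torch installed.
    import torch

    tensor = torch.as_tensor(wav)  # leaves tensors as-is, converts arrays
    if tensor.ndim == 1:
        tensor = tensor.unsqueeze(0)  # mono: (time,) -> (1, time)
    if tensor.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got {tensor.ndim}-D")
    return tensor.contiguous()
```

With that, the call could become `torchaudio.save(fname, as_saveable_tensor(wavs[i].squeeze()), sample_rate, channels_first=True)`. Whether this resolves the error on the setup above is untested here.
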
magicse commented 2 years ago

Pull request for MP3 support: https://github.com/pytorch/audio/pull/2712