Open Tungway1990 opened 1 year ago
Hi @Tungway1990
Thanks for the report. This issue originates from the soundfile package.
I can reproduce this with bare soundfile.
import soundfile as sf

data, samplerate = sf.read(source)
print(data.shape, data.dtype)
data, samplerate = sf.read(source, dtype="float32")
print(data.shape, data.dtype)
(200704,) float64
(0,) float32
A similar issue has already been reported, and the root cause seems to be libsndfile. https://github.com/bastibe/python-soundfile/issues/349
We could special-case MP3: load it as float64 first, then convert it to float32 if necessary. @pytorch/team-audio-core Any thoughts?
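A minimal sketch of that idea, for illustration only and not the actual torchaudio change: read with float64, which decoded correctly in the reproduction above, then cast to the requested dtype. The helper name `read_via_float64` and its `reader` parameter are hypothetical; `reader` exists only so the logic can be exercised without an MP3 file. The cast is a plain `astype`, so this suits floating-point targets only (integer targets would need explicit scaling).

```python
import numpy as np


def read_via_float64(path, dtype="float32", reader=None):
    """Read audio as float64, then cast to the requested float dtype.

    `reader` is a hypothetical injection point; soundfile.read is used
    when it is not supplied.
    """
    if reader is None:
        # Imported lazily so the helper can be tested without soundfile.
        import soundfile as sf
        reader = sf.read
    # Always request float64, the dtype that returned data in this thread.
    data, samplerate = reader(path, dtype="float64")
    # Cast afterwards; copy=False skips the copy if dtype is already float64.
    return data.astype(dtype, copy=False), samplerate
```

With the file from this report, `read_via_float64(source)` would be expected to return a non-empty float32 array, assuming the float64 read keeps working as shown above.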
Yup, I agree with you that this is a third-party issue.
Hello guys, I'm interested in contributing to PyTorch. I am learning Python. Is there any way I can contribute to the project? Could anyone please guide me?
We need to add special handling for MP3 so that it's loaded as float64 first, then converted to the dtype required by the client code.
It's somewhere here
This is my own implementation; you can take a look:
with soundfile.SoundFile(filepath, "r") as file_:
    if file_.format != "WAV" or normalize:
        dtype = "float64"
    elif file_.subtype not in _SUBTYPE2DTYPE:
        raise ValueError(f"Unsupported subtype: {file_.subtype}")
    else:
        dtype = _SUBTYPE2DTYPE[file_.subtype]
    frames = file_._prepare_read(frame_offset, None, num_frames)
    waveform = file_.read(frames, dtype, always_2d=True).astype('float32')
🐛 Describe the bug
I am trying to load Common Voice MP3 files using torchaudio with the code below:
I get an empty output:
I found the root cause in the file soundfile_backend.py.
After changing float32 to float64 there, the array is loaded correctly.
I have attached an MP3 file for your reference: common_voice_zh-HK_20096730.zip
The ffmpeg version I am using is 5.1.2.
Thanks.
Versions
Collecting environment information...
PyTorch version: 1.12.0
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.24.0-rc3
Libc version: N/A

Python version: 3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090 Ti
Nvidia driver version: 516.94
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.3
[pip3] numpydoc==1.2
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.12.0
[pip3] torch-geometric==2.0.4
[pip3] torch-geometric-temporal==0.53.0
[pip3] torch-scatter==2.0.9
[pip3] torch-sparse==0.6.13
[pip3] torchaudio==0.12.0
[pip3] torchfile==0.1.0
[pip3] torchvision==0.13.0
[conda] Could not collect