pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Cannot load audio from pathlib.Path #3775

Open roedoejet opened 2 months ago

roedoejet commented 2 months ago

🐛 Describe the bug

Running the following:

import torchaudio
from pathlib import Path

test_audio_path = Path('test.wav')
torchaudio.load(test_audio_path)

Produces the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pinea/miniconda3/envs/EveryVoice/lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 203, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/Users/pinea/miniconda3/envs/EveryVoice/lib/python3.10/site-packages/torchaudio/_backend/sox.py", line 41, in load
    ret = torch.ops.torchaudio.sox_io_load_audio_file(
  File "/Users/pinea/miniconda3/envs/EveryVoice/lib/python3.10/site-packages/torch/_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: torchaudio::sox_io_load_audio_file() Expected a value of type 'str' for argument '_0' but instead found type 'PosixPath'.
Position: 0
Value: PosixPath('test.wav')
Declaration: torchaudio::sox_io_load_audio_file(str _0, int? _1, int? _2, bool? _3, bool? _4, str? _5) -> (Tensor _0, int _1)
Cast error details: Unable to cast Python instance of type <class 'pathlib.PosixPath'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

But my understanding from the torchaudio.load type signature is that a pathlib.Path should be accepted here.

Versions

PyTorch version: 2.1.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.5.1 (arm64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-12.5.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy==1.8.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] pytorch-lightning==2.2.0.post0
[pip3] torch==2.1.0
[pip3] torchaudio==2.1.0
[pip3] torchinfo==1.8.0
[pip3] torchmetrics==1.3.1
[conda] numpy 1.26.4 pypi_0 pypi
[conda] pytorch-lightning 2.2.0.post0 pypi_0 pypi
[conda] torch 2.1.0 pypi_0 pypi
[conda] torchaudio 2.1.0 pypi_0 pypi
[conda] torchinfo 1.8.0 pypi_0 pypi
[conda] torchmetrics 1.3.1 pypi_0 pypi

yoyololicon commented 2 months ago

Hi @roedoejet,

The sox backend does not support file-like objects, as stated in the documentation (https://pytorch.org/audio/2.1.0/torchaudio.html). I would recommend using either ffmpeg or soundfile as the backend instead.
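
Since the traceback shows the sox op rejecting a PosixPath where it expects a str, one possible workaround is to convert the path to a string before calling torchaudio.load. The following is only a minimal sketch: the helper names to_uri and load_audio are hypothetical and not part of torchaudio, and it has not been tested against every backend.

```python
import os
from pathlib import Path


def to_uri(path):
    """Hypothetical helper: convert path-like objects (e.g. pathlib.Path)
    to their filesystem representation; return anything else unchanged."""
    if isinstance(path, os.PathLike):
        return os.fspath(path)
    return path


def load_audio(path, **kwargs):
    """Hypothetical wrapper around torchaudio.load that accepts pathlib.Path."""
    # Imported lazily so to_uri can be used without torchaudio installed.
    import torchaudio

    return torchaudio.load(to_uri(path), **kwargs)
```

With this in place, load_audio(Path('test.wav')) passes 'test.wav' as a plain str to whichever backend is selected. If switching backends is preferred, extra keyword arguments are forwarded, so something like load_audio(Path('test.wav'), backend='soundfile') should work on versions where torchaudio.load accepts a backend argument and the soundfile package is installed.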