I'm on macOS 13 with Python 3.11, ffmpeg 6.1.1, and torch and torchaudio 2.2.0. The audio file is almost 21 minutes long and is extracted from this video: https://www.youtube.com/watch?v=8Wdz1Tj5084. I tried both mp3 and wav versions. Your Gradio demo errors out on the file too.
This is the full error:
Traceback (most recent call last):
File "/Users/james/denoisers/test.py", line 20, in <module>
clean_chunk = model(audio_chunk[None]).audio
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/denoisers/modeling/waveunet/model.py", line 156, in forward
noise = self.model(inputs)
^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/denoisers/modeling/waveunet/model.py", line 234, in forward
out = self.in_conv(inputs)
^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 310, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/james/.local/share/virtualenvs/denoisers-4WN3pNTX/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 306, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [24, 1, 15], expected input[1, 2, 163840] to have 1 channels, but got 2 channels instead
The code is taken from the project README.md:
import torch
import torchaudio
from denoisers import WaveUNetModel
from tqdm import tqdm
model = WaveUNetModel.from_pretrained("wrice/waveunet-vctk-24khz")
audio, sr = torchaudio.load("noisy_audio.wav")
if sr != model.config.sample_rate:
    audio = torchaudio.functional.resample(audio, sr, model.config.sample_rate)
chunk_size = model.config.max_length
print(model.config)
padding = abs(audio.size(-1) % chunk_size - chunk_size)
padded = torch.nn.functional.pad(audio, (0, padding))
clean = []
for i in tqdm(range(0, padded.shape[-1], chunk_size)):
    audio_chunk = padded[:, i:i + chunk_size]
    with torch.no_grad():
        clean_chunk = model(audio_chunk[None]).audio
    clean.append(clean_chunk.squeeze(0))
denoised = torch.concat(clean, 1)[:, :audio.shape[-1]]
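At first glance, it looks like the audio is stereo rather than mono: the error shows an input of shape [1, 2, 163840], i.e. 2 channels, while the model's first convolution expects 1. Unfortunately, the models are only trained on mono audio, but I can add support for converting stereo audio to mono.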
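In the meantime, a possible workaround is to downmix the file to mono yourself before chunking. The sketch below averages the channels, which is one common way to downmix and not necessarily how built-in support would end up working. The helper name `to_mono` is hypothetical and not part of the denoisers package, and the random tensor only stands in for what `torchaudio.load` returns for a stereo file.

```python
import torch


def to_mono(audio: torch.Tensor) -> torch.Tensor:
    """Average all channels of a (channels, samples) tensor into one channel.

    Hypothetical helper, not part of the denoisers package.
    """
    if audio.dim() != 2:
        raise ValueError(f"expected (channels, samples), got shape {tuple(audio.shape)}")
    # keepdim=True keeps the channel axis, so the result is (1, samples).
    return audio.mean(dim=0, keepdim=True)


# Stand-in for the float tensor torchaudio.load returns for a stereo file.
stereo = torch.randn(2, 48_000)
mono = to_mono(stereo)
print(mono.shape)  # torch.Size([1, 48000])
```

In the README script, this would mean calling `audio = to_mono(audio)` right after `torchaudio.load(...)`, so that each chunk passed to the model has shape [1, 1, chunk_size] instead of [1, 2, chunk_size].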