hugofloresgarcia / torchopenl3

openl3 audio embedding for PyTorch
MIT License

RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same #1

Open drscotthawley opened 1 year ago

drscotthawley commented 1 year ago

Hugo, thanks for sharing this. When I run the example code provided on the README page, I get a type mismatch error when the embedding line gets called:

/fsx/shawley/envs_sm/aa/lib/python3.10/site-packages/audio_utils/core.py:146: FutureWarning: Pass orig_sr=48000, target_sr=48000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  audio = librosa.resample(audio, old_sr, new_sr)
Traceback (most recent call last):
  File "/fsx/shawley/code/torchopenl3/openl3_example.py", line 19, in <module>
    embedding = torchopenl3.embed(model=model, 
  File "/fsx/shawley/code/torchopenl3/torchopenl3/embed.py", line 47, in embed
    embeddings = model(audio)
  File "/fsx/shawley/envs_sm/aa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/shawley/code/torchopenl3/torchopenl3/model.py", line 54, in forward
    x = self.filters(x)
  File "/fsx/shawley/envs_sm/aa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/shawley/code/torchopenl3/torchopenl3/timefreq.py", line 131, in forward
    real = self.conv1d_real(x)
  File "/fsx/shawley/envs_sm/aa/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/fsx/shawley/envs_sm/aa/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/fsx/shawley/envs_sm/aa/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same

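I think the issue is that NumPy arrays default to float64, whereas PyTorch parameters default to float32, so the input apparently reaches the model as a double tensor while the convolution weights are float.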

Workaround: if I change the audio line in the example to read

audio = np.random.randn(1, SAMPLE_RATE).astype(np.float32)

then it works fine for me.
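
For anyone who wants to reproduce the mismatch without torchopenl3, here is a minimal sketch. It assumes a plain `torch.nn.Conv1d` as a stand-in for the library's filter layer (the layer sizes are made up, and this is not the torchopenl3 code itself) and runs on CPU, where the error message may be worded differently from the CUDA one above.

```python
import numpy as np
import torch

SAMPLE_RATE = 48000

# Stand-in for the model's conv layer (hypothetical sizes); weights default to float32.
conv = torch.nn.Conv1d(in_channels=1, out_channels=4, kernel_size=8)

# np.random.randn returns float64 unless the array is cast explicitly.
audio64 = np.random.randn(1, SAMPLE_RATE)
audio32 = audio64.astype(np.float32)

# torch.from_numpy keeps the NumPy dtype; unsqueeze adds a batch dimension.
x64 = torch.from_numpy(audio64).unsqueeze(0)  # float64, shape (1, 1, SAMPLE_RATE)
x32 = torch.from_numpy(audio32).unsqueeze(0)  # float32, shape (1, 1, SAMPLE_RATE)

# Double input against float weights raises a RuntimeError.
try:
    conv(x64)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("float64 input failed:", err)

# Float input matches the weights and goes through.
out = conv(x32)
print("float32 input ok:", out.dtype, tuple(out.shape))
```

Casting on the torch side (for example `x64.float()`) should work just as well as `.astype(np.float32)`; which one is preferable depends on where the array is converted to a tensor.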

drscotthawley commented 1 year ago

Whoops, just noticed this code is 2 years old! Well, it's new to me! :-)