pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Cannot create the MFCC of a tensor that is already on a GPU #3765

Closed · Greenscreen23 closed this issue 3 months ago

Greenscreen23 commented 3 months ago

🚀 The feature

The MFCC class should have a parameter for specifying a device for its internally used tensors, so that one can compute an MFCC of a tensor that is not on the CPU.

Motivation, pitch

Trying to compute an MFCC of a tensor that is not on the CPU fails, because internally used tensors are created on the CPU and operations such as matrix multiplication fail when their operands are on different devices. A minimal example:

import torch
from torchaudio.transforms import MFCC

device = torch.device('cuda')
# wkwargs only places the STFT window on the GPU; the mel filterbank and
# the DCT matrix are still created on the CPU.
mfcc = MFCC(melkwargs={"wkwargs": {"device": device}})

data = torch.rand([400, 400]).to(device)

print(mfcc(data))  # RuntimeError: tensors are on different devices

To compute the MFCC of a tensor that is on a GPU, one currently has to add these two lines after creating the mfcc object, which is not ideal, as it reaches into torchaudio internals:

mfcc.MelSpectrogram.mel_scale.fb = mfcc.MelSpectrogram.mel_scale.fb.to(device)
mfcc.dct_mat = mfcc.dct_mat.to(device)
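
For completeness, the full workaround as a single runnable snippet (the failing example above plus the two manual transfers):

import torch
from torchaudio.transforms import MFCC

device = torch.device('cuda')
mfcc = MFCC(melkwargs={"wkwargs": {"device": device}})

# Manually move the internal tensors that are still on the CPU.
mfcc.MelSpectrogram.mel_scale.fb = mfcc.MelSpectrogram.mel_scale.fb.to(device)
mfcc.dct_mat = mfcc.dct_mat.to(device)

data = torch.rand([400, 400]).to(device)
print(mfcc(data))  # succeeds now that all tensors are on the GPU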

It would be great if this were possible without accessing the internals. I’d be happy to open a PR to fix this:

The way I would solve this in my PR: add a device parameter to the MFCC class, a mel_scale_kwargs parameter to the MelSpectrogram class, and a device parameter to the MelScale class. Passing a device argument would transfer the dct_mat to that device, and forward the device to the spectrogram window function (melkwargs > wkwargs > device) and to the mel scale (melkwargs > mel_scale_kwargs > device), each of which would transfer its own tensors to that device. See the sketch below.
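
A rough sketch of what the resulting usage might look like (hypothetical; neither the device parameter on MFCC nor mel_scale_kwargs on MelSpectrogram exists in torchaudio today):

import torch
from torchaudio.transforms import MFCC

device = torch.device('cuda')

# Hypothetical API: MFCC would create its DCT matrix on `device` and
# forward the device to the window function and the mel scale, so every
# internal tensor ends up on the GPU.
mfcc = MFCC(device=device)

data = torch.rand([400, 400]).to(device)
print(mfcc(data))  # would run without any manual tensor transfers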

Alternatives

One could instead add a to(device) method to the classes mentioned above, which would take care of transferring all internal tensors to that device. This would allow moving the tensors multiple times, but would require considerably more code; a sketch follows.
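
For illustration, a minimal self-contained sketch of this alternative (the class name, tensor shapes, and attribute names below are made up for the example):

import torch

class ToyMelScale(torch.nn.Module):
    # Stand-in for a transform holding an internal filterbank tensor
    # as a plain attribute (neither a parameter nor a buffer).
    def __init__(self, n_freqs=201, n_mels=128):
        super().__init__()
        self.fb = torch.rand(n_freqs, n_mels)

    def to(self, device):
        # nn.Module.to() would not touch self.fb, so move it explicitly.
        self.fb = self.fb.to(device)
        return super().to(device)

    def forward(self, specgram):
        return specgram @ self.fb

Note that if the internal tensor were registered with register_buffer instead, the stock nn.Module.to() would already move it automatically, with no extra code per class.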

Additional context

No response

Greenscreen23 commented 3 months ago

I was also told to ping @mthrok (see here).

I'm happy about any feedback / critique :)

mthrok commented 3 months ago

Hi

The expected usage is to move the whole transform to the target device. Is there any reason you cannot do that?

This works.

import torch
from torchaudio.transforms import MFCC

device = torch.device('cuda')
mfcc = MFCC()
# nn.Module.to() moves registered buffers along with parameters, so the
# transform's internal tensors follow it to the target device.
mfcc.to(device)

data = torch.rand([400, 400]).to(device)

print(mfcc(data))

Tested on Colab


Greenscreen23 commented 3 months ago

Oops. I thought I tried that. You are completely right, thank you :)