mir-aidj / all-in-one

All-In-One Music Structure Analyzer
http://arxiv.org/abs/2307.16425
MIT License

MPS Alternatives to Natten #17

Open filtercodes opened 3 weeks ago

filtercodes commented 3 weeks ago

For Mac users, analysing audio files takes a really long time because everything runs on the CPU without Metal acceleration. Is there a way to provide alternative kernels for the sliding-window self-attention (or an equivalent algorithm) with a backend compiled for MPS?

tae-jun commented 3 weeks ago

Hi, most of the computing time is spent on source separation (Demucs), so Demucs itself would need to support MPS.

However, I searched a bit and, sadly, it doesn't seem to be possible for now 😢 https://github.com/facebookresearch/demucs/issues/432

filtercodes commented 3 weeks ago

Hi @tae-jun, thanks for the clarification. The issue seems to be that MPS doesn't support complex number operations or any float type other than float32, so it becomes a matter of finding the right spot and calling .to("cpu") to move that particular operation back to the CPU.

https://github.com/facebookresearch/demucs/blob/main/demucs/htdemucs.py#L628C1-L634C24

# to cpu as mps doesnt support complex numbers
# demucs issue #435 ##432
# NOTE: in this case z already is on cpu
# TODO: remove this when mps supports complex numbers
x_is_mps = x.device.type == "mps"
if x_is_mps:
    x = x.cpu()

and then

https://github.com/facebookresearch/demucs/blob/main/demucs/htdemucs.py#L645

# back to mps device
if x_is_mps:
    x = x.to("mps")
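The CPU round trip quoted above could be wrapped in a small helper. This is only a sketch: `run_on_cpu_if_mps` and `magnitude_spectrogram` are hypothetical names, not part of Demucs or all-in-one, and the helper assumes the wrapped operation returns a real-valued tensor so the result can be moved back to MPS.

```python
import torch


def run_on_cpu_if_mps(x, op):
    """Apply `op` on the CPU when `x` lives on MPS, then move the result back.

    Hypothetical helper. `op` is assumed to return a real-valued tensor,
    because the fallback exists precisely to keep complex tensors off MPS.
    """
    x_is_mps = x.device.type == "mps"
    if x_is_mps:
        x = x.cpu()
    out = op(x)
    if x_is_mps:
        out = out.to("mps")
    return out


def magnitude_spectrogram(wave):
    # torch.stft produces a complex tensor; .abs() reduces it to real floats.
    window = torch.hann_window(512)
    spec = torch.stft(wave, n_fft=512, hop_length=128, window=window,
                      return_complex=True)
    return spec.abs()


# Use MPS when it is available, otherwise stay on the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
wave = torch.randn(2, 4096, device=device)
mag = run_on_cpu_if_mps(wave, magnitude_spectrogram)
```

The complex intermediate never leaves the CPU; only the real-valued magnitude is sent back to the original device.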

But then the CPU is still doing most of the work. The real solution would be to avoid complex numbers altogether, if the algorithm can be adapted to use only real numbers. An FFT, for example, can be computed with or without a complex data type.
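To illustrate the point about real-only arithmetic, here is a minimal sketch of a Fourier transform that keeps the real and imaginary parts in two separate lists of floats. It is a naive O(n^2) DFT written for clarity, not an FFT and not what Demucs does; `real_dft` is a hypothetical name.

```python
import math


def real_dft(signal):
    """Naive DFT of a real signal using only real arithmetic.

    Returns (real parts, imaginary parts) for bins 0..n//2 as two lists
    of floats, so no complex dtype is ever needed.
    """
    n = len(signal)
    re, im = [], []
    for k in range(n // 2 + 1):
        step = 2.0 * math.pi * k / n
        re.append(sum(s * math.cos(step * t) for t, s in enumerate(signal)))
        im.append(-sum(s * math.sin(step * t) for t, s in enumerate(signal)))
    return re, im


# A sine wave with exactly 3 cycles over 16 samples.
signal = [math.sin(2.0 * math.pi * 3 * t / 16) for t in range(16)]
re, im = real_dft(signal)
magnitudes = [math.hypot(r, i) for r, i in zip(re, im)]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```

The same idea carries over to tensors: a spectrum stored as two float32 arrays (or one array with a trailing dimension of size 2) can stay on a device that lacks complex support, at the cost of writing the complex multiplications out by hand.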

Are there any other source separation alternatives that we could use instead of Demucs?

tae-jun commented 2 weeks ago

There are many publicly available source separation tools nowadays, such as Spleeter.

However, I have not tested all-in-one on other source separation models, and since all-in-one is trained on outputs of Demucs, I can't guarantee its performance.

But I think it's worth a try!

filtercodes commented 1 week ago

I found this one with MPS support

https://github.com/karaokenerds/python-audio-separator

It seems that with the latest macOS (Sonoma and later) it is possible to get complex numbers working:

https://github.com/pytorch/pytorch/issues/78044#issuecomment-1668435831