filtercodes opened this issue 3 weeks ago
Hi, most of the computing time is spent on source separation (Demucs), so Demucs would need to support MPS.
However, after a bit of searching, I suspect it's not possible for now, sadly 😢 https://github.com/facebookresearch/demucs/issues/432
Hi @tae-jun, thanks for the clarification. The issue seems to be that MPS doesn't support complex-number operations or any float type other than float32. So it's a matter of finding the right spot and using the `.to("cpu")` method to move that particular math operation back to the CPU.
https://github.com/facebookresearch/demucs/blob/main/demucs/htdemucs.py#L628C1-L634C24
```python
# to cpu as mps doesnt support complex numbers
# demucs issue #435 ##432
# NOTE: in this case z already is on cpu
# TODO: remove this when mps supports complex numbers
x_is_mps = x.device.type == "mps"
if x_is_mps:
    x = x.cpu()
```
and then
https://github.com/facebookresearch/demucs/blob/main/demucs/htdemucs.py#L645
```python
# back to mps device
if x_is_mps:
    x = x.to("mps")
```
But then the CPU is still doing most of the work. The real solution would be to avoid complex numbers altogether, if the algorithm can be adapted to use only real numbers; an FFT, for example, can be computed with or without complex numbers.
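A minimal sketch of the "with or without complex numbers" point, written in plain Python rather than PyTorch so it runs anywhere. The DFT of a real signal is split into a cosine sum (real part) and a sine sum (imaginary part), so only real float arithmetic is needed. The function names are hypothetical, and this is the direct O(n²) DFT, not an FFT; the assumption is that in a real port the two sums would become float32 matrix products on the device, which is not something Demucs currently does.

```python
import cmath
import math


def dft_complex(x):
    """Reference DFT using Python complex numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_real(x):
    """DFT of a real signal using only real arithmetic (hypothetical helper).

    Since exp(-i*theta) = cos(theta) - i*sin(theta), each output bin is
    X[k] = sum(x[t] * cos(theta)) - i * sum(x[t] * sin(theta)),
    with theta = 2*pi*k*t/n. Returns (re, im) as two lists of floats.
    """
    n = len(x)
    re = [
        sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    im = [
        -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return re, im


def magnitude(re, im):
    """Per-bin magnitude |X[k]| computed from the real and imaginary parts."""
    return [math.hypot(a, b) for a, b in zip(re, im)]
```

For example, `dft_real([0.0, 1.0, 0.0, -1.0])` yields real and imaginary parts whose magnitudes are `[0, 2, 0, 2]` up to floating-point rounding, matching the complex reference. A real-only FFT is also possible but considerably longer; the direct form is used here only because it makes the cosine/sine split easy to see.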
Are there any other source separation alternatives that we could use instead of Demucs?
There are many publicly available source separation tools nowadays, such as Spleeter.
However, I have not tested all-in-one with other source separation models, and since all-in-one is trained on Demucs outputs, I can't guarantee its performance.
But I think it's worth a try!
I found this one, which has MPS support:
https://github.com/karaokenerds/python-audio-separator
It seems that on the latest OS (macOS Sonoma and later), complex numbers can work on MPS:
https://github.com/pytorch/pytorch/issues/78044#issuecomment-1668435831
For Mac users, analysing audio files takes a really long time because everything runs on the CPU without Metal acceleration. Is there a way to provide alternative kernels for the tasks that use sliding-window self-attention, or an equivalent algorithm that has a kernel backend compiled for MPS?