Hey @caner-cetin, thanks for the kind words, glad it's useful for you!
No, that isn't normal 😅
My machine is similar to yours (MacBook Pro with M3 Max):
(audio-separator) ➜ ~ neofetch
andrew@AndrewBeveridgeMBPM3.local
---------------------------------
OS: macOS 14.5 23F79 arm64
Host: Mac15,10
Kernel: 23.5.0
Uptime: 2 days, 12 hours, 58 mins
Packages: 214 (brew)
Shell: zsh 5.9
Resolution: 1512x982
DE: Aqua
WM: Quartz Compositor
WM Theme: Blue (Dark)
Terminal: iTerm2
Terminal Font: Monaco 12
CPU: Apple M3 Max
GPU: Apple M3 Max
Memory: 4810MiB / 36864MiB
I made you a short screencast video demonstrating separation of a popular song (Duration: 00:03:06) on my machine:
https://youtu.be/ZXZwXMDe5vM
This includes showing how I verify inference is using my GPU (using the Activity Monitor GPU History graph).
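As a complement to watching the GPU History graph, a quick programmatic check can show whether the installed libraries expose a GPU-capable backend at all. This is a minimal sketch, assuming the models run through PyTorch (the `.pth` / `.ckpt` files) or ONNX Runtime (the `.onnx` files); the function name `describe_acceleration` is made up for illustration, and the report only says what the libraries offer, not which device audio-separator actually selects.

```python
def describe_acceleration() -> dict:
    """Report which GPU-capable backends the installed libraries expose."""
    report = {"torch_version": None, "onnx_providers": None}

    # PyTorch: the MPS backend is what drives Apple Silicon GPUs.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        report["torch_version"] = torch.__version__
        # Very old PyTorch builds have no torch.backends.mps at all.
        mps = getattr(torch.backends, "mps", None)
        report["mps_built"] = bool(mps and mps.is_built())
        report["mps_available"] = bool(mps and mps.is_available())

    # ONNX Runtime: on macOS, look for "CoreMLExecutionProvider" in this list.
    try:
        import onnxruntime
    except ImportError:
        onnxruntime = None
    if onnxruntime is not None:
        report["onnx_providers"] = onnxruntime.get_available_providers()

    return report


if __name__ == "__main__":
    for key, value in describe_acceleration().items():
        print(f"{key}: {value}")
```

If `mps_available` comes back `False`, or the ONNX provider list only contains `CPUExecutionProvider`, inference is probably falling back to the CPU, which would be worth ruling out before comparing timings.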
On my machine, separating that 3 minute track takes the following amounts of time, depending on which model I choose:
audio-separator -d -m 2_HP-UVR.pth test.flac: Separation duration: 00:00:19
audio-separator -d -m UVR_MDXNET_KARA_2.onnx test.flac: Separation duration: 00:00:28
audio-separator -d -m UVR-MDX-NET-Inst_HQ_4.onnx test.flac: Separation duration: 00:00:36
audio-separator -d -m model_bs_roformer_ep_317_sdr_12.9755.ckpt test.flac: Separation duration: 00:01:49
audio-separator -d -m MDX23C-8KFFT-InstVoc_HQ_2.ckpt test.flac: Separation duration: 00:02:37
So as you can see, it's definitely not normal for it to take 7 minutes for a 3 minute track!
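To make the comparison concrete, the timings above can be turned into a "realtime factor" (track length divided by separation time). This is just an illustrative sketch using the numbers quoted in this comment; the helper names `to_seconds` and `realtime_factor` are my own and not part of audio-separator.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def realtime_factor(track_seconds: float, separation_seconds: float) -> float:
    """Values above 1.0 mean separation ran faster than the track plays."""
    return track_seconds / separation_seconds


# Track length and separation durations as reported above (M3 Max).
TRACK_SECONDS = to_seconds("00:03:06")
RESULTS = {
    "2_HP-UVR.pth": "00:00:19",
    "UVR_MDXNET_KARA_2.onnx": "00:00:28",
    "UVR-MDX-NET-Inst_HQ_4.onnx": "00:00:36",
    "model_bs_roformer_ep_317_sdr_12.9755.ckpt": "00:01:49",
    "MDX23C-8KFFT-InstVoc_HQ_2.ckpt": "00:02:37",
}

if __name__ == "__main__":
    for model, duration in RESULTS.items():
        factor = realtime_factor(TRACK_SECONDS, to_seconds(duration))
        print(f"{model}: {factor:.2f}x realtime")
```

By this measure even the slowest model here finishes faster than the track plays, whereas 7 minutes for a 3 minute 1 second track works out to roughly 0.43x realtime.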
If you want to test with the same input file and commands as me, here's the file I used in the tests above: https://www.dropbox.com/scl/fi/k4tbc79ggzfcn509qwpji/sabrina-please-test.flac?rlkey=ufnkns7vjnhuqsic225rbzdbx&dl=0
My recommendation to you would be to:
audio-separator: 0.19.1
Python: 3.11 (3.9 is very old now and not officially supported by this project)
(... audio-separator -l)
Good luck! -Andrew
Thanks for the quick response, Andrew. I bumped Python to 3.12.0, bumped the library, and checked the CPU and GPU history; the GPU history is maxed out for the entire run.
Yet it still takes 5 more minutes to process the same FLAC file you provided, even with the same model, model_bs_roformer_ep_317_sdr_12.9755.ckpt.
Maybe the 2022 M2 is significantly weaker than the M3 Max?
That is still kinda surprising! Can you try the other models I tried to compare runtimes for the other architectures? If they're all slower by a similar amount I'd be inclined to agree with you (but still surprised)
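If it helps, a comparison like this could be scripted rather than run by hand. The sketch below reuses the same `audio-separator -d -m <model> <file>` invocation shown earlier in this thread and measures wall-clock time around each run; the helper names (`build_command`, `time_command`, `benchmark`) are hypothetical. Note that wall-clock time includes model download and loading, so it will not match the tool's own "Separation duration" exactly.

```python
import subprocess
import sys
import time

# Model filenames taken from the timings quoted earlier in this thread.
MODELS = [
    "2_HP-UVR.pth",
    "UVR_MDXNET_KARA_2.onnx",
    "UVR-MDX-NET-Inst_HQ_4.onnx",
    "model_bs_roformer_ep_317_sdr_12.9755.ckpt",
    "MDX23C-8KFFT-InstVoc_HQ_2.ckpt",
]


def build_command(model: str, audio_file: str) -> list:
    """Build the same CLI invocation used in the manual tests."""
    return ["audio-separator", "-d", "-m", model, audio_file]


def time_command(command: list) -> float:
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def benchmark(audio_file: str, models=MODELS, runner=time_command) -> dict:
    """Time one separation per model; `runner` is swappable for dry runs."""
    return {model: runner(build_command(model, audio_file)) for model in models}


# Only runs the real CLI when an audio file is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for model, seconds in benchmark(sys.argv[1]).items():
        print(f"{model}: {seconds:.1f} s")
```

Running each model once after a warm-up run (so the model files are already downloaded) should give numbers that are easier to compare across machines.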
I tried them, by the way; I forgot to add that. They are significantly slower than your benchmarks: at least 1 minute slower on the fastest model, and about 10 minutes slower for an entire album. I am surprised too. At least the awesome overall quality compensates for the speed lol.
I researched this for a few days. I tried disabling Metal Performance Shaders (which was horrible: speed dropped to 18 seconds per iteration) and tried building Torch myself, but the best I could get was 12 seconds per iteration. Meanwhile, ye olde shitbox 1650 Ti could do 5.50 seconds per iteration out of the box. I don't think there is much more to add; I will just assume Torch runs horribly on M2 and close the issue here. Thanks again for the reply.
I don't know what has changed (maybe it's the model?), but after the latest main update, and with the model mel_band_roformer_karaoke_aufr33_viperx_sdr_10, I am getting 5 seconds per iteration, which is a huge improvement over 18 seconds. No quality lost, everything sounds crystal clear. Again, I don't know what changed, but I had to come here and thank you, friend.
Glad to hear! 😄
First of all, thanks for this wonderful project. I cannot put into words how useful it is for me and how cleanly it extracts the vocals. But I have a question: is it normal that a 3 minute 1 second track takes 7 minutes to separate?
I don't know if this is related to Mac / macOS, but the process takes up my entire system's resources, which is a good thing, since it means it can utilize the M2 at its best. But still, is 7 minutes normal?