nomadkaraoke / python-audio-separator

Easy-to-use vocal separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License

Feature request: Ensemble (multi-model combination) mode #12

Open c469591 opened 1 year ago

c469591 commented 1 year ago

Hello, is it possible to add the functionality of combining multiple models, similar to UVR's Ensemble Mode? And can we specify the way of combination, like choosing Min Spec, Max Spec, Average in UVR? Thank you.
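For reference, here is a rough sketch of what the three combination methods named above could mean when applied to spectrograms of the same stem produced by different models. This is not a port of UVR's code. The function name `ensemble_specs` is hypothetical, and the sketch assumes every input was computed with the same STFT settings from time-aligned audio, so the arrays have identical shapes.

```python
import numpy as np


def ensemble_specs(specs, method="average"):
    """Combine complex spectrograms of one stem coming from several models.

    Hypothetical helper, not UVR's or audio-separator's API.
    specs: list of complex numpy arrays with identical shapes,
           e.g. STFT output of shape (channels, freq_bins, frames).
    method: "min_spec", "max_spec" or "average".
    """
    if not specs:
        raise ValueError("need at least one spectrogram")
    stacked = np.stack(specs)  # shape: (n_models, ...)

    if method == "average":
        # Mean of the complex values across models (linear, so it matches
        # averaging the waveforms when the same STFT settings are used).
        return stacked.mean(axis=0)

    mags = np.abs(stacked)
    if method == "min_spec":
        # Per time-frequency bin, pick the model with the quietest value.
        idx = mags.argmin(axis=0)
    elif method == "max_spec":
        # Per time-frequency bin, pick the model with the loudest value.
        idx = mags.argmax(axis=0)
    else:
        raise ValueError(f"unknown method: {method}")

    # Keep the full complex value (magnitude and phase) of the chosen model.
    return np.take_along_axis(stacked, idx[np.newaxis, ...], axis=0)[0]
```

Intuitively, Min Spec keeps only what all models agree is present (useful for suppressing bleed), Max Spec keeps anything any model found, and Average sits in between.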

hijaek commented 12 months ago

I was looking for the same

beveradb commented 12 months ago

It's certainly possible!

I'm personally not keen on diving back into the UVR code again any time soon to figure out how those features are implemented, but PRs are very much welcome on this repo and I'd happily pair with anyone interested to help them get up to speed with it :)

Most of the core logic in this project was cherry picked from https://github.com/Anjok07/ultimatevocalremovergui/blob/master/separate.py

c469591 commented 12 months ago

Hi, I noticed that currently only MDX models are supported. Is it possible to add VR models? The VR models for noise reduction are very useful. Thank you!

beveradb commented 11 months ago

Anyone is welcome to submit pull requests to this repo :)

c469591 commented 11 months ago

Thank you.

beveradb commented 7 months ago

Hey folks, FYI I've been working on adding support for VR models this week, and I released audio-separator version 0.14 earlier today with initial support for VR models!

Please give it a try and see if it works for you!

I'm still working on documentation, tests and some packaging issues but the package on PyPI should "just work".

There's a new CLI parameter, audio-separator --list_models, which just prints all the models that are supported out of the box, and the interface has changed slightly (you now specify the model filename including its extension).

I will inevitably be working on "ensemble mode" and model chaining functionality later this month, as I've been contracted to add support for stem splitting (which kinda goes hand in hand with that).

That said, it's already pretty easy to use audio-separator with multiple models in a row as the output filenames are consistent so you can easily script it to process a file with one model after another!
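A minimal sketch of that scripting approach, assuming you wrap whichever audio-separator interface you use (the CLI or the Python package) in a hypothetical `separate(input_path, model_filename)` function that returns a dict mapping stem names to output file paths. Neither `separate` nor `run_chain` is part of audio-separator; they only illustrate the chaining logic.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical wrapper type: (input_path, model_filename) -> {stem name: output path}.
SeparateFn = Callable[[str, str], Dict[str, str]]


def run_chain(input_path: str,
              steps: List[Tuple[str, str]],
              separate: SeparateFn) -> str:
    """Run models one after another, feeding one chosen stem into the next model.

    steps: list of (model_filename, stem_to_keep) pairs, applied in order.
    Returns the path of the stem kept from the last model.
    """
    current = input_path
    for model_filename, stem in steps:
        outputs = separate(current, model_filename)
        if stem not in outputs:
            raise KeyError(
                f"{model_filename} produced no '{stem}' stem, got {sorted(outputs)}"
            )
        # The kept stem becomes the input for the next model in the chain.
        current = outputs[stem]
    return current
```

Keeping the chaining logic separate from the actual separation call means it can be checked with a fake `separate` function before running real models.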

c469591 commented 7 months ago

Hello, I am very pleased and excited that we can now also use VR models from the CLI. Thank you so much. I was wondering if it would be possible to add a merging feature in the future, similar to UVR's, which combines the output files produced by several different models. This could greatly improve the sound quality of the extracted files.

beveradb commented 7 months ago

Yes, I plan to implement that - hopefully later this month! 😄

c469591 commented 7 months ago

Thank you! I'd like to ask a question that's been asked many times before: does MDX now support the 23c model?

beveradb commented 7 months ago


I'm afraid not quite yet (that's still on my TODO list), but it's not far away now; I intend to implement that later this week or next!

beveradb commented 6 months ago

Hey @c469591, the latest version of audio-separator now supports MDX, VR and Demucs models. I haven't yet finished implementation of the checkpoint models (MDX23C) but I plan to add that later this week.

I'm actually not very familiar with the ensemble mode in UVR; I'll try and dig into it and understand exactly what it's doing later this week too. However, would you be able to explain what it does from your perspective, or provide any example audio files where it produces better results than a single model? Seeing great results from something motivates me to implement it!

Thank you!

c469591 commented 6 months ago

Hello, based on my years of experience using UVR, ensemble mode roughly works like this, in several steps. The first step is to run each selected model individually. For example, if I have chosen 23c from MDX and 5_HP-Karaoke-UVR from VR, UVR will first run 23c to generate separate accompaniment and vocal files, then run 5_HP-Karaoke-UVR to produce another set of accompaniment and vocal files. Next, all the accompaniment tracks are merged into one file using a method I'm not aware of, and likewise for the vocal files. I speculate that it might use some strategy like audio phase cancellation to nullify identical sounds across the tracks while merging the differing sounds together, though this is just an unfounded guess.

In the end, after merging, you get an accompaniment that keeps the harmonies, because 5_HP-Karaoke-UVR includes harmonies. And since 23c retains richer instrumental detail in its accompaniment, you end up with a result that's generally better than what you'd get from any single model alone. Of course, if one of the models didn't completely remove the vocals from its track, those remnants would also end up in the final merged accompaniment file. That's my understanding of ensemble mode; I hope this helps you!
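The grouping step described above (merge all accompaniments together, and separately merge all vocals) can be sketched as follows. The thread does not say how UVR actually merges the tracks, so this sketch uses plain waveform averaging as a stand-in; `merge_by_stem` is a hypothetical name, and the input is assumed to be already-loaded audio at the same sample rate.

```python
from typing import Dict, List

import numpy as np


def merge_by_stem(results: Dict[str, Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Average each stem across all models that produced it.

    results: model name -> {stem name -> waveform of shape (channels, samples)}.
    Returns: stem name -> averaged waveform.
    """
    grouped: Dict[str, List[np.ndarray]] = {}
    for stems in results.values():
        for stem, wave in stems.items():
            # Collect e.g. every model's "Instrumental" into one list.
            grouped.setdefault(stem, []).append(np.asarray(wave, dtype=np.float64))

    merged = {}
    for stem, waves in grouped.items():
        # Models may pad their output differently, so trim to the shortest one.
        n = min(w.shape[-1] for w in waves)
        merged[stem] = np.mean([w[..., :n] for w in waves], axis=0)
    return merged
```

With plain averaging, vocal residue left in one model's accompaniment is attenuated rather than removed, which matches the observation above that such remnants carry over into the merged file.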

beveradb commented 5 months ago

Hey @c469591, thanks for the write-up above, that actually does help me understand the motivation a lot!

I haven't yet gotten around to working on Ensemble mode, but I wanted to give you a heads up that as of version 0.16.2 or higher, audio-separator does now support MDXC models and the VIP models from UVR.

What you've described does actually sound like something I'd like to be able to use myself (I value separation that retains harmonies / backing vocals for the karaoke tracks I make; so far I've mostly been using UVR_MDXNET_KARA_2.onnx on its own), so I'm motivated to get it working so I too can have that kind of combination of 23c + 5_HP-Karaoke.

I just can't promise when I'll get around to it as my hobby time is limited!

c469591 commented 5 months ago

Hi @beveradb, I am glad that my sharing has been helpful to you. I look forward to seeing you complete this feature soon. Thank you for your hard work and contribution!

JackismyShephard commented 1 month ago

@beveradb are there any updates on ensemble mode?

beveradb commented 1 month ago

Afraid not, @JackismyShephard; to be honest, new feature development for audio-separator is something I'm unlikely to be independently motivated to do, as my hobby time is limited and I've already been pretty happy with my results from audio-separator as it is for my use case (https://create.karaokehunt.com).

That said, I would still like to give it a try, I just need a bit of extra help / motivation. If you'd be willing to help / interested in pairing on it some time feel free to email me with a good date/time for a zoom/meet and that'll probably be the thing to get it started 🙏

JackismyShephard commented 1 month ago

@beveradb Completely understandable. The karaoke app looks interesting.

It might be interesting to work together on the ensemble mode or other features to add to this project, as I see it has a lot of potential. However, I am a bit busy with my own project (https://github.com/JackismyShephard/ultimate-rvc) as well as my day job, so not sure how much time I have left 😄

beveradb commented 1 month ago

No worries! Feel free to email me at andrew@beveridge.uk if you ever have a little free time and wanna pair on getting ensemble mode working :)