nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License

Recommend a model #78

Closed · joseph16388 closed 4 months ago

joseph16388 commented 4 months ago

Hi, I just saw this repo and I don't know anything about the models or what they do. What I need is to input an audio file and have the vocals separated. Could you please recommend a fast, general-purpose model with the best results, along with its download address? It also needs to be usable via the python-audio-separator API. Thank you!

beveradb commented 4 months ago

Hey @joseph16388 that's understandable, and I actually designed audio-separator with sane defaults (and by "sane" I really mean "my own personal recommendation for clean two-stem vocals/instrumental separation").

So, I'm curious, have you already tried running the audio-separator CLI? It is designed to "just work" by default, so once you've installed it you only need to specify your input filename and it should work 😄

Just run audio-separator input.wav (replace input.wav with your input audio file name)

That will run with the default settings, which currently means it will automatically download the model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt model (which I consider one of the best-performing general purpose models currently) and use that with default settings.

If you don't like the sound of the output from that model, I would recommend trying one of these models:

  • audio-separator -m UVR-MDX-NET-Inst_HQ_4.onnx input.wav
  • audio-separator -m MDX23C-8KFFT-InstVoc_HQ_2.ckpt input.wav

Just in case this wasn't clear from the Usage documentation - for all models offered by audio-separator, you don't need to download them first. The CLI tool / library downloads them automatically when first used, and stores them in a cache directory to avoid repeated downloads.
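
Since you mentioned wanting to use it via the python-audio-separator API: here's a minimal sketch of the same default separation from Python (based on the Separator / load_model / separate methods documented in the README - double-check against your installed version):

    from audio_separator.separator import Separator

    # Instantiate with defaults (default output directory and format)
    separator = Separator()

    # Downloads the default model on first use, then reuses the cached copy
    separator.load_model()

    # Returns the paths of the separated stem files
    output_files = separator.separate("input.wav")
    print(output_files)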

joseph16388 commented 4 months ago


Hi๏ผthanks for reply๏ผ I used pip install "audio-separator[gpu]" and successfully installed all the libs. Then I typed in the terminal: audio-separator test.wav, but the program reported an error: 2024-06-17 16:39:19.175 -info-cli - Separator version 0.17.3 beginning with input file: test.wav 2024-06-17 16:39:19.177 - INFO - separator - Separator version 0.17.3 instantiating with output_dir: None, output_format: FLAC Traceback (most recent call last): File "", line 198, in _run_module_as_main File "", line 88, in _run_code File "D:\XXX\venv\Scripts\audio-separator.exe__main.py", line 7, in File "D:\XXX\venv\Lib\site-packages\audio_separator\utils\cli.py", line 139, in main separator = Separator( ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ File "D:\XXX\venv\Lib\site-packages\audio_separator\separator\separator.py", line 108, in init__ os.makedirs(self.output_dir, exist_ok=True) File "", line 210, in makedirs File "", line 213, in split TypeError: expected str, bytes or os.PathLike object, not NoneType

There are also some error warnings from the onnxruntime-gpu library in my project:

    EP Error: C:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:456 onnxruntime::python::RegisterTensorRTPluginsAsCustomOps
    Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
    when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
    Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.

They aren't enough to stop the app from working, though. My torch version is 2.1.1+cu121 (i.e. with CUDA).

beveradb commented 4 months ago

Ah crap, that first error is actually a regression which was introduced when PR #76 was merged last week - sorry about that, I didn't pick it up in my manual testing and haven't gotten around to implementing a decent suite of tests in this repo yet 😞

I've just fixed it with this commit, so if you upgrade to version 0.17.4 (e.g. pip install -U "audio-separator[gpu]") that should work.
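
Once you've upgraded, you can double-check the installed version from Python too (the CLI also logs it on startup, as in your paste):

    import importlib.metadata

    # Should print 0.17.4 or newer once the upgrade has taken effect
    print(importlib.metadata.version("audio-separator"))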

As for the other error, I don't really know much about that (I've never owned an NVIDIA graphics card myself; I work from a MacBook and don't have a PC), but I'd recommend checking the ONNXRuntime CUDA requirements, e.g. the specific versions they suggest: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

Or maybe this page, since your error mentions TensorRT? I don't know what that is though 😅 https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements

joseph16388 commented 4 months ago


Hello! It worked for me. But there is a warning:

    2024-06-18 11:00:28.816 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
    2024-06-18 11:00:28.816 - WARNING - separator - CUDAExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled.

I'm not sure what effect it has on acceleration. I tested an audio file of 5 minutes 28 seconds: with the first model you recommended it took about 50 seconds, and the other two models took 1-2 minutes. That still feels a little slow - is that because acceleration is not turned on, as this warning says? On the other hand, I looked at the GPU usage and it did increase a lot. Thank you!

beveradb commented 4 months ago

Glad that got you up and running!

So, it sounds like Torch is able to use CUDA successfully but ONNXRuntime isn't - I deliberately separated the configuration of the two to try and provide a middle ground in this situation.

See here where torch is configured: https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L213 and here, where CUDA is enabled for ONNXRuntime if available: https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L240

In this situation, you will still get GPU acceleration for operations which are being handled by Torch directly, but not anything which is handled by ONNXRuntime.
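
If you want to confirm that diagnosis on your machine, here's a quick check of what each runtime can actually see (both are standard torch / onnxruntime calls):

    import torch
    import onnxruntime as ort

    # True here means the Torch-based parts of audio-separator can use your GPU
    print("Torch CUDA available:", torch.cuda.is_available())

    # If 'CUDAExecutionProvider' is missing from this list, ONNXRuntime is
    # running CPU-only - one common cause is having the CPU-only "onnxruntime"
    # package installed alongside (or instead of) "onnxruntime-gpu"
    print("ONNXRuntime providers:", ort.get_available_providers())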

In practice, that actually isn't so bad - audio-separator currently supports 5 model architectures, and 4 of those don't actually use ONNXRuntime at all anyway!

These model architectures are pure Torch:

  • Demucs
  • VR
  • MDXC (TFC-TDF)
  • MDXC (RoFormer)

The only models which use ONNXRuntime are the MDX models.

So, if you want to get the most out of your GPU without fixing your ONNXRuntime setup, just use one of the pure torch models, e.g. model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt or MDX23C-8KFFT-InstVoc_HQ_2.ckpt

If you want faster runtime, try one of the VR models - e.g. 2_HP-UVR.pth, which I usually find provides good-enough results with a much shorter runtime. See the Python sketch below for selecting a specific model.
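
In case you're calling it from Python rather than the CLI, selecting a specific model looks roughly like this (same model filenames as the -m flag; sketched from the README's load_model documentation, so double-check against your installed version):

    from audio_separator.separator import Separator

    separator = Separator()

    # Same filenames as the CLI's -m flag; downloaded and cached on first use
    separator.load_model(model_filename="2_HP-UVR.pth")

    # Returns the paths of the separated output files
    output_files = separator.separate("input.wav")
    print(output_files)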

Every model has slightly different characteristics, some are better for specific types of tracks (e.g. age, genre, vocal presence, recording tech, etc.), some take much longer to run than others, etc. On top of that, what sounds good to one person may not sound good to someone else! So my main advice is to try a few different models until you get results which seem good to your ears!

joseph16388 commented 4 months ago


Got it! Thank you!

beveradb commented 4 months ago

I'm closing this issue as the default functionality of audio-separator does work out of the box 😄 and there's a discussion thread about models here: https://github.com/karaokenerds/python-audio-separator/discussions/82