Closed: joseph16388 closed this issue 4 months ago
Hey @joseph16388 that's understandable, and I actually designed audio-separator with sane defaults (and by "sane" I really mean "my own personal recommendation for clean two-stem vocals/instrumental separation").

So, I'm curious, have you already tried running the audio-separator CLI? It is designed to "just work" by default, so once you've installed it you only need to specify your input filename and it should work.

Just run `audio-separator input.wav` (replace `input.wav` with your input audio file name).

That will run with the default settings, which currently means it will automatically download the `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt` model (which I consider one of the best-performing general-purpose models currently) and use it with default settings.

If you don't like the sound of the output from that model, I would recommend trying one of these models:

`audio-separator -m UVR-MDX-NET-Inst_HQ_4.onnx input.wav`

`audio-separator -m MDX23C-8KFFT-InstVoc_HQ_2.ckpt input.wav`

Just in case this wasn't clear from the Usage documentation: for all models offered by audio-separator, you don't need to download them first. The CLI tool / library downloads them automatically when first used, and stores them in a cache directory to avoid repeated downloads.
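To illustrate the "download on first use, then cache" behaviour described above, here is a minimal hypothetical sketch (this is *not* audio-separator's actual code; the `download` callable and `get_cached_model` helper are invented for illustration, with the fetcher injected so the caching logic stays self-contained):

```python
from pathlib import Path


def get_cached_model(name: str, cache_dir: Path, download) -> Path:
    """Return the local path for model `name`, fetching it only on first use.

    `download(name, dest)` is any callable that retrieves the model file;
    it is injected here so the caching logic itself has no network dependency.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    dest = cache_dir / name
    if not dest.exists():
        # First use: fetch the model and store it in the cache directory.
        download(name, dest)
    # Subsequent calls find the file already present and skip the download.
    return dest
```

The point of this pattern is that repeated separations reuse the same weights file rather than re-downloading it each run.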
Hi! Thanks for the reply!
I used `pip install "audio-separator[gpu]"` and successfully installed all the libs. Then I typed `audio-separator test.wav` in the terminal, but the program reported an error:
2024-06-17 16:39:19.175 - INFO - cli - Separator version 0.17.3 beginning with input file: test.wav
2024-06-17 16:39:19.177 - INFO - separator - Separator version 0.17.3 instantiating with output_dir: None, output_format: FLAC
Traceback (most recent call last):
File "
There are also some warnings from the onnxruntime-gpu library in my project:

EP Error EP Error C:\a_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:456 onnxruntime::python::RegisterTensorRTPluginsAsCustomOps Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported. when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.

These aren't enough to disable the app, though. My Torch and CUDA version is 2.1.1+cu121.
Ah crap, that first error is actually a regression which was introduced when PR #76 was merged last week, apparently - sorry about that! I didn't pick up on it in my manual testing and haven't gotten around to implementing a decent suite of tests in this repo yet.

I've just fixed it with this commit, so if you upgrade to version 0.17.4 (e.g. `pip install -U "audio-separator[gpu]"`) that should work.

As for the other error, I don't really know much about that (I've never owned an nvidia graphics card myself - I work from a macbook and don't have a PC), but I'd recommend checking the ONNXRuntime requirements here, e.g. the specific versions they suggest: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

or maybe this page, since your error mentions TensorRT? I don't know what that is though: https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements
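One quick way to debug this kind of problem is to ask ONNXRuntime directly which execution providers your installed build exposes. `onnxruntime.get_available_providers()` is the standard API for this; the small wrapper below is just an illustration, written to degrade gracefully when onnxruntime isn't installed at all:

```python
def available_providers():
    """Return ONNXRuntime's execution provider list, or None if it isn't installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return ort.get_available_providers()


# If 'CUDAExecutionProvider' is absent from this list, onnxruntime-gpu (or its
# CUDA/cuDNN/TensorRT dependencies) isn't set up correctly, which would match
# the fallback messages in the log above.
print(available_providers())
```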
Hello! It worked for me. But there is a warning:

2024-06-18 11:00:28.816 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-06-18 11:00:28.816 - WARNING - separator - CUDAExecutionProvider not available in ONNXruntime, so acceleration will NOT be enabled.

I'm not sure whether this affects acceleration. I tested an audio file of 5 minutes 28 seconds: using the first model you recommended took about 50 seconds, and the other two models took 1-2 minutes. That still feels a little slow - is it because acceleration is not turned on, as the warning says? But I looked at the GPU usage and it did increase a lot. Thank you!
Glad that got you up and running!
So, it sounds like Torch is able to use CUDA successfully but ONNXRuntime isn't - I deliberately separated the configuration of the two to try and provide a middle ground in this situation.
See here where torch is configured: https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L213 and here, where CUDA is enabled for ONNXRuntime if available: https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L240
In this situation, you will still get GPU acceleration for operations which are being handled by Torch directly, but not anything which is handled by ONNXRuntime.
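The split described above can be sketched as pure decision logic (a hypothetical illustration, not the library's real code - `acceleration_plan` is an invented name; the real configuration lives in `separator.py` at the links above):

```python
def acceleration_plan(torch_cuda_available: bool, ort_providers: list) -> dict:
    """Decide, per framework, which device/provider would be used.

    Mirrors the idea that Torch and ONNXRuntime are configured independently:
    Torch can run on CUDA even when ONNXRuntime falls back to CPU.
    """
    return {
        "torch": "cuda" if torch_cuda_available else "cpu",
        "onnxruntime": (
            "CUDAExecutionProvider"
            if "CUDAExecutionProvider" in ort_providers
            else "CPUExecutionProvider"
        ),
    }
```

In the situation from the log above (Torch sees CUDA, ONNXRuntime does not), this yields GPU for the Torch-based architectures and CPU for the ONNX-based MDX models.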
In practice, that actually isn't so bad - audio-separator currently supports 5 model architectures, and 4 of those don't actually use ONNXRuntime at all anyway!

These model architectures are pure Torch:
- Demucs
- VR
- MDXC (TFC-TDF)
- MDXC (RoFormer)

The only models which use ONNXRuntime are the MDX models.
So, if you want to get the most out of your GPU without fixing your ONNXRuntime setup, just use one of the pure Torch models, e.g. `model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt` or `MDX23C-8KFFT-InstVoc_HQ_2.ckpt`.

If you want faster runtime, try one of the VR models, e.g. `2_HP-UVR.pth`, which I usually find provides good enough results with much shorter runtime.
Every model has slightly different characteristics, some are better for specific types of tracks (e.g. age, genre, vocal presence, recording tech, etc.), some take much longer to run than others, etc. On top of that, what sounds good to one person may not sound good to someone else! So my main advice is to try a few different models until you get results which seem good to your ears!
Got it! Thank you!
I'm closing this issue as the default functionality of audio-separator does work out of the box, and there's a discussion thread about models here: https://github.com/karaokenerds/python-audio-separator/discussions/82
Hi, I just saw this repo and I don't know anything about the models or what they do. What I need is to input an audio file and have the vocals separated. So please recommend a fast, universal, best-quality model and where to download it - and the model should be usable through the python-audio-separator API. Thank you!