nomadkaraoke / python-audio-separator

Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License
444 stars 78 forks

Supporting VR Architecture models like HP5 #37

Closed: KevinWang676 closed this issue 8 months ago

KevinWang676 commented 8 months ago

Hi, I wonder if you are going to support VR Architecture models like HP5 as well. Thanks!

zhzhongshi commented 8 months ago


beveradb commented 8 months ago

Thanks @zhzhongshi - yep, I've literally been working on this all week and released audio-separator version 0.14 earlier today! 😅

Please give it a try and see if it works for you!

If you confirm it works, I'll close this issue 🙏

That said - I'm still working on documentation, tests and some packaging issues (conda build failed, sigh) but the package on PyPI should "just work".

FYI, there's a new CLI parameter audio-separator --list_models which just prints all the models which are supported out of the box!
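
For scripting, the --list_models output can be filtered for one architecture, e.g. to check which VR models are available. This is a minimal sketch, assuming the command prints a single JSON object whose top-level keys are architecture names ("MDX", "VR") mapping display names to model filenames, as in the output pasted further down this thread. The helper names are hypothetical and not part of audio-separator.

```python
import json
import subprocess


def parse_model_list(text):
    """Parse --list_models output into {architecture: {display_name: filename}}.

    Any log lines printed before the JSON are skipped; the JSON dump is
    assumed to begin on a line containing only "{".
    """
    lines = text.splitlines()
    start = lines.index("{") if "{" in lines else 0
    return json.loads("\n".join(lines[start:]))


def find_models(models, architecture, keyword=""):
    """Return sorted filenames for one architecture whose filename contains keyword (case-insensitive)."""
    entries = models.get(architecture, {})
    return sorted(f for f in entries.values() if keyword.lower() in f.lower())


def list_models_from_cli():
    """Run the CLI (assumed to be on PATH) and parse what it prints to stdout."""
    result = subprocess.run(
        ["audio-separator", "--list_models"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_model_list(result.stdout)
```

With the package installed, find_models(list_models_from_cli(), "VR", "karaoke") would narrow the VR list down to the karaoke models.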

zhzhongshi commented 8 months ago

I tried the steps below, but it doesn't seem to work. Log from a fresh virtual environment created with python -m venv .venv:

(.venv) C:\project\test\ds\uvr>pip install audio-separator[gpu] -U
Looking in indexes:
Collecting audio-separator[gpu]
  Using cached (81 kB)
Collecting tqdm
  Using cached (78 kB)
Collecting torch
  Downloading (198.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 198.6/198.6 MB 1.5 MB/s eta 0:00:00
Collecting onnx2torch>=1.5
  Using cached (78 kB)
Collecting six>=1.16
  Using cached (11 kB)
Collecting requests>=2
  Using cached (62 kB)
Collecting librosa>=0.9
  Using cached (253 kB)
Collecting numpy>=1.23
  Downloading (15.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.8/15.8 MB 1.5 MB/s eta 0:00:00
Collecting pydub>=0.25
  Using cached (32 kB)
Collecting onnx>=1.14
  Using cached (14.3 MB)
Collecting onnxruntime-gpu
  Using cached (148.6 MB)
Collecting joblib>=0.14
  Downloading (302 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 302.2/302.2 kB 3.7 MB/s eta 0:00:00
Collecting soundfile>=0.12.1
  Using cached (1.0 MB)
Collecting soxr>=0.3.2
  Using cached (184 kB)
Collecting scipy>=1.2.0
  Using cached (46.2 MB)
Collecting scikit-learn>=0.20.0
  Using cached (10.6 MB)
Collecting msgpack>=1.0
  Using cached (222 kB)
Collecting numba>=0.51.0
  Using cached (2.7 MB)
Collecting pooch>=1.0
  Using cached (62 kB)
Collecting decorator>=4.3.0
  Using cached (9.1 kB)
Collecting typing-extensions>=4.1.1
  Downloading (32 kB)
Collecting lazy-loader>=0.1
  Using cached (9.1 kB)
Collecting audioread>=2.1.9
  Using cached (23 kB)
Collecting protobuf>=3.20.2
  Using cached (413 kB)
Collecting torchvision>=0.9.0
  Downloading (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 79.5 kB/s eta 0:00:00
Collecting certifi>=2017.4.17
  Using cached (163 kB)
Collecting urllib3<3,>=1.21.1
  Using cached (120 kB)
Collecting charset-normalizer<4,>=2
  Downloading (100 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.3/100.3 kB 62.7 kB/s eta 0:00:00
Collecting idna<4,>=2.5
  Using cached (61 kB)
Collecting jinja2
  Using cached (133 kB)
Collecting sympy
  Using cached (5.7 MB)
Collecting filelock
  Using cached (11 kB)
Collecting fsspec
  Using cached (170 kB)
Collecting networkx
  Using cached (1.6 MB)
Collecting coloredlogs
  Using cached (46 kB)
Collecting flatbuffers
  Using cached (26 kB)
Collecting packaging
  Downloading (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53.0/53.0 kB 53.7 kB/s eta 0:00:00
Collecting colorama
  Using cached (25 kB)
Collecting llvmlite<0.43,>=0.42.0dev0
  Using cached (28.1 MB)
Collecting platformdirs>=2.5.0
  Using cached (17 kB)
Collecting threadpoolctl>=2.0.0
  Downloading (15 kB)
Collecting cffi>=1.0
  Using cached (181 kB)
Collecting pillow!=8.3.*,>=5.3.0
  Downloading (2.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 154.2 kB/s eta 0:00:00
Collecting humanfriendly>=9.1
  Using cached (86 kB)
Collecting MarkupSafe>=2.0
  Using cached (17 kB)
Collecting mpmath>=0.19
  Using cached (536 kB)
Collecting pycparser
  Downloading (118 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 118.7/118.7 kB 105.2 kB/s eta 0:00:00
Collecting pyreadline3
  Using cached (95 kB)
Installing collected packages: pyreadline3, pydub, mpmath, flatbuffers, urllib3, typing-extensions, threadpoolctl, sympy, six, pycparser, protobuf, platformdirs, pillow, packaging, numpy, networkx, msgpack, MarkupSafe, llvmlite, lazy-loader, joblib, idna, humanfriendly, fsspec, filelock, decorator, colorama, charset-normalizer, certifi, audioread, tqdm, soxr, scipy, requests, onnx, numba, jinja2, coloredlogs, cffi, torch, soundfile, scikit-learn, pooch, onnxruntime-gpu, torchvision, librosa, onnx2torch, audio-separator
Successfully installed MarkupSafe-2.1.5 audio-separator-0.14.0 audioread-3.0.1 certifi-2024.2.2 cffi-1.16.0 charset-normalizer-3.3.2 colorama-0.4.6 coloredlogs-15.0.1 decorator-5.1.1 filelock-3.13.1 flatbuffers-23.5.26 fsspec-2024.2.0 humanfriendly-10.0 idna-3.6 jinja2-3.1.3 joblib-1.3.2 lazy-loader-0.3 librosa-0.10.1 llvmlite-0.42.0 mpmath-1.3.0 msgpack-1.0.7 networkx-3.2.1 numba-0.59.0 numpy-1.26.3 onnx-1.15.0 onnx2torch-1.5.13 onnxruntime-gpu-1.17.0 packaging-23.2 pillow-10.2.0 platformdirs-4.2.0 pooch-1.8.0 protobuf-4.25.2 pycparser-2.21 pydub-0.25.1 pyreadline3-3.4.1 requests-2.31.0 scikit-learn-1.4.0 scipy-1.12.0 six-1.16.0 soundfile-0.12.1 soxr-0.3.7 sympy-1.12 threadpoolctl-3.2.0 torch-2.2.0 torchvision-0.17.0 tqdm-4.66.1 typing-extensions-4.9.0 urllib3-2.2.0

[notice] A new release of pip is available: 23.0.1 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip

It seems like there's a dependency missing.

(.venv) C:\project\test\ds\uvr>audio-separator --list_models
Traceback (most recent call last):
  File "C:\Python310\lib\", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\", line 86, in _run_code
    exec(code, run_globals)
  File "C:\project\test\ds\uvr\.venv\Scripts\audio-separator.exe\", line 7, in <module>
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\utils\", line 76, in main
    from audio_separator.separator import Separator
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\", line 1, in <module>
    from .separator import Separator
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\", line 15, in <module>
    from audio_separator.separator.architectures import MDXSeparator, VRSeparator
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\architectures\", line 1, in <module>
    from .mdx_separator import MDXSeparator
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\architectures\", line 10, in <module>
    from audio_separator.separator.uvr_lib_v5 import spec_utils
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\uvr_lib_v5\", line 31, in <module>
    from pyrubberband import pyrb
ModuleNotFoundError: No module named 'pyrubberband'
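
A missing import like this can be spotted before running the CLI at all. This is a minimal sketch using only the standard library; the module list is an assumption drawn from the imports that appear in the tracebacks in this thread (pyrubberband, soundfile, librosa), not from the package's declared dependencies.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in the current environment.

    find_spec() locates a module without importing it; it is only used on
    top-level names here, since dotted names would import their parents.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Modules seen imported in this thread's tracebacks (an assumed, incomplete list).
REQUIRED = ["pyrubberband", "soundfile", "librosa"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```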

So I installed it:

(.venv) C:\project\test\ds\uvr>pip install pyrubberband
Looking in indexes:
Collecting pyrubberband
  Using cached (4.1 kB)
  Preparing metadata ( ... done
Requirement already satisfied: six in c:\project\test\ds\uvr\.venv\lib\site-packages (from pyrubberband) (1.16.0)
Collecting pysoundfile>=0.8.0
  Using cached (671 kB)
Requirement already satisfied: cffi>=0.6 in c:\project\test\ds\uvr\.venv\lib\site-packages (from pysoundfile>=0.8.0->pyrubberband) (1.16.0)
Requirement already satisfied: pycparser in c:\project\test\ds\uvr\.venv\lib\site-packages (from cffi>=0.6->pysoundfile>=0.8.0->pyrubberband) (2.21)
Installing collected packages: pysoundfile, pyrubberband
  DEPRECATION: pyrubberband is being installed using the legacy ' install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at
  Running install for pyrubberband ... done
Successfully installed pyrubberband-0.3.0 pysoundfile-0.9.0.post1

[notice] A new release of pip is available: 23.0.1 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip

Then I tried again:

(.venv) C:\project\test\ds\uvr>audio-separator --list_models
2024-02-05 15:46:35,397 - INFO - separator - Separator version 0.14.0 instantiating with output_dir: None, output_format: WAV
2024-02-05 15:46:35,397 - DEBUG - separator - Normalization threshold set to 0.9, waveform will lowered to this max amplitude to avoid clipping.
2024-02-05 15:46:35,397 - DEBUG - separator - Denoising disabled, model will only be run once. This is twice as fast, but may result in noisier output audio.
2024-02-05 15:46:35,397 - INFO - separator - Operating System: Windows 10.0.22621
2024-02-05 15:46:35,397 - INFO - separator - System: Windows Node: DESKTOP-R5JDPUC Release: 10 Machine: AMD64 Proc: Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
2024-02-05 15:46:35,397 - INFO - separator - Python Version: 3.10.11
2024-02-05 15:46:35,397 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-02-05 15:46:35,397 - DEBUG - separator - Python package: onnxruntime not installed
2024-02-05 15:46:35,397 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.0
2024-02-05 15:46:35,397 - INFO - separator - No hardware acceleration could be configured, running in CPU mode
2024-02-05 15:46:35,397 - DEBUG - separator - Downloading file from to /tmp/audio-separator-models/download_checks.json with timeout 300s
2024-02-05 15:46:36,734 - DEBUG - separator - Model download list loaded: {'current_version': 'UVR_Patch_10_6_23_4_27', 'current_version_ocl': 'UVR_Patch_10_6_23_4_27', 'current_version_mac': 'UVR_Patch_10_6_23_4_27', 'current_version_linux': 'UVR_Patch_10_6_23_4_27', 'vr_download_list': {'VR Arch Single Model v5: 1_HP-UVR': '1_HP-UVR.pth', 'VR Arch Single Model v5: 2_HP-UVR': '2_HP-UVR.pth', 'VR Arch Single Model v5: 3_HP-Vocal-UVR': '3_HP-Vocal-UVR.pth', 'VR Arch Single Model v5: 4_HP-Vocal-UVR': '4_HP-Vocal-UVR.pth', 'VR Arch Single Model v5: 5_HP-Karaoke-UVR': '5_HP-Karaoke-UVR.pth', 'VR Arch Single Model v5: 6_HP-Karaoke-UVR': '6_HP-Karaoke-UVR.pth', 'VR Arch Single Model v5: 7_HP2-UVR': '7_HP2-UVR.pth', 'VR Arch Single Model v5: 8_HP2-UVR': '8_HP2-UVR.pth', 'VR Arch Single Model v5: 9_HP2-UVR': '9_HP2-UVR.pth', 'VR Arch Single Model v5: 10_SP-UVR-2B-32000-1': '10_SP-UVR-2B-32000-1.pth', 'VR Arch Single Model v5: 11_SP-UVR-2B-32000-2': '11_SP-UVR-2B-32000-2.pth', 'VR Arch Single Model v5: 12_SP-UVR-3B-44100': '12_SP-UVR-3B-44100.pth', 'VR Arch Single Model v5: 13_SP-UVR-4B-44100-1': '13_SP-UVR-4B-44100-1.pth', 'VR Arch Single Model v5: 14_SP-UVR-4B-44100-2': '14_SP-UVR-4B-44100-2.pth', 'VR Arch Single Model v5: 15_SP-UVR-MID-44100-1': '15_SP-UVR-MID-44100-1.pth', 'VR Arch Single Model v5: 16_SP-UVR-MID-44100-2': '16_SP-UVR-MID-44100-2.pth', 'VR Arch Single Model v5: 17_HP-Wind_Inst-UVR': '17_HP-Wind_Inst-UVR.pth', 'VR Arch Single Model v5: UVR-De-Echo-Aggressive by FoxJoy': 'UVR-De-Echo-Aggressive.pth', 'VR Arch Single Model v5: UVR-De-Echo-Normal by FoxJoy': 'UVR-De-Echo-Normal.pth', 'VR Arch Single Model v5: UVR-DeEcho-DeReverb by FoxJoy': 'UVR-DeEcho-DeReverb.pth', 'VR Arch Single Model v5: UVR-DeNoise-Lite by FoxJoy': 'UVR-DeNoise-Lite.pth', 'VR Arch Single Model v5: UVR-DeNoise by FoxJoy': 'UVR-DeNoise.pth', 'VR Arch Single Model v5: UVR-BVE-4B_SN-44100-1': 'UVR-BVE-4B_SN-44100-1.pth', 'VR Arch Single Model v4: MGM_HIGHEND_v4': 'MGM_HIGHEND_v4.pth', 'VR 
Arch Single Model v4: MGM_LOWEND_A_v4': 'MGM_LOWEND_A_v4.pth', 'VR Arch Single Model v4: MGM_LOWEND_B_v4': 'MGM_LOWEND_B_v4.pth', 'VR Arch Single Model v4: MGM_MAIN_v4': 'MGM_MAIN_v4.pth'}, 'mdx_download_list': {'MDX-Net Model: UVR-MDX-NET Inst HQ 1': 'UVR-MDX-NET-Inst_HQ_1.onnx', 'MDX-Net Model: UVR-MDX-NET Inst HQ 2': 'UVR-MDX-NET-Inst_HQ_2.onnx', 'MDX-Net Model: UVR-MDX-NET Inst HQ 3': 'UVR-MDX-NET-Inst_HQ_3.onnx', 'MDX-Net Model: UVR-MDX-NET Main': 'UVR_MDXNET_Main.onnx', 'MDX-Net Model: UVR-MDX-NET Inst Main': 'UVR-MDX-NET-Inst_Main.onnx', 'MDX-Net Model: UVR-MDX-NET 1': 'UVR_MDXNET_1_9703.onnx', 'MDX-Net Model: UVR-MDX-NET 2': 'UVR_MDXNET_2_9682.onnx', 'MDX-Net Model: UVR-MDX-NET 3': 'UVR_MDXNET_3_9662.onnx', 'MDX-Net Model: UVR-MDX-NET Inst 1': 'UVR-MDX-NET-Inst_1.onnx', 'MDX-Net Model: UVR-MDX-NET Inst 2': 'UVR-MDX-NET-Inst_2.onnx', 'MDX-Net Model: UVR-MDX-NET Inst 3': 'UVR-MDX-NET-Inst_3.onnx', 'MDX-Net Model: UVR-MDX-NET Karaoke': 'UVR_MDXNET_KARA.onnx', 'MDX-Net Model: UVR-MDX-NET Karaoke 2': 'UVR_MDXNET_KARA_2.onnx', 'MDX-Net Model: UVR_MDXNET_9482': 'UVR_MDXNET_9482.onnx', 'MDX-Net Model: UVR-MDX-NET Voc FT': 'UVR-MDX-NET-Voc_FT.onnx', 'MDX-Net Model: Kim Vocal 1': 'Kim_Vocal_1.onnx', 'MDX-Net Model: Kim Vocal 2': 'Kim_Vocal_2.onnx', 'MDX-Net Model: Kim Inst': 'Kim_Inst.onnx', 'MDX-Net Model: Reverb HQ By FoxJoy': 'Reverb_HQ_By_FoxJoy.onnx', 'MDX-Net Model: kuielab_a_vocals': 'kuielab_a_vocals.onnx', 'MDX-Net Model: kuielab_a_other': 'kuielab_a_other.onnx', 'MDX-Net Model: kuielab_a_bass': 'kuielab_a_bass.onnx', 'MDX-Net Model: kuielab_a_drums': 'kuielab_a_drums.onnx', 'MDX-Net Model: kuielab_b_vocals': 'kuielab_b_vocals.onnx', 'MDX-Net Model: kuielab_b_other': 'kuielab_b_other.onnx', 'MDX-Net Model: kuielab_b_bass': 'kuielab_b_bass.onnx', 'MDX-Net Model: kuielab_b_drums': 'kuielab_b_drums.onnx'}, 'demucs_download_list': {'Demucs v4: htdemucs_ft': {'': '', '': '', '': '', '': '', 'htdemucs_ft.yaml': ''}, 'Demucs v4: htdemucs': {'': '', 'htdemucs.yaml': 
''}, 'Demucs v4: hdemucs_mmi': {'': '', 'hdemucs_mmi.yaml': ''}, 'Demucs v4: htdemucs_6s': {'': '', 'htdemucs_6s.yaml': ''}, 'Demucs v3: mdx': {'': '', '': '', '': '', '': '', 'mdx.yaml': ''}, 'Demucs v3: mdx_q': {'': '', '': '', '': '', '': '', 'mdx_q.yaml': ''}, 'Demucs v3: mdx_extra': {'': '', '': '', '': '', '': '', 'mdx_extra.yaml': ''}, 'Demucs v3: mdx_extra_q': {'': '', '': '', '': '', '': '', 'mdx_extra_q.yaml': ''}, 'Demucs v3: UVR Model': {'': '', 'UVR_Demucs_Model_1.yaml': ''}, 'Demucs v3: repro_mdx_a': {'': '', '': '', '': '', '': '', 'repro_mdx_a.yaml': ''}, 'Demucs v3: repro_mdx_a_time_only': {'': '', '': '', 'repro_mdx_a_time_only.yaml': ''}, 'Demucs v3: repro_mdx_a_hybrid_only': {'': '', '': '', 'repro_mdx_a_hybrid_only.yaml': ''}, 'Demucs v2: demucs': {'': ''}, 'Demucs v2: demucs_extra': {'': ''}, 'Demucs v2: demucs48_hq': {'': ''}, 'Demucs v2: tasnet': {'': ''}, 'Demucs v2: tasnet_extra': {'': ''}, 'Demucs v2: demucs_unittest': {'': ''}, 'Demucs v1: demucs': {'': ''}, 'Demucs v1: demucs_extra': {'': ''}, 'Demucs v1: light': {'': ''}, 'Demucs v1: light_extra': {'': ''}, 'Demucs v1: tasnet': {'': ''}, 'Demucs v1: tasnet_extra': {'': ''}}, 'mdx_download_vip_list': {'MDX-Net Model VIP: UVR-MDX-NET_Main_340': 'UVR-MDX-NET_Main_340.onnx', 'MDX-Net Model VIP: UVR-MDX-NET_Main_390': 'UVR-MDX-NET_Main_390.onnx', 'MDX-Net Model VIP: UVR-MDX-NET_Main_406': 'UVR-MDX-NET_Main_406.onnx', 'MDX-Net Model VIP: UVR-MDX-NET_Main_427': 'UVR-MDX-NET_Main_427.onnx', 'MDX-Net Model VIP: UVR-MDX-NET_Main_438': 'UVR-MDX-NET_Main_438.onnx', 'MDX-Net Model VIP: UVR-MDX-NET_Inst_82_beta': 'UVR-MDX-NET_Inst_82_beta.onnx', 'MDX-Net Model VIP: UVR-MDX-NET_Inst_90_beta': 'UVR-MDX-NET_Inst_90_beta.onnx', 'MDX-Net Model VIP: UVR-MDX-NET_Inst_187_beta': 'UVR-MDX-NET_Inst_187_beta.onnx', 'MDX-Net Model VIP: UVR-MDX-NET-Inst_full_292': 'UVR-MDX-NET-Inst_full_292.onnx'}, 'mdx23_download_list': {'MDX23C Model: MDX23C_D1581': {'MDX23C_D1581.ckpt': 'model_2_stem_061321.yaml'}}, 
'mdx23c_download_list': {'MDX23C Model: MDX23C-InstVoc HQ': {'MDX23C-8KFFT-InstVoc_HQ.ckpt': 'model_2_stem_full_band_8k.yaml'}}, 'mdx23c_download_vip_list': {'MDX23C Model VIP: MDX23C_D1581': {'MDX23C_D1581.ckpt': 'model_2_stem_061321.yaml'}, 'MDX23C Model VIP: MDX23C-InstVoc HQ 2': {'MDX23C-8KFFT-InstVoc_HQ_2.ckpt': 'model_2_stem_full_band_8k.yaml'}}, 'vr_download_vip_list': [], 'demucs_download_vip_list': []}
{
    "MDX": {
        "MDX-Net Model: Kim Inst": "Kim_Inst.onnx",
        "MDX-Net Model: Kim Vocal 1": "Kim_Vocal_1.onnx",
        "MDX-Net Model: Kim Vocal 2": "Kim_Vocal_2.onnx",
        "MDX-Net Model: Reverb HQ By FoxJoy": "Reverb_HQ_By_FoxJoy.onnx",
        "MDX-Net Model: UVR-MDX-NET 1": "UVR_MDXNET_1_9703.onnx",
        "MDX-Net Model: UVR-MDX-NET 2": "UVR_MDXNET_2_9682.onnx",
        "MDX-Net Model: UVR-MDX-NET 3": "UVR_MDXNET_3_9662.onnx",
        "MDX-Net Model: UVR-MDX-NET Inst 1": "UVR-MDX-NET-Inst_1.onnx",
        "MDX-Net Model: UVR-MDX-NET Inst 2": "UVR-MDX-NET-Inst_2.onnx",
        "MDX-Net Model: UVR-MDX-NET Inst 3": "UVR-MDX-NET-Inst_3.onnx",
        "MDX-Net Model: UVR-MDX-NET Inst HQ 1": "UVR-MDX-NET-Inst_HQ_1.onnx",
        "MDX-Net Model: UVR-MDX-NET Inst HQ 2": "UVR-MDX-NET-Inst_HQ_2.onnx",
        "MDX-Net Model: UVR-MDX-NET Inst HQ 3": "UVR-MDX-NET-Inst_HQ_3.onnx",
        "MDX-Net Model: UVR-MDX-NET Inst Main": "UVR-MDX-NET-Inst_Main.onnx",
        "MDX-Net Model: UVR-MDX-NET Karaoke": "UVR_MDXNET_KARA.onnx",
        "MDX-Net Model: UVR-MDX-NET Karaoke 2": "UVR_MDXNET_KARA_2.onnx",
        "MDX-Net Model: UVR-MDX-NET Main": "UVR_MDXNET_Main.onnx",
        "MDX-Net Model: UVR-MDX-NET Voc FT": "UVR-MDX-NET-Voc_FT.onnx",
        "MDX-Net Model: UVR_MDXNET_9482": "UVR_MDXNET_9482.onnx",
        "MDX-Net Model: kuielab_a_bass": "kuielab_a_bass.onnx",
        "MDX-Net Model: kuielab_a_drums": "kuielab_a_drums.onnx",
        "MDX-Net Model: kuielab_a_other": "kuielab_a_other.onnx",
        "MDX-Net Model: kuielab_a_vocals": "kuielab_a_vocals.onnx",
        "MDX-Net Model: kuielab_b_bass": "kuielab_b_bass.onnx",
        "MDX-Net Model: kuielab_b_drums": "kuielab_b_drums.onnx",
        "MDX-Net Model: kuielab_b_other": "kuielab_b_other.onnx",
        "MDX-Net Model: kuielab_b_vocals": "kuielab_b_vocals.onnx"
    },
    "VR": {
        "VR Arch Single Model v4: MGM_HIGHEND_v4": "MGM_HIGHEND_v4.pth",
        "VR Arch Single Model v4: MGM_LOWEND_A_v4": "MGM_LOWEND_A_v4.pth",
        "VR Arch Single Model v4: MGM_LOWEND_B_v4": "MGM_LOWEND_B_v4.pth",
        "VR Arch Single Model v4: MGM_MAIN_v4": "MGM_MAIN_v4.pth",
        "VR Arch Single Model v5: 10_SP-UVR-2B-32000-1": "10_SP-UVR-2B-32000-1.pth",
        "VR Arch Single Model v5: 11_SP-UVR-2B-32000-2": "11_SP-UVR-2B-32000-2.pth",
        "VR Arch Single Model v5: 12_SP-UVR-3B-44100": "12_SP-UVR-3B-44100.pth",
        "VR Arch Single Model v5: 13_SP-UVR-4B-44100-1": "13_SP-UVR-4B-44100-1.pth",
        "VR Arch Single Model v5: 14_SP-UVR-4B-44100-2": "14_SP-UVR-4B-44100-2.pth",
        "VR Arch Single Model v5: 15_SP-UVR-MID-44100-1": "15_SP-UVR-MID-44100-1.pth",
        "VR Arch Single Model v5: 16_SP-UVR-MID-44100-2": "16_SP-UVR-MID-44100-2.pth",
        "VR Arch Single Model v5: 17_HP-Wind_Inst-UVR": "17_HP-Wind_Inst-UVR.pth",
        "VR Arch Single Model v5: 1_HP-UVR": "1_HP-UVR.pth",
        "VR Arch Single Model v5: 2_HP-UVR": "2_HP-UVR.pth",
        "VR Arch Single Model v5: 3_HP-Vocal-UVR": "3_HP-Vocal-UVR.pth",
        "VR Arch Single Model v5: 4_HP-Vocal-UVR": "4_HP-Vocal-UVR.pth",
        "VR Arch Single Model v5: 5_HP-Karaoke-UVR": "5_HP-Karaoke-UVR.pth",
        "VR Arch Single Model v5: 6_HP-Karaoke-UVR": "6_HP-Karaoke-UVR.pth",
        "VR Arch Single Model v5: 7_HP2-UVR": "7_HP2-UVR.pth",
        "VR Arch Single Model v5: 8_HP2-UVR": "8_HP2-UVR.pth",
        "VR Arch Single Model v5: 9_HP2-UVR": "9_HP2-UVR.pth",
        "VR Arch Single Model v5: UVR-BVE-4B_SN-44100-1": "UVR-BVE-4B_SN-44100-1.pth",
        "VR Arch Single Model v5: UVR-De-Echo-Aggressive by FoxJoy": "UVR-De-Echo-Aggressive.pth",
        "VR Arch Single Model v5: UVR-De-Echo-Normal by FoxJoy": "UVR-De-Echo-Normal.pth",
        "VR Arch Single Model v5: UVR-DeEcho-DeReverb by FoxJoy": "UVR-DeEcho-DeReverb.pth",
        "VR Arch Single Model v5: UVR-DeNoise by FoxJoy": "UVR-DeNoise.pth",
        "VR Arch Single Model v5: UVR-DeNoise-Lite by FoxJoy": "UVR-DeNoise-Lite.pth"
    }
}

The list works, so I moved on to an actual separation:

(.venv) C:\project\test\ds\uvr>audio-separator --help              
usage: audio-separator [-h] [-v] [--log_level LOG_LEVEL] [--list_models] [--model_filename MODEL_FILENAME] [--model_file_dir MODEL_FILE_DIR] [--output_dir OUTPUT_DIR] [--output_format OUTPUT_FORMAT]
                       [--denoise DENOISE] [--normalization_threshold NORMALIZATION_THRESHOLD] [--single_stem SINGLE_STEM] [--invert_spect INVERT_SPECT] [--sample_rate SAMPLE_RATE]
                       [--mdx_hop_length MDX_HOP_LENGTH] [--mdx_segment_size MDX_SEGMENT_SIZE] [--mdx_overlap MDX_OVERLAP] [--mdx_batch_size MDX_BATCH_SIZE] [--vr_batch_size VR_BATCH_SIZE]
                       [--vr_window_size VR_WINDOW_SIZE] [--vr_aggression VR_AGGRESSION] [--vr_enable_tta VR_ENABLE_TTA] [--vr_enable_post_process VR_ENABLE_POST_PROCESS]
                       [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] [--vr_high_end_process VR_HIGH_END_PROCESS]

Separate audio file into different stems.

positional arguments:
  audio_file                                 The audio file path to separate, in any common format.

  -h, --help                                 show this help message and exit
  -v, --version                              show program's version number and exit
  --log_level LOG_LEVEL                      Optional: logging level, e.g. info, debug, warning (default: info). Example: --log_level=debug
  --list_models                              List all supported models and exit.
  --model_filename MODEL_FILENAME            Optional: model filename to be used for separation (default: 2_HP-UVR.pth). Example: --model_filename=UVR_MDXNET_KARA_2.onnx
  --model_file_dir MODEL_FILE_DIR            Optional: model files directory (default: /tmp/audio-separator-models/). Example: --model_file_dir=/app/models
  --output_dir OUTPUT_DIR                    Optional: directory to write output files (default: <current dir>). Example: --output_dir=/app/separated
  --output_format OUTPUT_FORMAT              Optional: output format for separated files, any common format (default: FLAC). Example: --output_format=MP3
  --denoise DENOISE                          Optional: enable or disable denoising during separation (default: False). Example: --denoise=True
  --normalization_threshold NORMALIZATION_THRESHOLD
                                             Optional: max peak amplitude to normalize input and output audio to (default: 0.9). Example: --normalization_threshold=0.7
  --single_stem SINGLE_STEM                  Optional: output only single stem, either instrumental or vocals. Example: --single_stem=instrumental
  --invert_spect INVERT_SPECT                Optional: invert secondary stem using spectogram (default: False). Example: --invert_spect=True
  --sample_rate SAMPLE_RATE                  Optional: sample_rate (default: 44100). Example: --sample_rate=44100
  --mdx_hop_length MDX_HOP_LENGTH            Optional: mdx_hop_length (default: 1024). Example: --mdx_hop_length=1024
  --mdx_segment_size MDX_SEGMENT_SIZE        Optional: mdx_segment_size (default: 256). Example: --mdx_segment_size=256
  --mdx_overlap MDX_OVERLAP                  Optional: mdx_overlap (default: 0.25). Example: --mdx_overlap=0.25
  --mdx_batch_size MDX_BATCH_SIZE            Optional: mdx_batch_size (default: 1). Example: --mdx_batch_size=4
  --vr_batch_size VR_BATCH_SIZE              Optional: vr_batch_size (default: 4). Example: --vr_batch_size=16
  --vr_window_size VR_WINDOW_SIZE            Optional: vr_window_size (default: 512). Example: --vr_window_size=256
  --vr_aggression VR_AGGRESSION              Optional: vr_aggression (default: 5). Example: --vr_aggression=2
  --vr_enable_tta VR_ENABLE_TTA              Optional: vr_enable_tta (default: False). Example: --vr_enable_tta=True
  --vr_enable_post_process VR_ENABLE_POST_PROCESS
                                             Optional: vr_enable_post_process (default: False). Example: --vr_enable_post_process=True
  --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD
                                             Optional: vr_post_process_threshold (default: 0.2). Example: --vr_post_process_threshold=0.1
  --vr_high_end_process VR_HIGH_END_PROCESS  Optional: vr_high_end_process (default: False). Example: --vr_high_end_process=True
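
The flags in the help text above can also be assembled from Python, for example to run the 5_HP-Karaoke-UVR.pth VR model that is loaded in the next comment (presumably the HP5 model this issue asks about). This is a minimal sketch: the helper names are hypothetical, only flags shown in the --help output are used, and the --name=value form follows the examples printed there.

```python
import subprocess


def build_separator_command(audio_file, model_filename="5_HP-Karaoke-UVR.pth",
                            model_file_dir="models/", output_dir="output/",
                            output_format="WAV", extra_flags=None):
    """Build an argv list for audio-separator from flags listed in its --help output."""
    cmd = [
        "audio-separator",
        f"--model_filename={model_filename}",
        f"--model_file_dir={model_file_dir}",
        f"--output_dir={output_dir}",
        f"--output_format={output_format}",
    ]
    # Optional architecture-specific flags, e.g. {"vr_aggression": 5, "vr_window_size": 512}
    # (the defaults shown in --help, not tuned recommendations).
    for name, value in (extra_flags or {}).items():
        cmd.append(f"--{name}={value}")
    # The positional audio_file argument goes last.
    cmd.append(audio_file)
    return cmd


def run_separation(cmd):
    """Run the CLI; raises subprocess.CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(cmd, check=True)
```

Passing a list rather than one shell string keeps Windows paths with spaces intact; run_separation(build_separator_command("0822.wav")) would then start the same kind of run as the command below, but with the VR model.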

(.venv) C:\project\test\ds\uvr>audio-separator --log_level debug --model_filename UVR_MDXNET_Main.onnx --model_file_dir models/ 0822.wav
2024-02-05 15:48:25.514 - INFO - cli - Separator version 0.14.0 beginning with input file: 0822.wav
2024-02-05 15:48:29.682 - INFO - separator - Separator version 0.14.0 instantiating with output_dir: None, output_format: FLAC
2024-02-05 15:48:29.682 - DEBUG - separator - Normalization threshold set to 0.9, waveform will lowered to this max amplitude to avoid clipping.
2024-02-05 15:48:29.682 - DEBUG - separator - Denoising disabled, model will only be run once. This is twice as fast, but may result in noisier output audio.
2024-02-05 15:48:29.683 - INFO - separator - Operating System: Windows 10.0.22621
2024-02-05 15:48:29.683 - INFO - separator - System: Windows Node: DESKTOP-R5JDPUC Release: 10 Machine: AMD64 Proc: Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
2024-02-05 15:48:29.683 - INFO - separator - Python Version: 3.10.11
2024-02-05 15:48:29.684 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-02-05 15:48:29.684 - DEBUG - separator - Python package: onnxruntime not installed
2024-02-05 15:48:29.684 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.0
2024-02-05 15:48:29.684 - INFO - separator - No hardware acceleration could be configured, running in CPU mode
2024-02-05 15:48:29.684 - INFO - separator - Loading model UVR_MDXNET_Main.onnx...
2024-02-05 15:48:29.684 - DEBUG - separator - Model path set to models/UVR_MDXNET_Main.onnx
2024-02-05 15:48:29.684 - DEBUG - separator - Model not found at path models/UVR_MDXNET_Main.onnx, downloading...
2024-02-05 15:48:29.688 - DEBUG - separator - Downloading file from to models/UVR_MDXNET_Main.onnx with timeout 300s
2024-02-05 15:48:55.241 - DEBUG - separator - Calculating MD5 hash for model file to identify model parameters from UVR data...
2024-02-05 15:48:55.241 - ERROR - separator - Attempting to calculate hash of model file models/UVR_MDXNET_Main.onnx
2024-02-05 15:48:55.297 - DEBUG - separator - Model models/UVR_MDXNET_Main.onnx has hash 53c4baf4d12c3e6c3831bb8f5b532b93
2024-02-05 15:48:55.298 - DEBUG - separator - VR model data path set to models/vr_model_data.json
2024-02-05 15:48:55.299 - DEBUG - separator - VR model data not found at path models/vr_model_data.json, downloading...
2024-02-05 15:48:55.299 - DEBUG - separator - Downloading file from to models/vr_model_data.json with timeout 300s
2024-02-05 15:48:56.370 - DEBUG - separator - MDX model data path set to models/mdx_model_data.json
2024-02-05 15:48:56.373 - DEBUG - separator - MDX model data not found at path models/mdx_model_data.json, downloading...
2024-02-05 15:48:56.373 - DEBUG - separator - Downloading file from to models/mdx_model_data.json with timeout 300s
2024-02-05 15:48:57.362 - DEBUG - separator - Loading MDX and VR model parameters from UVR model data files...
2024-02-05 15:48:57.396 - DEBUG - separator - Model data loaded: {'compensate': 1.043, 'mdx_dim_f_set': 3072, 'mdx_dim_t_set': 8, 'mdx_n_fft_scale_set': 7680, 'primary_stem': 'Vocals'}
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: model_name=UVR_MDXNET_Main, model_path=models/UVR_MDXNET_Main.onnx
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: primary_stem_output_path=None, secondary_stem_output_path=None
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: output_dir=None, output_format=FLAC
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: normalization_threshold=0.9
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: enable_denoise=False, output_single_stem=None
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: invert_using_spec=False, sample_rate=44100
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: primary_stem_name=Vocals, secondary_stem_name=Instrumental
2024-02-05 15:48:57.396 - DEBUG - common_separator - Common params: is_karaoke=False, is_bv_model=False, bv_model_rebalance=0
2024-02-05 15:48:57.396 - DEBUG - mdx_separator - Model params: primary_stem=Vocals, secondary_stem=Instrumental
2024-02-05 15:48:57.396 - DEBUG - mdx_separator - Model params: batch_size=1, compensate=1.043, segment_size=256, dim_f=3072, dim_t=256
2024-02-05 15:48:57.396 - DEBUG - mdx_separator - Model params: n_fft=7680, hop=1024
2024-02-05 15:48:57.396 - DEBUG - mdx_separator - Loading ONNX model for inference...
2024-02-05 15:48:57.656 - DEBUG - mdx_separator - Model loaded successfully using ONNXruntime inferencing session.
2024-02-05 15:48:57.656 - DEBUG - separator - Loading model completed.
2024-02-05 15:48:57.656 - INFO - separator - Load model duration: 00:00:27
2024-02-05 15:48:57.656 - INFO - separator - Starting separation process for audio_file_path: 0822.wav
2024-02-05 15:48:57.656 - DEBUG - mdx_separator - Preparing mix...
2024-02-05 15:48:57.656 - DEBUG - mdx_separator - Loading audio from file: 0822.wav
Traceback (most recent call last):
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\librosa\core\", line 175, in load
    y, sr_native = __soundfile_load(path, offset, duration, dtype)
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\librosa\core\", line 208, in __soundfile_load
    context = sf.SoundFile(path)
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\", line 740, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\", line 1264, in _open
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\", line 1455, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '0822.wav': Error in WAV file. No 'data' chunk marker.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\", line 86, in _run_code
    exec(code, run_globals)
  File "C:\project\test\ds\uvr\.venv\Scripts\audio-separator.exe\", line 7, in <module>
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\utils\", line 116, in main
    output_files = separator.separate(args.audio_file)
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\", line 540, in separate
    output_files = self.model_instance.separate(audio_file_path)
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\architectures\", line 90, in separate
    mix = self.prepare_mix(self.audio_file_path)
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\audio_separator\separator\architectures\", line 401, in prepare_mix
    mix, sr = librosa.load(mix, mono=False, sr=self.sample_rate)
  File "C:\project\test\ds\uvr\.venv\lib\site-packages\librosa\core\", line 177, in load
    except sf.SoundFileRuntimeError as exc:
AttributeError: module 'soundfile' has no attribute 'SoundFileRuntimeError'

(.venv) C:\project\test\ds\uvr>
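
The final AttributeError suggests that the soundfile module librosa imported is older than the soundfile>=0.12.1 that the pip log above shows librosa requiring. A plausible but unconfirmed cause is the pip install pyrubberband step, which also pulled in pysoundfile 0.9.0.post1, an older distribution that ships a module with the same name, soundfile. The sketch below lists which installed distributions provide a given top-level module, using only importlib.metadata; the function name is hypothetical.

```python
import importlib.metadata as metadata


def distributions_providing(module_name):
    """Return sorted names of installed distributions shipping a top-level module or package called module_name."""
    providers = set()
    for dist in metadata.distributions():
        top_level = dist.read_text("top_level.txt")
        if top_level is not None:
            # setuptools-built distributions list their top-level names here.
            names = set(top_level.split())
        else:
            # Otherwise fall back to the installed file list (RECORD),
            # e.g. "soundfile.py" -> "soundfile", "librosa/__init__.py" -> "librosa".
            names = set()
            for path in dist.files or []:
                if path.parts:
                    first = path.parts[0]
                    names.add(first[:-3] if first.endswith(".py") else first)
        name = dist.metadata["Name"]
        if name and module_name in names:
            providers.add(name)
    return sorted(providers)


if __name__ == "__main__":
    # More than one entry means several distributions install the same module,
    # so whichever was installed last has overwritten the other's files.
    print(distributions_providing("soundfile"))
```

If both SoundFile and PySoundFile show up, uninstalling pysoundfile and then running pip install --force-reinstall soundfile is one thing to try; that is a suggestion, not a fix confirmed in this thread.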
zhzhongshi commented 8 months ago
2024-02-05 21:04:39,354 - INFO - separator - Separator version 0.14.0 instantiating with output_dir: output/, output_format: WAV
2024-02-05 21:04:39,355 - INFO - separator - Operating System: Windows 10.0.22621
2024-02-05 21:04:39,355 - INFO - separator - System: Windows Node: DESKTOP-R5JDPUC Release: 10 Machine: AMD64 Proc: Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
2024-02-05 21:04:39,355 - INFO - separator - Python Version: 3.10.11
2024-02-05 21:04:39,387 - INFO - separator - ONNX Runtime GPU package installed with version: 1.16.3
2024-02-05 21:04:39,448 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-02-05 21:04:39,448 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-02-05 21:04:39,448 - INFO - separator - Loading model 5_HP-Karaoke-UVR.pth...
2024-02-05 21:04:39,448 - ERROR - separator - Attempting to calculate hash of model file models/5_HP-Karaoke-UVR.pth
2024-02-05 21:04:39,503 - INFO - vr_separator - VR Separator initialisation complete
2024-02-05 21:04:39,503 - INFO - separator - Load model duration: 00:00:00
2024-02-05 21:04:39,503 - INFO - separator - Starting separation process for audio_file_path: 0822.wav
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.01s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<?, ?it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:25<00:00,  4.53it/s]
2024-02-05 21:05:19,646 - INFO - vr_separator - Saving Instrumental stem...
Traceback (most recent call last):
  File "C:\project\test\ds\uvr\", line 20, in <module>
    primary_stem_path, secondary_stem_path = separator.separate('0822.wav')
  File "C:\Python310\lib\site-packages\audio_separator\separator\", line 540, in separate
    output_files = self.model_instance.separate(audio_file_path)
  File "C:\Python310\lib\site-packages\audio_separator\separator\architectures\", line 175, in separate
    self.primary_source = self.spec_to_wav(y_spec).T
  File "C:\Python310\lib\site-packages\audio_separator\separator\architectures\", line 327, in spec_to_wav
    wav = spec_utils.cmb_spectrogram_to_wave(spec, self.model_params, is_v51_model=self.is_vr_51_model)
  File "C:\Python310\lib\site-packages\audio_separator\separator\uvr_lib_v5\", line 383, in cmb_spectrogram_to_wave
    wave = librosa.resample(wave2, orig_sr=bp["sr"], target_sr=sr, res_type=wav_resolution)
  File "C:\Python310\lib\site-packages\librosa\core\", line 670, in resample
    samplerate.resample, axis=axis, arr=y, ratio=ratio, converter_type=res_type
  File "C:\Python310\lib\site-packages\lazy_loader\", line 111, in __getattr__
    raise ModuleNotFoundError(
ModuleNotFoundError: No module named 'samplerate'

This error is lazily reported, having originally occured in
  File C:\Python310\lib\site-packages\librosa\core\, line 31, in <module>

----> samplerate = lazy.load("samplerate")

C:\project\test\ds\uvr>pip install samplerate
Looking in indexes:
Collecting samplerate
  Downloading (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 492.1 kB/s eta 0:00:00
Requirement already satisfied: numpy in c:\python310\lib\site-packages (from samplerate) (1.26.0)
Installing collected packages: samplerate
Successfully installed samplerate-0.2.1
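The traceback shows librosa.resample calling samplerate.resample with converter_type=res_type. It also shows that librosa imports samplerate lazily (the lazy.load("samplerate") line). The missing package therefore surfaces only at the resampling step, after the whole separation has run, not at import time. The sketch below maps librosa res_type names to the optional package that implements them, so a missing backend can be detected up front. The mapping reflects our reading of the librosa 0.10 resample documentation and is not stated in this thread, so check it against your installed version. The helper names are hypothetical.

```python
import importlib.util

# Assumed mapping of librosa.resample res_type values to the optional package
# that implements them (based on the librosa 0.10 docs; verify for your version).
RES_TYPE_BACKENDS = {
    "soxr_vhq": "soxr", "soxr_hq": "soxr", "soxr_mq": "soxr", "soxr_lq": "soxr",
    "kaiser_best": "resampy", "kaiser_fast": "resampy",
    "fft": "scipy", "scipy": "scipy", "polyphase": "scipy",
    "linear": "samplerate", "zero_order_hold": "samplerate",
    "sinc_best": "samplerate", "sinc_medium": "samplerate", "sinc_fastest": "samplerate",
}


def backend_for(res_type):
    """Return the top-level package name that backs a given res_type."""
    try:
        return RES_TYPE_BACKENDS[res_type]
    except KeyError:
        raise ValueError(f"unknown res_type: {res_type!r}") from None


def backend_installed(res_type):
    """True if the package backing res_type is importable, without importing it."""
    return importlib.util.find_spec(backend_for(res_type)) is not None
```

For example, on the machine above backend_installed("sinc_fastest") would have returned False before pip install samplerate. find_spec is used so the check does not pay the cost of actually importing the backend.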
zhzhongshi commented 8 months ago

After installing these, it works.

Thank you.

beveradb commented 8 months ago

Thanks for the report - I hadn't tested on Windows, and unfortunately I still haven't managed to get cross-platform end-to-end tests working in CI (that's on my to-do list though!)

I've now published version 0.14.4, which removes the pyrubberband dependency and adds samplerate as a dependency (which is somehow required by librosa on Windows but not on macOS/Linux...)

All currently supported model types now work out of the box on Windows after pip install audio-separator[cpu]:
