Fails to run when segment_size is set to anything except 256 in mdx_params of Separator

CarpetBook commented 6 months ago

I'm trying to use the library the same way I use UVR5. I wanted to run a model with a larger segment size, but I can't get anything other than the default 256 to work on either of the two MDX models I've tried (Voc FT and Inst HQ 4).

Running the below code works normally:

from audio_separator.separator import Separator

sep = Separator(model_file_dir="./models", invert_using_spec=True)

# sep.load_model("UVR-MDX-NET-Inst_HQ_4.onnx")
# prim, _ = sep.separate("test.wav")

sep.load_model("UVR-MDX-NET-Voc_FT.onnx")
voc, _ = sep.separate("test.wav")

...outputs two stems as wav files in the parent directory, as expected.

But changing the constructor line to pass a larger segment size:

...
sep = Separator(model_file_dir="models", invert_using_spec=True, mdx_params={"segment_size": 1024})
...

gives the following output:

2024-03-14 15:57:19,547 - INFO - separator - Separator version 0.15.3 instantiating with output_dir: None, output_format: WAV
2024-03-14 15:57:19,547 - DEBUG - separator - Secondary step will be inverted using spectogram rather than waveform. This may improve quality, but is slightly slower.
2024-03-14 15:57:19,548 - INFO - separator - Operating System: Windows 10.0.19045
2024-03-14 15:57:19,548 - INFO - separator - System: Windows Node: realbox Release: 10 Machine: AMD64 Proc: AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD
2024-03-14 15:57:19,548 - INFO - separator - Python Version: 3.11.7
2024-03-14 15:57:19,548 - INFO - separator - PyTorch Version: 2.1.2+cu121
2024-03-14 15:57:19,566 - INFO - separator - FFmpeg installed: ffmpeg version 2024-02-04-git-7375a6ca7b-essentials_build-www.gyan.dev Copyright (c) 2000-2024 the FFmpeg developers  
2024-03-14 15:57:19,568 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-03-14 15:57:19,570 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.1
2024-03-14 15:57:19,570 - INFO - separator - ONNX Runtime CPU package installed with version: 1.16.3
2024-03-14 15:57:19,589 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-03-14 15:57:19,589 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-03-14 15:57:19,589 - INFO - separator - Loading model UVR-MDX-NET-Voc_FT.onnx...
2024-03-14 15:57:19,589 - DEBUG - separator - File already exists at ./models\download_checks.json, skipping download
2024-03-14 15:57:19,590 - DEBUG - separator - Model download list loaded
2024-03-14 15:57:19,590 - DEBUG - separator - Searching for model_filename UVR-MDX-NET-Voc_FT.onnx in supported_model_files_grouped
2024-03-14 15:57:19,590 - DEBUG - separator - Single file model identified: MDX-Net Model: UVR-MDX-NET Voc FT
2024-03-14 15:57:19,590 - DEBUG - separator - File already exists at ./models\UVR-MDX-NET-Voc_FT.onnx, skipping download
2024-03-14 15:57:19,591 - DEBUG - separator - Returning path for single model file: ./models\UVR-MDX-NET-Voc_FT.onnx
2024-03-14 15:57:19,591 - DEBUG - separator - Model downloaded, friendly name: MDX-Net Model: UVR-MDX-NET Voc FT
2024-03-14 15:57:19,591 - DEBUG - separator - Calculating MD5 hash for model file to identify model parameters from UVR data...
2024-03-14 15:57:19,591 - DEBUG - separator - Calculating hash of model file ./models\UVR-MDX-NET-Voc_FT.onnx
2024-03-14 15:57:19,605 - DEBUG - separator - Model ./models\UVR-MDX-NET-Voc_FT.onnx has hash 77d07b2667ddf05b9e3175941b4454a0
2024-03-14 15:57:19,606 - DEBUG - separator - VR model data path set to ./models\vr_model_data.json
2024-03-14 15:57:19,606 - DEBUG - separator - File already exists at ./models\vr_model_data.json, skipping download
2024-03-14 15:57:19,606 - DEBUG - separator - MDX model data path set to ./models\mdx_model_data.json
2024-03-14 15:57:19,606 - DEBUG - separator - File already exists at ./models\mdx_model_data.json, skipping download
2024-03-14 15:57:19,607 - DEBUG - separator - Loading MDX and VR model parameters from UVR model data files...
2024-03-14 15:57:19,607 - DEBUG - separator - Model data loaded from UVR JSON using hash 77d07b2667ddf05b9e3175941b4454a0: {'compensate': 1.021, 'mdx_dim_f_set': 3072, 'mdx_dim_t_set': 8, 'mdx_n_fft_scale_set': 7680, 'primary_stem': 'Vocals'}
2024-03-14 15:57:20,238 - DEBUG - common_separator - Common params: model_name=UVR-MDX-NET-Voc_FT, model_path=./models\UVR-MDX-NET-Voc_FT.onnx
2024-03-14 15:57:20,238 - DEBUG - common_separator - Common params: primary_stem_output_path=None, secondary_stem_output_path=None
2024-03-14 15:57:20,238 - DEBUG - common_separator - Common params: output_dir=None, output_format=WAV
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: normalization_threshold=0.9
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: enable_denoise=None, output_single_stem=None
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: invert_using_spec=True, sample_rate=44100
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: primary_stem_name=Vocals, secondary_stem_name=Instrumental
2024-03-14 15:57:20,239 - DEBUG - common_separator - Common params: is_karaoke=False, is_bv_model=False, bv_model_rebalance=0
2024-03-14 15:57:20,239 - DEBUG - mdx_separator - MDX arch params: batch_size=1, segment_size=1024
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - MDX arch params: overlap=None, hop_length=None, enable_denoise=None
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - MDX arch params: compensate=1.021, dim_f=3072, dim_t=256, n_fft=7680
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - MDX arch params: config_yaml=None
2024-03-14 15:57:20,240 - DEBUG - mdx_separator - Loading ONNX model for inference...
Traceback (most recent call last):
  File "t:\python-audio-separator-test\Untitled-1.py", line 10, in <module>
    sep.load_model("UVR-MDX-NET-Voc_FT.onnx")
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\audio_separator\separator\separator.py", line 605, in load_model
    self.model_instance = separator_class(common_config=common_params, arch_config=self.arch_specific_params[model_type])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\audio_separator\separator\architectures\mdx_separator.py", line 106, in __init__
    self.model_run = onnx2torch.convert(self.model_path)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx2torch\converter.py", line 72, in convert
    onnx_model = safe_shape_inference(onnx_model_or_path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx2torch\utils\safe_shape_inference.py", line 46, in safe_shape_inference
    return _shape_inference_by_model_path(onnx_model_or_path, output_path=tmp_model_file.name, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx2torch\utils\safe_shape_inference.py", line 24, in _shape_inference_by_model_path
    return onnx.load(output_path)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx\__init__.py", line 208, in load_model
    model = _get_serializer(format, f).deserialize_proto(_load_bytes(f), ModelProto())
                                                         ^^^^^^^^^^^^^^
  File "C:\ProgramData\Anaconda3\envs\uvrcli\Lib\site-packages\onnx\__init__.py", line 145, in _load_bytes
    with open(f, "rb") as readable:
         ^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied: 'T:\\python-audio-separator-test\\models\\tmptftn7wrq'

CarpetBook commented 6 months ago

Tried the same code on M1 Mac, but the issue doesn't reproduce.

beveradb commented 6 months ago

That's really strange, since the error message you saw in your first debug log output seems to be a permissions issue with a temporary folder, which I wouldn't expect to have anything to do with that parameter...

However, I honestly haven't tested on Windows at all. I don't have a Windows machine to hand and don't really know how to use that operating system these days, so I'm not well equipped to help. I develop on a Mac and test on Linux (in a Docker container).

Are you able to run Docker on your Windows machine? If so, that's probably the easiest way to get things to "just work", as the docker container will run within a Linux VM which will likely resolve any file path and/or permission issues.

I do note that there is a (subtle, but relevant) difference between your two example code lines though: in one you're setting model_file_dir to "./models" and in the other, "models". Technically those should both be relative paths to the same directory, but Windows file paths are weird (backslash instead of forward slash), so you might want to experiment with that!
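As a quick sanity check of that claim, here is a minimal sketch using only the standard library. The helper name same_directory is made up for illustration, and it does not touch audio-separator itself; it only resolves both spellings against the current working directory and compares the results.

```python
import os


def same_directory(path_a: str, path_b: str) -> bool:
    """Return True if both paths resolve to the same absolute location."""
    # abspath joins a relative path onto the current working directory and
    # normalizes "./" prefixes; normcase evens out case and separator style
    # on Windows and is a no-op on POSIX systems.
    resolved_a = os.path.normcase(os.path.abspath(path_a))
    resolved_b = os.path.normcase(os.path.abspath(path_b))
    return resolved_a == resolved_b


if __name__ == "__main__":
    print(same_directory("./models", "models"))
```

If this prints the same answer on the affected machine as elsewhere, the "./" prefix is unlikely to be the cause.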

If you want to debug it more thoroughly with support from me, I'm happy to jump on a screen share call some time to investigate properly. Just let me know a good date/time/timezone for you! I just don't have a Windows machine to do that myself :)

CarpetBook commented 6 months ago

Ah, sorry, that's just from me testing to make sure the paths weren't a problem. I removed all the spaces from the path, and I put the dot slash to make sure that it was getting the correct models folder, but the result was the same.

I'm sure it would work in the Docker container, now that I've seen it work fine on my own Mac. I'll experiment more later and come back with updates if I find anything else or get something working. I'll close the issue for now.

helloimmatt commented 4 months ago

I have this exact same issue on Windows. It happens when the model has to be converted from ONNX to torch because segment_size != dim_t (the log above shows dim_t=256 for this model, which is why only 256 works).
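To make that condition concrete, here is a minimal sketch. It assumes dim_t is derived as 2 ** mdx_dim_t_set, which is inferred from the log above (mdx_dim_t_set of 8 alongside dim_t=256) rather than taken from the library source, and the function name is hypothetical.

```python
def needs_torch_conversion(segment_size: int, mdx_dim_t_set: int) -> bool:
    """Return True when the onnx2torch conversion path would be taken.

    Assumption: dim_t = 2 ** mdx_dim_t_set, inferred from the debug log
    (mdx_dim_t_set=8 is reported together with dim_t=256).
    """
    dim_t = 2 ** mdx_dim_t_set
    # Per the comment above, conversion happens only when the requested
    # segment size differs from the model's dim_t.
    return segment_size != dim_t


if __name__ == "__main__":
    for size in (256, 512, 1024):
        print(size, needs_torch_conversion(size, mdx_dim_t_set=8))
```

Under that assumption, the default 256 skips the conversion for this model, while 1024 goes through onnx2torch and hits the failing temp-file code.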

This is a bug in onnx2torch; the fix is described here:

https://github.com/ENOT-AutoDL/onnx2torch/discussions/153
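For anyone wondering why this only bites on Windows: the traceback ends in a PermissionError when onnx.load reopens a temporary file by name. On Windows, a file created with tempfile.NamedTemporaryFile generally cannot be opened a second time while the first handle is still open, whereas on Unix it can. The sketch below shows the usual workaround pattern (delete=False, close, reopen, clean up manually). It illustrates the general technique only; it is not the patch from the linked discussion, and write_then_reopen is a made-up name.

```python
import os
import tempfile
from typing import Optional


def write_then_reopen(payload: bytes, directory: Optional[str] = None) -> bytes:
    """Write payload to a temp file, then reopen it by name and read it back."""
    # delete=False keeps the file on disk after close(); we remove it ourselves.
    tmp = tempfile.NamedTemporaryFile(dir=directory, delete=False)
    try:
        tmp.write(payload)
        # Release the handle BEFORE reopening the path. Skipping this step is
        # what typically raises PermissionError on Windows.
        tmp.close()
        with open(tmp.name, "rb") as readable:
            return readable.read()
    finally:
        if not tmp.closed:
            tmp.close()
        os.remove(tmp.name)


if __name__ == "__main__":
    print(write_then_reopen(b"model bytes"))
```

The trade-off is that cleanup becomes the caller's job, hence the finally block.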

beveradb commented 4 months ago

Thanks for the pointer @helloimmatt !

I don't have a Windows machine to test on, but I've opened this PR with what I think is the fix you're suggesting.

If you're able to reproduce the error, please could you test that branch (name: attempt-to-test-windows-mdx-fix) on your machine to see if it fixes it? 🙇

Cheers, Andrew

helloimmatt commented 4 months ago

So, I've tested it. It works completely fine. Quite clever to fix the problem in onnx2torch in your own code! :)

beveradb commented 4 months ago

Thanks for testing and confirming @helloimmatt ! FYI @CarpetBook and anyone else who sees this - this issue on Windows should be fixed from v0.16.5 onwards 🙏
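A small sketch for checking whether a given version already includes the fix. It assumes plain dotted numeric version strings such as the 0.15.3 shown in the log above; the helper name is hypothetical, and pre-release suffixes are not handled.

```python
FIXED_IN = (0, 16, 5)  # first version reported above as containing the fix


def has_windows_mdx_fix(version: str) -> bool:
    """Return True if a dotted numeric version string is >= 0.16.5."""
    # Assumes every component is an integer, e.g. "0.15.3"; anything like
    # "0.16.5rc1" would raise ValueError here.
    parts = tuple(int(piece) for piece in version.split("."))
    return parts >= FIXED_IN


if __name__ == "__main__":
    for candidate in ("0.15.3", "0.16.5", "0.17.0"):
        print(candidate, has_windows_mdx_fix(candidate))
```

The installed version could be read with importlib.metadata.version, assuming the distribution is named audio-separator.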

nomadkaraoke / python-audio-separator

Fails to run when segment_size is set to anything except 256 in mdx_params of Separator #51