nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License

can't get .pth and .ckpt models working #90

Closed ybwai closed 3 months ago

ybwai commented 3 months ago

The MDX23C...ckpt type models give me this error:

    self.primary_stem_name = self.model_data["primary_stem"]
KeyError: 'primary_stem'

And all the HP_...pth UVR models give me a RuntimeError: Invalid buffer size: 3.94 GB

The .onnx models are working fine.

I am running on an Intel i9 2.4 GHz MacBook Pro with 32 GB RAM.

I can use other CLI tools to run the UVR HP_ models just fine, so I'm not sure why this runner can't.

Any suggestions?

Here are some full logs if they help:

MDX23:

(.venv) ➜  PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./.venv/bin/python -m sandbox.separator
2024-07-20 10:34:07,540 - INFO - separator - Separator version 0.14.3 instantiating with output_dir: None, output_format: WAV
2024-07-20 10:34:07,540 - INFO - separator - Operating System: Darwin Darwin Kernel Version 23.5.0: Wed May  1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64
2024-07-20 10:34:07,540 - INFO - separator - System: Darwin Node: xxxs-MacBook-Pro-2.local Release: 23.5.0 Machine: x86_64 Proc: i386
2024-07-20 10:34:07,540 - INFO - separator - Python Version: 3.10.14
2024-07-20 10:34:07,857 - INFO - separator - FFmpeg installed: ffmpeg version 7.0 Copyright (c) 2000-2024 the FFmpeg developers
2024-07-20 10:34:07,858 - DEBUG - separator - Python package: onnxruntime-gpu not installed
2024-07-20 10:34:07,858 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-07-20 10:34:07,859 - INFO - separator - ONNX Runtime CPU package installed with version: 1.18.1
2024-07-20 10:34:07,881 - INFO - separator - Apple Silicon MPS/CoreML is available in Torch, setting Torch device to MPS
2024-07-20 10:34:07,881 - INFO - separator - ONNXruntime has CoreMLExecutionProvider available, enabling acceleration
2024-07-20 10:34:07,881 - INFO - separator - Loading model MDX23C-8KFFT-InstVoc_HQ.ckpt...
2024-07-20 10:34:07,881 - DEBUG - separator - Model path set to /assets/separation/MDX23C-8KFFT-InstVoc_HQ.ckpt
2024-07-20 10:34:07,881 - DEBUG - separator - Calculating MD5 hash for model file to identify model parameters from UVR data...
2024-07-20 10:34:07,881 - DEBUG - separator - Calculating hash of model file /assets/separation/MDX23C-8KFFT-InstVoc_HQ.ckpt
2024-07-20 10:34:07,903 - DEBUG - separator - Model /assets/separation/MDX23C-8KFFT-InstVoc_HQ.ckpt has hash 99b6ceaae542265a3b6d657bf9fde79f
2024-07-20 10:34:07,903 - DEBUG - separator - VR model data path set to /assets/separation/vr_model_data.json
2024-07-20 10:34:07,903 - DEBUG - separator - MDX model data path set to /assets/separation/mdx_model_data.json
2024-07-20 10:34:07,903 - DEBUG - separator - Loading MDX and VR model parameters from UVR model data files...
2024-07-20 10:34:07,905 - DEBUG - separator - Model data loaded: {'config_yaml': 'model_2_stem_full_band_8k.yaml'}
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sandbox/separator.py", line 73, in <module>
    separator.load_model(
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/separator.py", line 523, in load_model
    self.model_instance = MDXSeparator(common_config=common_params, arch_config=self.arch_specific_params["MDX"])
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/architectures/mdx_separator.py", line 22, in __init__
    super().__init__(config=common_config)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/common_separator.py", line 83, in __init__
    self.primary_stem_name = self.model_data["primary_stem"]
KeyError: 'primary_stem'

UVR HP:

(.venv) ➜  PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./.venv/bin/python -m sandbox.separator
2024-07-20 10:28:31,793 - INFO - separator - Separator version 0.14.3 instantiating with output_dir: None, output_format: WAV
2024-07-20 10:28:31,793 - INFO - separator - Operating System: Darwin Darwin Kernel Version 23.5.0: Wed May  1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64
2024-07-20 10:28:31,793 - INFO - separator - System: Darwin Node: xxxs-MacBook-Pro-2.local Release: 23.5.0 Machine: x86_64 Proc: i386
2024-07-20 10:28:31,793 - INFO - separator - Python Version: 3.10.14
2024-07-20 10:28:31,926 - INFO - separator - FFmpeg installed: ffmpeg version 7.0 Copyright (c) 2000-2024 the FFmpeg developers
2024-07-20 10:28:31,927 - DEBUG - separator - Python package: onnxruntime-gpu not installed
2024-07-20 10:28:31,927 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-07-20 10:28:31,928 - INFO - separator - ONNX Runtime CPU package installed with version: 1.18.1
2024-07-20 10:28:31,949 - INFO - separator - Apple Silicon MPS/CoreML is available in Torch, setting Torch device to MPS
2024-07-20 10:28:31,949 - INFO - separator - ONNXruntime has CoreMLExecutionProvider available, enabling acceleration
2024-07-20 10:28:31,949 - INFO - separator - Loading model 2_HP-UVR.pth...
2024-07-20 10:28:31,949 - DEBUG - separator - Model path set to /assets/separation/2_HP-UVR.pth
2024-07-20 10:28:31,950 - DEBUG - separator - Calculating MD5 hash for model file to identify model parameters from UVR data...
2024-07-20 10:28:31,950 - DEBUG - separator - Calculating hash of model file /assets/separation/2_HP-UVR.pth
2024-07-20 10:28:31,971 - DEBUG - separator - Model /assets/separation/2_HP-UVR.pth has hash 941f3f7f0b0341f12087aacdfef644b1
2024-07-20 10:28:31,971 - DEBUG - separator - VR model data path set to /assets/separation/vr_model_data.json
2024-07-20 10:28:31,971 - DEBUG - separator - MDX model data path set to /assets/separation/mdx_model_data.json
2024-07-20 10:28:31,971 - DEBUG - separator - Loading MDX and VR model parameters from UVR model data files...
2024-07-20 10:28:31,972 - DEBUG - separator - Model data loaded: {'vr_model_param': '4band_v2', 'primary_stem': 'Instrumental'}
2024-07-20 10:28:31,972 - DEBUG - common_separator - Common params: model_name=2_HP-UVR, model_path=/assets/separation/2_HP-UVR.pth
2024-07-20 10:28:31,973 - DEBUG - common_separator - Common params: primary_stem_output_path=None, secondary_stem_output_path=None
2024-07-20 10:28:31,973 - DEBUG - common_separator - Common params: output_dir=None, output_format=WAV
2024-07-20 10:28:31,973 - DEBUG - common_separator - Common params: normalization_threshold=0.9
2024-07-20 10:28:31,973 - DEBUG - common_separator - Common params: enable_denoise=False, output_single_stem=None
2024-07-20 10:28:31,973 - DEBUG - common_separator - Common params: invert_using_spec=False, sample_rate=44100
2024-07-20 10:28:31,973 - DEBUG - common_separator - Common params: primary_stem_name=Instrumental, secondary_stem_name=Vocals
2024-07-20 10:28:31,973 - DEBUG - common_separator - Common params: is_karaoke=False, is_bv_model=False, bv_model_rebalance=0
2024-07-20 10:28:31,973 - DEBUG - vr_separator - Model data: {'vr_model_param': '4band_v2', 'primary_stem': 'Instrumental'}
2024-07-20 10:28:31,974 - DEBUG - vr_separator - Model params: {'bins': 672, 'unstable_bins': 8, 'reduction_bins': 637, 'band': {1: {'sr': 7350, 'hl': 80, 'n_fft': 640, 'crop_start': 0, 'crop_stop': 85, 'lpf_start': 25, 'lpf_stop': 53, 'res_type': 'polyphase'}, 2: {'sr': 7350, 'hl': 80, 'n_fft': 320, 'crop_start': 4, 'crop_stop': 87, 'hpf_start': 25, 'hpf_stop': 12, 'lpf_start': 31, 'lpf_stop': 62, 'res_type': 'polyphase'}, 3: {'sr': 14700, 'hl': 160, 'n_fft': 512, 'crop_start': 17, 'crop_stop': 216, 'hpf_start': 48, 'hpf_stop': 24, 'lpf_start': 139, 'lpf_stop': 210, 'res_type': 'polyphase'}, 4: {'sr': 44100, 'hl': 480, 'n_fft': 960, 'crop_start': 78, 'crop_stop': 383, 'hpf_start': 130, 'hpf_stop': 86, 'res_type': 'kaiser_fast'}}, 'sr': 44100, 'pre_filter_start': 668, 'pre_filter_stop': 672, 'mid_side': False, 'mid_side_b': False, 'mid_side_b2': False, 'stereo_w': False, 'stereo_n': False, 'reverse': False}
2024-07-20 10:28:31,974 - DEBUG - vr_separator - VR arch params: enable_tta=False, enable_post_process=False, post_process_threshold=0.2
2024-07-20 10:28:31,974 - DEBUG - vr_separator - VR arch params: batch_size=16, window_size=512
2024-07-20 10:28:31,974 - DEBUG - vr_separator - VR arch params: high_end_process=False, aggression=5
2024-07-20 10:28:31,974 - DEBUG - vr_separator - VR arch params: is_vr_51_model=False, model_samplerate=44100, model_capacity=(32, 128)
2024-07-20 10:28:31,974 - INFO - vr_separator - VR Separator initialisation complete
2024-07-20 10:28:31,974 - DEBUG - separator - Loading model completed.
2024-07-20 10:28:31,974 - INFO - separator - Load model duration: 00:00:00
2024-07-20 10:28:31,974 - INFO - separator - Starting separation process for audio_file_path: /sandbox/downloads/dl.mp3
2024-07-20 10:28:31,974 - DEBUG - separator - Normalization threshold set to 0.9, waveform will lowered to this max amplitude to avoid clipping.
2024-07-20 10:28:31,974 - DEBUG - separator - Denoising disabled, model will only be run once. This is twice as fast, but may result in noisier output audio.
2024-07-20 10:28:31,974 - DEBUG - vr_separator - Starting inference...
2024-07-20 10:28:31,974 - DEBUG - vr_separator - Model size determined: 123812, NN architecture size: 123812
2024-07-20 10:28:31,974 - DEBUG - vr_separator - Determining model capacity...
2024-07-20 10:28:32,532 - DEBUG - vr_separator - Model loaded and moved to device.
2024-07-20 10:28:32,532 - DEBUG - vr_separator - loading_mix iteraring through 4 bands
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.49it/s]
2024-07-20 10:28:36,673 - DEBUG - vr_separator - inference_vr appending to X_dataset for each of 90 patches
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 90/90 [00:00<00:00, 494740.97it/s]
2024-07-20 10:28:36,673 - DEBUG - vr_separator - inference_vr iterating through 5 batches, batch_size = 16
  0%|                                                                                                                           | 0/6 [00:24<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sandbox/separator.py", line 87, in <module>
    output_files = separator.separate(input_path)
  File "lib/python3.10/site-packages/audio_separator/separator/separator.py", line 559, in separate
    output_files = self.model_instance.separate(audio_file_path)
  File "lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 150, in separate
    y_spec, v_spec = self.inference_vr(self.loading_mix(), self.torch_device, self.aggressiveness)
  File "lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 305, in inference_vr
    mask = _execute(X_mag_pad, roi_size)
  File "lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 272, in _execute
    pred = self.model_run.predict_mask(X_batch)
  File "lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets.py", line 169, in predict_mask
    mask = self.forward(input_tensor)
  File "lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets.py", line 148, in forward
    hidden_state = self.stg3_full_band_net(self.stg3_bridge(hidden_state))
  File "lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets.py", line 62, in __call__
    hidden_state = self.dec1(hidden_state, encoder_output1)
  File "lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/layers.py", line 183, in __call__
    input_tensor = torch.cat([input_tensor, skip], dim=1)  # Concatenate input_tensor and skip_connection along the channel dimension.
RuntimeError: Invalid buffer size: 3.94 GB

beveradb commented 3 months ago

How did you install it? You're using a version from a year ago with more bugs than the latest version 😅

Separator version 0.14.3 instantiating

ybwai commented 3 months ago

Oh bugger, I added this to my requirements.txt:

audio-separator[cpu]; sys_platform == 'darwin'
audio-separator[gpu]; sys_platform != 'darwin'

and did a pip install. Let me figure out why it's getting an old version. Sorry :)

beveradb commented 3 months ago

On a mac, pip install "audio-separator[cpu]" should be all you need since you don't need to worry about CUDA!

See https://github.com/nomadkaraoke/python-audio-separator#-apple-silicon-macos-sonoma-with-m1-or-newer-cpu-coreml-acceleration :)

ybwai commented 3 months ago

audio-separator[cpu]>=0.17.5; sys_platform == 'darwin'
audio-separator[gpu]>=0.17.5; sys_platform != 'darwin'

Seems to have fixed it. This allows me to use the Mac locally and CUDA on Modal.
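
One way to confirm which version pip actually resolved is to read the installed package metadata. This is a minimal sketch using only the standard library; the `MIN_VERSION` constant and both helper names are illustrative assumptions, with 0.17.5 taken from the pin above.

```python
from importlib import metadata

# Assumption: 0.17.5 is the minimum wanted, as pinned in the requirements above.
MIN_VERSION = (0, 17, 5)


def parse_version(text):
    """Turn a dotted version such as '0.17.5' into a tuple of ints.

    Parsing stops at the first component that does not start with a digit,
    so this is only suitable for simple release numbers.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package="audio-separator"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("audio-separator is not installed in this environment")
    elif parse_version(found) < MIN_VERSION:
        print(f"audio-separator {found} is older than the pinned minimum")
    else:
        print(f"audio-separator {found} satisfies the pin")
```

Run it with the same interpreter as the application (here `./.venv/bin/python`), since a different environment may hold a different version.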

ybwai commented 3 months ago

0.17.5 is definitely working a lot better. MDX23C now works and the results are good, thanks!

Sadly still having issues with the VR Architecture models:

PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./.venv/bin/python -m sandbox.separator
2024-07-20 18:23:13,812 - INFO - separator - Separator version 0.17.5 instantiating with output_dir: None, output_format: WAV
2024-07-20 18:23:13,812 - INFO - separator - Output directory not specified. Using current working directory.
2024-07-20 18:23:13,812 - INFO - separator - Operating System: Darwin Darwin Kernel Version 23.5.0: Wed May  1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64
2024-07-20 18:23:13,822 - INFO - separator - System: Darwin Node: xxxs-MacBook-Pro-2.local Release: 23.5.0 Machine: x86_64 Proc: i386
2024-07-20 18:23:13,822 - INFO - separator - Python Version: 3.10.14
2024-07-20 18:23:13,822 - INFO - separator - PyTorch Version: 2.2.2
2024-07-20 18:23:13,939 - INFO - separator - FFmpeg installed: ffmpeg version 7.0 Copyright (c) 2000-2024 the FFmpeg developers
2024-07-20 18:23:13,941 - INFO - separator - ONNX Runtime CPU package installed with version: 1.18.1
2024-07-20 18:23:13,958 - INFO - separator - Apple Silicon MPS/CoreML is available in Torch, setting Torch device to MPS
2024-07-20 18:23:13,958 - INFO - separator - ONNXruntime has CoreMLExecutionProvider available, enabling acceleration
2024-07-20 18:23:13,958 - INFO - separator - Loading model 3_HP-Vocal-UVR.pth...
2024-07-20 18:23:14,824 - INFO - vr_separator - VR Separator initialisation complete
2024-07-20 18:23:14,825 - INFO - separator - Load model duration: 00:00:00
2024-07-20 18:23:14,825 - INFO - separator - Starting separation process for audio_file_path: /sandbox/downloads/dl.mp3
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:04<00:00,  1.07s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 84/84 [00:00<00:00, 635959.45it/s]
  0%|                                                                                                                           | 0/6 [00:22<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sandbox/separator.py", line 141, in <module>
    output_files = separator.separate(input_path)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/separator.py", line 704, in separate
    output_files = self.model_instance.separate(audio_file_path)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 150, in separate
    y_spec, v_spec = self.inference_vr(self.loading_mix(), self.torch_device, self.aggressiveness)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 324, in inference_vr
    mask = _execute(X_mag_pad, roi_size)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/architectures/vr_separator.py", line 291, in _execute
    pred = self.model_run.predict_mask(X_batch)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets.py", line 169, in predict_mask
    mask = self.forward(input_tensor)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets.py", line 148, in forward
    hidden_state = self.stg3_full_band_net(self.stg3_bridge(hidden_state))
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/nets.py", line 62, in __call__
    hidden_state = self.dec1(hidden_state, encoder_output1)
  File "/.venv/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/vr_network/layers.py", line 179, in __call__
    input_tensor = F.interpolate(input_tensor, scale_factor=2, mode="bilinear", align_corners=True)
  File "/.venv/lib/python3.10/site-packages/torch/nn/functional.py", line 4038, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, output_size, align_corners, scale_factors)
RuntimeError: Invalid buffer size: 3.00 GB

ybwai commented 3 months ago

Also with the roformer models I get this:

PYTORCH_ENABLE_MPS_FALLBACK=1 PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 ./.venv/bin/python -m sandbox.separator
2024-07-20 18:29:58,351 - INFO - separator - Separator version 0.17.5 instantiating with output_dir: None, output_format: WAV
2024-07-20 18:29:58,351 - INFO - separator - Output directory not specified. Using current working directory.
2024-07-20 18:29:58,351 - INFO - separator - Operating System: Darwin Darwin Kernel Version 23.5.0: Wed May  1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64
2024-07-20 18:29:58,362 - INFO - separator - System: Darwin Node: xxxs-MacBook-Pro-2.local Release: 23.5.0 Machine: x86_64 Proc: i386
2024-07-20 18:29:58,362 - INFO - separator - Python Version: 3.10.14
2024-07-20 18:29:58,362 - INFO - separator - PyTorch Version: 2.2.2
2024-07-20 18:29:58,715 - INFO - separator - FFmpeg installed: ffmpeg version 7.0 Copyright (c) 2000-2024 the FFmpeg developers
2024-07-20 18:29:58,717 - INFO - separator - ONNX Runtime CPU package installed with version: 1.18.1
2024-07-20 18:29:58,738 - INFO - separator - Apple Silicon MPS/CoreML is available in Torch, setting Torch device to MPS
2024-07-20 18:29:58,738 - INFO - separator - ONNXruntime has CoreMLExecutionProvider available, enabling acceleration
2024-07-20 18:29:58,738 - INFO - separator - Loading model model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt...
2024-07-20 18:30:04,131 - INFO - mdxc_separator - MDXC Separator initialisation complete
2024-07-20 18:30:04,133 - INFO - separator - Load model duration: 00:00:05
2024-07-20 18:30:04,134 - INFO - separator - Starting separation process for audio_file_path: sandbox/downloads/dl.mp3
  0%|                                                                                                                          | 0/32 [00:26<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "sandbox/separator.py", line 141, in <module>
    output_files = separator.separate(input_path)
  File ".venv/lib/python3.10/site-packages/audio_separator/separator/separator.py", line 704, in separate
    output_files = self.model_instance.separate(audio_file_path)
  File ".venv/lib/python3.10/site-packages/audio_separator/separator/architectures/mdxc_separator.py", line 134, in separate
    source = self.demix(mix=mix)
  File ".venv/lib/python3.10/site-packages/audio_separator/separator/architectures/mdxc_separator.py", line 248, in demix
    x = self.model_run(part.unsqueeze(0))[0]
  File ".venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File ".venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File ".venv/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/mel_band_roformer.py", line 462, in forward
    masks.cpu() if x_is_mps else masks).to(device)
RuntimeError: Unsupported type byte size: ComplexFloat

I'm guessing an Intel MacBook is no good for this?

beveradb commented 3 months ago

Hmm, I don't have an Intel Mac to test on, but this is a bit confusing to me 🤔

Apple Silicon MPS/CoreML is available in Torch, setting Torch device to MPS

That doesn't make sense to me, as I thought only Apple Silicon Macs supported the Apple MPS GPU acceleration: https://developer.apple.com/documentation/metalperformanceshaders

Here's the code where that is checked for in audio-separator: https://github.com/nomadkaraoke/python-audio-separator/blob/main/audio_separator/separator/separator.py#L225

It then calls this function, which is where the torch device is set to use MPS: https://github.com/nomadkaraoke/python-audio-separator/blob/main/audio_separator/separator/separator.py#L251

I'd appreciate it if you could test changing that to stop it from configuring MPS (forcing it to use only your CPU for inference), e.g. with this change: https://github.com/nomadkaraoke/python-audio-separator/pull/91/files

If that works and all models run inference without error for you, then we need to improve how audio-separator detects Apple MPS support so that it doesn't try to use it on Intel Macs!

ybwai commented 3 months ago

That seems to have done it yes.

Maybe the issue is PyTorch not classifying my MacBook properly as having a low power GPU - it has a Radeon PRO 5300M (4GB) which would explain the buffer size issue.

https://github.com/pytorch/pytorch/blob/8571007017b61d793c406142bad6baeda331d00d/aten/src/ATen/mps/MPSDevice.mm#L31
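
As rough arithmetic behind the buffer-size guess, the memory for a dense tensor is its element count times the bytes per element. The shapes below are hypothetical round numbers chosen to make the sums easy; they are not the real activation shapes of the VR network. The helper name `tensor_gib` is likewise an assumption.

```python
from math import prod


def tensor_gib(shape, bytes_per_element=4):
    """Size in GiB of a dense tensor; 4 bytes per element matches float32."""
    return prod(shape) * bytes_per_element / 2**30


# Hypothetical (batch, channels, height, width) shapes, for illustration only.
small = tensor_gib((16, 32, 1024, 512))
large = tensor_gib((64, 32, 1024, 512))

if __name__ == "__main__":
    print(f"batch of 16: {small} GiB")
    print(f"batch of 64: {large} GiB")
```

The size grows linearly with the batch dimension, so a single intermediate tensor can approach the capacity of a 4 GB GPU even when the model weights themselves are small.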

beveradb commented 3 months ago

Gotcha, thank you for confirming!

I've just released audio-separator version 0.17.6 with a fix for this - basically I'm just detecting the processor type and only enabling MPS if it's ARM.

I think PyTorch doesn't really support MPS properly on Intel Mac GPUs, so this is probably the best option for now: things at least work out of the box for folks like you, even if that unfortunately means ignoring your GPU.
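
The idea of enabling MPS only on ARM processors can be sketched roughly as follows. This is not the code shipped in audio-separator 0.17.6: the function names `should_use_mps` and `pick_torch_device` are assumptions made for illustration. Only standard `platform` and `torch` calls are used.

```python
import platform


def should_use_mps(system, machine, mps_available):
    """Allow MPS only on macOS running on an ARM (Apple Silicon) processor."""
    is_arm = machine.lower() in ("arm64", "aarch64")
    return bool(system == "Darwin" and is_arm and mps_available)


def pick_torch_device():
    """Return 'mps', 'cuda' or 'cpu' for the current machine."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate.
        return "cpu"

    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = bool(mps_backend is not None and mps_backend.is_available())

    if should_use_mps(platform.system(), platform.machine(), mps_available):
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(platform.system(), platform.machine(), "->", pick_torch_device())
```

On the Intel MacBook from the logs above (`Machine: x86_64`), `should_use_mps` returns `False` even though PyTorch reports MPS as available, so inference falls back to the CPU.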

ybwai commented 3 months ago

Lovely, will upgrade to it.

Also, some UVR models (5_HP, 6_HP & UVR-BVE) are just giving me empty sound files and Demucs models complain about a missing _tkinter module. Tried a brew install python-tk but no luck.
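
To check whether a given interpreter was built with Tk support, you can look for the `_tkinter` extension module without importing it. This is a minimal sketch using only the standard library; the helper name `has_module` is an assumption.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The pure-Python 'tkinter' package can be present while the compiled
    # '_tkinter' extension is missing, so both are reported.
    for name in ("tkinter", "_tkinter"):
        print(name, "available" if has_module(name) else "missing")
```

Run it with the virtualenv's own interpreter. A Homebrew package installed for a different Python build would not make `_tkinter` appear for this one.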

beveradb commented 3 months ago

I think those VR models are a known issue I'm afraid, I'd love for the BVE one in particular to work in audio-separator but it doesn't and I haven't prioritized trying to figure out why yet: https://github.com/nomadkaraoke/python-audio-separator/issues/45

Contributions very much welcome 🙏

Not sure about the demucs issue, does it actually fail? If so, please share debug logs as I've used the htdemucs_6s.yaml model a bunch without issues!

ybwai commented 3 months ago

Demucs doesn't seem to start:

2024-07-20 23:42:45,855 - INFO - separator - Separator version 0.17.5 instantiating with output_dir: None, output_format: WAV
2024-07-20 23:42:45,855 - INFO - separator - Output directory not specified. Using current working directory.
2024-07-20 23:42:45,855 - INFO - separator - Operating System: Darwin Darwin Kernel Version 23.5.0: Wed May  1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64
2024-07-20 23:42:45,864 - INFO - separator - System: Darwin Node: xxx-MacBook-Pro-2.local Release: 23.5.0 Machine: x86_64 Proc: i386
2024-07-20 23:42:45,865 - INFO - separator - Python Version: 3.10.14
2024-07-20 23:42:45,865 - INFO - separator - PyTorch Version: 2.2.2
2024-07-20 23:42:51,950 - INFO - separator - FFmpeg installed: ffmpeg version 7.0.1 Copyright (c) 2000-2024 the FFmpeg developers
2024-07-20 23:42:51,951 - INFO - separator - ONNX Runtime CPU package installed with version: 1.18.1
2024-07-20 23:42:51,951 - INFO - separator - No hardware acceleration could be configured, running in CPU mode
2024-07-20 23:42:51,952 - INFO - separator - Loading model htdemucs_6s.yaml...
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "sandbox/separator.py", line 126, in <module>
    separator.load_model(
  File ".venv/lib/python3.10/site-packages/audio_separator/separator/separator.py", line 673, in load_model
    module = importlib.import_module(f"audio_separator.separator.architectures.{module_name}")
  File "/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File ".venv/lib/python3.10/site-packages/audio_separator/separator/architectures/demucs_separator.py", line 7, in <module>
    from audio_separator.separator.uvr_lib_v5.demucs.apply import apply_model, demucs_segments
  File ".venv/lib/python3.10/site-packages/audio_separator/separator/uvr_lib_v5/demucs/apply.py", line 19, in <module>
    import tkinter as tk
  File "/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/tkinter/__init__.py", line 37, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
ModuleNotFoundError: No module named '_tkinter'

beveradb commented 3 months ago

Huh, that's very strange, for two reasons:

So anyway, I released another new version, audio-separator 0.18.3, which removes those references and should fix Demucs for you 🙏

ybwai commented 3 months ago

Demucs working on new version 0.18.3!