nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License

Roformer Models Not Running on GPU. Other models fine. Roformer models run on GPU on UVR5. #73

Closed jet082 closed 4 months ago

jet082 commented 5 months ago

I commented elsewhere, but here is an issue since I believe this is a clear bug.

Running model_bs_roformer_ep_317_sdr_12.9755 in UVR5 on a roughly 60-minute file called 03.wav takes about 10 minutes, and Task Manager shows it clearly using my GPU.

With python-audio-separator, however, my GPU is not being used (again according to Task Manager) and the same file takes 3-4 hours to process. python-audio-separator does use my GPU, and is quite fast, with models like UVR-MDX-NET-Inst_HQ_4.onnx or MDX23C-8KFFT-InstVoc_HQ_2.ckpt. Only the roformer models ignore my GPU.

Here is the code below. Debug output from the command line version will be below that.

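from audio_separator.separator import Separator
# Default settings, no options passed
separator = Separator()
# Load the BS-Roformer checkpoint that shows the problem
separator.load_model(model_filename='model_bs_roformer_ep_317_sdr_12.9755.ckpt')
# Run the separation on the roughly 60-minute input file
output_files = separator.separate('03.wav')
print(output_files)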

The output is as follows:

2024-05-22 01:42:54,855 - INFO - separator - Separator version 0.17.1 instantiating with output_dir: None, output_format: WAV
2024-05-22 01:42:54,856 - INFO - separator - Operating System: Windows 10.0.22635
2024-05-22 01:42:54,880 - INFO - separator - System: Windows Node: Irrelevant Release: 11 Machine: AMD64 Proc: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
2024-05-22 01:42:54,880 - INFO - separator - Python Version: 3.12.3
2024-05-22 01:42:54,880 - INFO - separator - PyTorch Version: 2.3.0+cu121
2024-05-22 01:42:55,275 - INFO - separator - FFmpeg installed: ffmpeg version N-113449-g2887bd4a03-ge0da916b8f+4 Copyright (c) 2000-2024 the FFmpeg developers
2024-05-22 01:42:55,276 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.1
2024-05-22 01:42:55,306 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-05-22 01:42:55,306 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-05-22 01:42:55,306 - INFO - separator - Loading model model_bs_roformer_ep_317_sdr_12.9755.ckpt...
2024-05-22 01:43:04,818 - INFO - mdxc_separator - MDXC Separator initialisation complete
2024-05-22 01:43:04,819 - INFO - separator - Load model duration: 00:00:09
2024-05-22 01:43:04,819 - INFO - separator - Starting separation process for audio_file_path: 03.wav
  6%|█████                                                                          | 29/456 [13:31<3:17:42, 27.78s/it]

Command line output for audio-separator -m 'model_bs_roformer_ep_317_sdr_12.9755.ckpt' --log_level debug -d .\03.wav

C:\Users\jet082\AppData\Local\Programs\Python\Python312\Lib\site-packages\onnxruntime\capi\onnxruntime_validation.py:26: UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.
  warnings.warn(
2024-05-22 04:44:54.787 - INFO - cli - Separator version 0.17.1 beginning with input file: .\03.wav
2024-05-22 04:44:54.788 - INFO - separator - Separator version 0.17.1 instantiating with output_dir: None, output_format: FLAC
2024-05-22 04:44:54.788 - INFO - separator - Operating System: Windows 10.0.22635
2024-05-22 04:44:54.798 - INFO - separator - System: Windows Node: Irrelevant Release: 11 Machine: AMD64 Proc: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
2024-05-22 04:44:54.798 - INFO - separator - Python Version: 3.12.3
2024-05-22 04:44:54.798 - INFO - separator - PyTorch Version: 2.3.0+cu121
2024-05-22 04:44:54.897 - INFO - separator - FFmpeg installed: ffmpeg version N-113449-g2887bd4a03-ge0da916b8f+4 Copyright (c) 2000-2024 the FFmpeg developers
2024-05-22 04:44:54.898 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-05-22 04:44:54.899 - DEBUG - separator - Python package: onnxruntime not installed
2024-05-22 04:44:54.899 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.1
2024-05-22 04:44:54.911 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-05-22 04:44:54.911 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-05-22 04:44:54.911 - INFO - separator - Loading model model_bs_roformer_ep_317_sdr_12.9755.ckpt...
2024-05-22 04:44:54.911 - DEBUG - separator - File already exists at /tmp/audio-separator-models/download_checks.json, skipping download
2024-05-22 04:44:54.912 - DEBUG - separator - Model download list loaded
2024-05-22 04:44:54.912 - DEBUG - separator - Searching for model_filename model_bs_roformer_ep_317_sdr_12.9755.ckpt in supported_model_files_grouped
2024-05-22 04:44:54.912 - DEBUG - separator - Found input filename model_bs_roformer_ep_317_sdr_12.9755.ckpt in multi-file model: Roformer Model: BS-Roformer-Viperx-1297
2024-05-22 04:44:54.912 - DEBUG - separator - Multi-file model identified: Roformer Model: BS-Roformer-Viperx-1297, iterating through files to download
2024-05-22 04:44:54.912 - DEBUG - separator - Attempting to identify download URL for config pair: model_bs_roformer_ep_317_sdr_12.9755.ckpt -> model_bs_roformer_ep_317_sdr_12.9755.yaml
2024-05-22 04:44:54.912 - DEBUG - separator - File already exists at /tmp/audio-separator-models/model_bs_roformer_ep_317_sdr_12.9755.ckpt, skipping download
2024-05-22 04:44:54.912 - DEBUG - separator - File already exists at /tmp/audio-separator-models/model_bs_roformer_ep_317_sdr_12.9755.yaml, skipping download
2024-05-22 04:44:54.912 - DEBUG - separator - All files downloaded for model Roformer Model: BS-Roformer-Viperx-1297, returning initial path /tmp/audio-separator-models/model_bs_roformer_ep_317_sdr_12.9755.ckpt
2024-05-22 04:44:54.912 - DEBUG - separator - Model downloaded, friendly name: Roformer Model: BS-Roformer-Viperx-1297, model_path: /tmp/audio-separator-models/model_bs_roformer_ep_317_sdr_12.9755.ckpt
2024-05-22 04:44:54.912 - DEBUG - separator - Loading model data from YAML at path /tmp/audio-separator-models/model_bs_roformer_ep_317_sdr_12.9755.yaml
2024-05-22 04:44:54.916 - DEBUG - separator - Model data loaded from YAML file: {'audio': {'chunk_size': 352800, 'dim_f': 1024, 'dim_t': 801, 'hop_length': 441, 'n_fft': 2048, 'num_channels': 2, 'sample_rate': 44100, 'min_mean_abs': 0.001}, 'model': {'dim': 512, 'depth': 12, 'stereo': True, 'num_stems': 1, 'time_transformer_depth': 1, 'freq_transformer_depth': 1, 'freqs_per_bands': (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 12, 12, 12, 12, 12, 12, 12, 12, 24, 24, 24, 24, 24, 24, 24, 24, 48, 48, 48, 48, 48, 48, 48, 48, 128, 129), 'dim_head': 64, 'heads': 8, 'attn_dropout': 0.1, 'ff_dropout': 0.1, 'flash_attn': True, 'dim_freqs_in': 1025, 'stft_n_fft': 2048, 'stft_hop_length': 441, 'stft_win_length': 2048, 'stft_normalized': False, 'mask_estimator_depth': 2, 'multi_stft_resolution_loss_weight': 1.0, 'multi_stft_resolutions_window_sizes': (4096, 2048, 1024, 512, 256), 'multi_stft_hop_size': 147, 'multi_stft_normalized': False}, 'training': {'batch_size': 16, 'gradient_accumulation_steps': 1, 'grad_clip': 0, 'instruments': ['Vocals', 'Instrumental'], 'lr': 5e-05, 'patience': 2, 'reduce_factor': 0.95, 'target_instrument': 'Vocals', 'num_epochs': 1000, 'num_steps': 1000, 'augmentation': False, 'augmentation_type': 'simple1', 'use_mp3_compress': False, 'augmentation_mix': True, 'augmentation_loudness': True, 'augmentation_loudness_type': 1, 'augmentation_loudness_min': 0.5, 'augmentation_loudness_max': 1.5, 'q': 0.95, 'coarse_loss_clip': True, 'ema_momentum': 0.999, 'optimizer': 'adam', 'other_fix': False, 'use_amp': True}, 'inference': {'batch_size': 1, 'dim_t': 801, 'num_overlap': 4}}
2024-05-22 04:44:54.916 - DEBUG - separator - Importing module for model type MDXC: mdxc_separator.MDXCSeparator
2024-05-22 04:44:56.905 - DEBUG - separator - Instantiating separator class for model type MDXC: <class 'audio_separator.separator.architectures.mdxc_separator.MDXCSeparator'>
2024-05-22 04:44:56.905 - DEBUG - common_separator - Common params: model_name=model_bs_roformer_ep_317_sdr_12, model_path=/tmp/audio-separator-models/model_bs_roformer_ep_317_sdr_12.9755.ckpt
2024-05-22 04:44:56.905 - DEBUG - common_separator - Common params: output_dir=None, output_format=FLAC
2024-05-22 04:44:56.905 - DEBUG - common_separator - Common params: normalization_threshold=0.9
2024-05-22 04:44:56.905 - DEBUG - common_separator - Common params: enable_denoise=None, output_single_stem=None
2024-05-22 04:44:56.905 - DEBUG - common_separator - Common params: invert_using_spec=False, sample_rate=44100
2024-05-22 04:44:56.905 - DEBUG - common_separator - Common params: primary_stem_name=Vocals, secondary_stem_name=Instrumental
2024-05-22 04:44:56.905 - DEBUG - common_separator - Common params: is_karaoke=False, is_bv_model=False, bv_model_rebalance=0
2024-05-22 04:44:56.905 - DEBUG - mdxc_separator - Model data: {'audio': {'chunk_size': 352800, 'dim_f': 1024, 'dim_t': 801, 'hop_length': 441, 'n_fft': 2048, 'num_channels': 2, 'sample_rate': 44100, 'min_mean_abs': 0.001}, 'model': {'dim': 512, 'depth': 12, 'stereo': True, 'num_stems': 1, 'time_transformer_depth': 1, 'freq_transformer_depth': 1, 'freqs_per_bands': (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 12, 12, 12, 12, 12, 12, 12, 12, 24, 24, 24, 24, 24, 24, 24, 24, 48, 48, 48, 48, 48, 48, 48, 48, 128, 129), 'dim_head': 64, 'heads': 8, 'attn_dropout': 0.1, 'ff_dropout': 0.1, 'flash_attn': True, 'dim_freqs_in': 1025, 'stft_n_fft': 2048, 'stft_hop_length': 441, 'stft_win_length': 2048, 'stft_normalized': False, 'mask_estimator_depth': 2, 'multi_stft_resolution_loss_weight': 1.0, 'multi_stft_resolutions_window_sizes': (4096, 2048, 1024, 512, 256), 'multi_stft_hop_size': 147, 'multi_stft_normalized': False}, 'training': {'batch_size': 16, 'gradient_accumulation_steps': 1, 'grad_clip': 0, 'instruments': ['Vocals', 'Instrumental'], 'lr': 5e-05, 'patience': 2, 'reduce_factor': 0.95, 'target_instrument': 'Vocals', 'num_epochs': 1000, 'num_steps': 1000, 'augmentation': False, 'augmentation_type': 'simple1', 'use_mp3_compress': False, 'augmentation_mix': True, 'augmentation_loudness': True, 'augmentation_loudness_type': 1, 'augmentation_loudness_min': 0.5, 'augmentation_loudness_max': 1.5, 'q': 0.95, 'coarse_loss_clip': True, 'ema_momentum': 0.999, 'optimizer': 'adam', 'other_fix': False, 'use_amp': True}, 'inference': {'batch_size': 1, 'dim_t': 801, 'num_overlap': 4}, 'is_roformer': True}
2024-05-22 04:44:56.905 - DEBUG - mdxc_separator - MDXC arch params: batch_size=1, segment_size=256, overlap=8
2024-05-22 04:44:56.905 - DEBUG - mdxc_separator - MDXC arch params: override_model_segment_size=False, pitch_shift=0
2024-05-22 04:44:56.905 - DEBUG - mdxc_separator - Loading checkpoint model for inference...
2024-05-22 04:44:56.906 - DEBUG - mdxc_separator - Loading Roformer model...
2024-05-22 04:44:56.906 - DEBUG - mdxc_separator - Loading BSRoformer model...
C:\Users\jet082\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\functional.py:665: UserWarning: A window was not provided. A rectangular window will be applied,which is known to cause spectral leakage. Other windows such as torch.hann_window or torch.hamming_window can are recommended to reduce spectral leakage.To suppress this warning and use a rectangular window, explicitly set `window=torch.ones(n_fft, device=<device>)`. (Triggered internally at ..\aten\src\ATen\native\SpectralOps.cpp:842.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
2024-05-22 04:44:57.871 - DEBUG - mdxc_separator - is_vocal_main_target: True
2024-05-22 04:44:57.871 - INFO - mdxc_separator - MDXC Separator initialisation complete
2024-05-22 04:44:57.871 - DEBUG - separator - Loading model completed.
2024-05-22 04:44:57.871 - INFO - separator - Load model duration: 00:00:02
2024-05-22 04:44:57.872 - INFO - separator - Starting separation process for audio_file_path: .\03.wav
2024-05-22 04:44:57.872 - DEBUG - separator - Normalization threshold set to 0.9, waveform will lowered to this max amplitude to avoid clipping.
2024-05-22 04:44:57.872 - DEBUG - mdxc_separator - Preparing mix for input audio file .\03.wav...
2024-05-22 04:44:57.872 - DEBUG - common_separator - Loading audio from file: .\03.wav
2024-05-22 04:45:01.973 - DEBUG - common_separator - Audio loaded. Sample rate: 44100, Audio shape: (2, 160655361)
2024-05-22 04:45:02.141 - DEBUG - common_separator - Audio file is valid and contains data.
2024-05-22 04:45:02.141 - DEBUG - common_separator - Mix preparation completed.
2024-05-22 04:45:02.141 - DEBUG - mdxc_separator - Normalizing mix before demixing...
2024-05-22 04:45:02.597 - DEBUG - mdxc_separator - Using model default segment size: 801
2024-05-22 04:45:02.597 - DEBUG - mdxc_separator - Number of stems: 1
2024-05-22 04:45:02.597 - DEBUG - mdxc_separator - Chunk size: 352800
2024-05-22 04:45:02.597 - DEBUG - mdxc_separator - Step: 352800
  0%|                                                                                          | 0/456 [00:00<?, ?it/s]C:\Users\jet082\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\backends\cuda\__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  warnings.warn(
  0%|▏                                                                               | 1/456 [00:27<3:31:19, 27.87s/it]
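
A quick sanity check on the numbers in this log, as a sketch. The constants are copied from the debug lines above; the assumption that the chunk count is the sample count divided by the step, rounded up, is mine (it does reproduce the 456 shown by the progress bar).

```python
import math

SAMPLE_RATE = 44_100          # "Sample rate: 44100"
TOTAL_SAMPLES = 160_655_361   # "Audio shape: (2, 160655361)"
CHUNK_SIZE = 352_800          # "Chunk size: 352800"
STEP = 352_800                # "Step: 352800"
SECONDS_PER_CHUNK = 27.78     # "27.78s/it" from the progress bar

# Assumption: one iteration per step-sized chunk, last partial chunk included.
num_chunks = math.ceil(TOTAL_SAMPLES / STEP)

# Length of the input file and of a single chunk, in seconds of audio.
audio_minutes = TOTAL_SAMPLES / SAMPLE_RATE / 60
chunk_audio_seconds = CHUNK_SIZE / SAMPLE_RATE

# How much slower than real time the separation is running.
slowdown = SECONDS_PER_CHUNK / chunk_audio_seconds

# Projected wall-clock time for the whole file at this rate.
estimated_total_hours = num_chunks * SECONDS_PER_CHUNK / 3600

print(f"chunks: {num_chunks}")
print(f"audio length: {audio_minutes:.1f} min")
print(f"audio per chunk: {chunk_audio_seconds:.1f} s")
print(f"slowdown vs real time: {slowdown:.2f}x")
print(f"estimated total: {estimated_total_hours:.2f} h")
```

So each chunk covers 8 seconds of audio but takes about 27.8 seconds to process, roughly 3.5 times slower than real time, which is consistent with the 3-4 hours mentioned above.
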

jet082 commented 5 months ago

ChatGPT 4o is truly remarkable. On a lark, I pointed it at this repo, explained the problem, and asked it to help me out and it provided patches that fixed the problem. Now it's using the GPU and going quite quickly, even for (bs) roformers. I have no idea how much you can use from these files, but they do fix the issue for me. Perhaps they can help point you in the right direction.

https://github.com/jet082/python-audio-separator/blob/main/audio_separator/separator/architectures/mdxc_separator.py

https://github.com/jet082/python-audio-separator/blob/main/audio_separator/separator/uvr_lib_v5/bs_roformer.py

beveradb commented 5 months ago

Nice one! Could you raise a PR with the fix please?

jet082 commented 5 months ago

Thing is, these are automated patches from chatgpt... They seem minor, but I don't know if they'll break your existing code? I can make a PR of course, but it might be better to simply diff the files and patch in strictly what is necessary.

beveradb commented 5 months ago

For sure, that's why I want to see a PR 😄 (easiest way for me / others to see and review the diffs)

FWIW most of the code I've added to this codebase was heavily AI-assisted, initially using GitHub Copilot, then over the last 6 months or so using Cursor (https://cursor.sh) as my IDE instead of VS Code, which now uses gpt-4o for each suggestion / request!

jet082 commented 5 months ago

Okay I made a PR, but please do not merge it blindly as I do not trust it to follow your coding conventions or anything.

Also, the other roformer Python code is unchanged in that PR, so you will need to apply the same changes there too.

https://github.com/karaokenerds/python-audio-separator/pull/74

beveradb commented 5 months ago

Hey @jet082 please could you try the latest release, version 0.17.2?

I believe I've fixed it for all roformer models with this commit 😄 https://github.com/karaokenerds/python-audio-separator/commit/a581da750a61e5ab25f70cc026bd296df61944c8

jet082 commented 5 months ago

I can confirm that this fixes both the bs and mel roformer models!

jet082 commented 4 months ago

This needs to be re-opened. I tried running the latest version on a rather large file. My GPU usage went up to 100%, but the progress estimate was over 2 days to complete.

I then reverted to my code in https://github.com/karaokenerds/python-audio-separator/pull/74. As with current master, my GPU usage went up to 100%, but this time the estimate was 12 minutes instead of 2 days.

My guess is that the current code does an unnecessary transfer to the CPU and then back to the GPU or something like that. It might be best to simply adopt my PR or figure out how it works.
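
One way to start testing that guess, sketched under the assumption that the loaded model is an ordinary `torch.nn.Module`: list which device types hold its parameters and buffers. `devices_in_use` and `is_split_across_devices` are hypothetical helpers, not part of audio-separator; they rely only on `parameters()`, `buffers()` and `tensor.device.type`, which are standard PyTorch.

```python
from itertools import chain


def devices_in_use(model):
    """Return the set of device types (e.g. {'cuda'}) holding the model's
    parameters and buffers. `model` is expected to behave like a
    torch.nn.Module, i.e. expose parameters() and buffers()."""
    tensors = chain(model.parameters(), model.buffers())
    return {t.device.type for t in tensors}


def is_split_across_devices(model):
    """True if some weights or buffers live on a different device type than
    others, which forces copies between devices during inference."""
    return len(devices_in_use(model)) > 1
```

This only catches weights or buffers left on the wrong device. It will not catch tensors created on the CPU inside the forward pass, or results copied back to the CPU between chunks, so a clean result here does not rule the guess out.
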

beveradb commented 4 months ago

Sorry to hear that :/ I'll try and look into it a bit more when I have some free time; you're probably right re. the cause but I'm not sure at the moment.

Just as a heads up though, the reason I didn't want to merge PR #74 as it was is that while it may support CPU and CUDA, it doesn't support MPS (which is my daily driver, as my laptop is a MacBook) - specifically this line:

device = 'cuda' if torch.cuda.is_available() else 'cpu'

It also disregards the previous (and battle-tested) logic in the main separator controller, which detects the available inference devices and configures them accordingly: https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L198

So if you decide you want to have another stab at making it work for you and also others more generally, ideally what we ought to do is pass through the configured device(s) to the BSRoformer class when it's instantiated in the load_model method in the MDXC class.
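
A minimal sketch of that idea, not the actual fix: choose the device once (CUDA, then MPS, then CPU) and hand it to the model class rather than re-detecting it inside the model. `pick_device`, `detect_device`, `ModelWrapper` and this `load_model` are hypothetical names for illustration; only `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls. The import guard is there only so the sketch also runs where PyTorch isn't installed.

```python
import importlib.util


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch what is available; report 'cpu' if PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.backends.mps only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


class ModelWrapper:
    """Hypothetical stand-in for a model class such as BSRoformer."""

    def __init__(self, device: str):
        # Real code would move the weights with .to(device) and create any
        # new tensors with device=self.device, instead of hard-coding 'cuda'.
        self.device = device


def load_model(device: str) -> ModelWrapper:
    """Hypothetical loader: the caller decides the device, the model obeys."""
    return ModelWrapper(device=device)


model = load_model(detect_device())
print(model.device)
```

The point of the design is that device detection lives in exactly one place, so a model that works on CUDA cannot silently skip MPS.
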

young01ai commented 4 months ago

FYI, this seems to work well for me: https://github.com/young01ai/python-audio-separator/tree/fix-cuda

beveradb commented 4 months ago

Nice one @young01ai 😄 open a PR for that and I'll test and merge!

young01ai commented 4 months ago

> Nice one @young01ai 😄 open a PR for that and I'll test and merge!

PR: #84, hope it's helpful. @beveradb

beveradb commented 4 months ago

> > Nice one @young01ai 😄 open a PR for that and I'll test and merge!
>
> PR: #84, hope it's helpful. @beveradb

Thank you! I left a comment here, it's not working on my machine 🤔

beveradb commented 4 months ago

This should now be fixed in audio-separator version 0.17.5 onwards - huge thanks to @young01ai for PR #84 🙇

jet082 commented 3 months ago

This is still not fixed.

jet082 commented 3 months ago

Try this to see if it works on Mac - https://github.com/karaokenerds/python-audio-separator/pull/74

empz commented 2 months ago

> This is still not fixed.

It's working for me with the latest Roformer models using a modified version of the Dockerfile from this repo.