nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License

How to increase speed? #32

Closed: yassinebelatar closed this issue 7 months ago

yassinebelatar commented 8 months ago

I have a very large GPU (80GB) and I want to increase speed; increasing the batch size doesn't help at all. Thanks

beveradb commented 8 months ago

What have you tried so far? Could you show me your logs from audio-separator running against one of your test files so I can see some details about your system, e.g. whether it's using CUDA already or not?

yassinebelatar commented 8 months ago

2024-01-12 17:29:06.296 - INFO - cli - Separator version 0.13.0 beginning with input file: audio.wav
2024-01-12 17:29:08.530 - INFO - separator - Separator version 0.13.0 instantiating with output_dir: processed/hdemucs_mmi/audio, output_format: wav
2024-01-12 17:29:08.530 - DEBUG - separator - Normalization threshold set to 0.9, waveform will lowered to this max amplitude to avoid clipping.
2024-01-12 17:29:08.530 - DEBUG - separator - Denoising enabled, model will be run twice to reduce noise in output audio.
2024-01-12 17:29:08.530 - DEBUG - separator - Separation settings set: sample_rate=44100, hop_length=1024, segment_size=256, overlap=0.25, batch_size=200
2024-01-12 17:29:08.530 - INFO - separator - Checking hardware specifics to configure acceleration
2024-01-12 17:29:08.531 - INFO - separator - Operating System: Linux #1 SMP Wed Sep 6 21:10:58 UTC 2023
2024-01-12 17:29:08.531 - INFO - separator - System: Linux Node: runc Release: 5.4.254-170.358.amzn2.x86_64 Machine: x86_64 Proc: x86_64
2024-01-12 17:29:08.531 - INFO - separator - Python Version: 3.10.12
2024-01-12 17:29:08.531 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.0
2024-01-12 17:29:08.650 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-01-12 17:29:08.770 - DEBUG - separator - Python package: onnxruntime not installed
2024-01-12 17:29:08.770 - INFO - separator - Torch package installed with version: 2.1.2
2024-01-12 17:29:08.770 - INFO - separator - Torchvision package installed with version: 0.16.2
2024-01-12 17:29:08.888 - DEBUG - separator - Python package: torchaudio not installed
2024-01-12 17:29:08.919 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-01-12 17:29:08.920 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-01-12 17:29:08.920 - DEBUG - separator - Apple Silicon MPS/CoreML not available in Torch installation. If you expect this to work, please see README
2024-01-12 17:29:08.920 - INFO - separator - Loading model UVR-MDX-NET-Voc_FT...
2024-01-12 17:29:08.920 - DEBUG - separator - Model path set to ./mdx/UVR-MDX-NET-Voc_FT.onnx
2024-01-12 17:29:08.922 - DEBUG - separator - Reading model settings...
2024-01-12 17:29:09.037 - DEBUG - separator - Model ./mdx/UVR-MDX-NET-Voc_FT.onnx has hash 77d07b2667ddf05b9e3175941b4454a0
2024-01-12 17:29:09.037 - DEBUG - separator - Model data path set to ./mdx/model_data.json
2024-01-12 17:29:09.039 - DEBUG - separator - Loading model data...
2024-01-12 17:29:09.042 - DEBUG - separator - Model data loaded: {'compensate': 1.021, 'mdx_dim_f_set': 3072, 'mdx_dim_t_set': 8, 'mdx_n_fft_scale_set': 7680, 'primary_stem': 'Vocals'}
2024-01-12 17:29:09.042 - DEBUG - separator - Model params: primary_stem=Vocals, secondary_stem=Instrumental
2024-01-12 17:29:09.042 - DEBUG - separator - Model params: batch_size=200, compensate=1.021, segment_size=256, dim_f=3072, dim_t=256
2024-01-12 17:29:09.043 - DEBUG - separator - Model params: n_fft=7680, hop=1024
2024-01-12 17:29:09.043 - DEBUG - separator - Loading ONNX model for inference...
2024-01-12 17:29:09.415 - DEBUG - separator - Model loaded successfully using ONNXruntime inferencing session.
2024-01-12 17:29:09.415 - DEBUG - separator - Loading model completed.
2024-01-12 17:29:09.415 - INFO - separator - Load model duration: 00:00:00
2024-01-12 17:29:09.415 - INFO - separator - Starting separation process for audio_file_path: audio.wav
2024-01-12 17:29:09.415 - DEBUG - separator - Preparing mix...
2024-01-12 17:29:09.415 - DEBUG - separator - Loading audio from file: audio.wav
2024-01-12 17:29:15.120 - DEBUG - separator - Audio loaded. Sample rate: 44100, Audio shape: (2, 21051392)
2024-01-12 17:29:15.133 - DEBUG - separator - Audio file is valid and contains data.
2024-01-12 17:29:15.133 - DEBUG - separator - Mix preparation completed.
2024-01-12 17:29:15.133 - DEBUG - separator - Normalizing mix before demixing...
2024-01-12 17:29:15.162 - DEBUG - spec_utils - Maximum peak amplitude above clipping threshold, normalizing from 1.0 to max peak 0.9.
2024-01-12 17:29:15.172 - DEBUG - separator - Starting demixing process with is_match_mix: False...
2024-01-12 17:29:15.172 - DEBUG - separator - Initializing model settings...
2024-01-12 17:29:15.181 - DEBUG - separator - Model input params: n_fft=7680 hop_length=1024 dim_f=3072
2024-01-12 17:29:15.181 - DEBUG - separator - Model settings: n_bins=3841, trim=3840, chunk_size=261120, gen_size=253440
2024-01-12 17:29:15.181 - DEBUG - separator - Original mix stored. Shape: (2, 21051392)
2024-01-12 17:29:15.181 - DEBUG - separator - Standard chunk size: 261120, Overlap: 0.25
2024-01-12 17:29:15.181 - DEBUG - separator - Generated size calculated: 253440
2024-01-12 17:29:15.219 - DEBUG - separator - Mixture prepared with padding. Mixture shape: (2, 21296640)
2024-01-12 17:29:15.219 - DEBUG - separator - Step size for processing chunks: 195840 as overlap is set to 0.25.
2024-01-12 17:29:15.219 - DEBUG - separator - Total chunks to process: 109
2024-01-12 17:29:15.219 - DEBUG - separator - Processing chunk 1/109: Start 0, End 261120
2024-01-12 17:29:15.223 - DEBUG - separator - Window applied to the chunk.
/usr/local/lib/python3.10/dist-packages/audio_separator/separator/separator.py:630: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
  mix_part = torch.tensor([mixpart], dtype=torch.float32).to(self.device)
2024-01-12 17:29:15.303 - DEBUG - separator - Mix part split into batches. Number of batches: 1
2024-01-12 17:29:15.303 - DEBUG - separator - Processing mix_wave batch 1/1
/usr/local/lib/python3.10/dist-packages/torch/functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:863.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]

yassinebelatar commented 8 months ago

Also, when increasing segment_size I get: WARNING - separator - Model converted from onnx to pytorch due to segment size not matching dim_t, processing may be slower.

beveradb commented 8 months ago

Gotcha; that all looks right to me - you're using ONNX Runtime 1.17.0 so I assume you compiled it from source to get CUDA 12 support?

How long is it actually taking? In my experience, on a machine with a CUDA GPU, a 4-minute track takes about 20-30 seconds to process.

If you check the full logs without debug loglevel, there should be a couple of messages, e.g. Load model duration and Separation duration.

yassinebelatar commented 8 months ago

It takes 60.22055721282959s for an 8-minute track. I have quite a large GPU but the model is only using 3-4GB of it; I want to increase speed and I have no issue with high GPU usage.

beveradb commented 8 months ago

Also, when increasing segment_size I get: WARNING - separator - Model converted from onnx to pytorch due to segment size not matching dim_t, processing may be slower.

Yeah that's expected if you change segment size, you can see where that comes from here:

https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L317

Why are you changing the segment size? That will certainly make things slower in my experience.

Out of curiosity, have you benchmarked / compared against running the same separation using UVR GUI? Most of my separation code is either identical to that project or very closely aligned, there should be minimal differences.

beveradb commented 8 months ago

It takes 60.22055721282959s for an 8-minute track. I have quite a large GPU but the model is only using 3-4GB of it; I want to increase speed and I have no issue with high GPU usage.

Gotcha, that time is pretty normal.

If you want to make it faster, you'll need to dig into the code and work out some way to optimize it!

The work which @nnyj did in this fork may help! https://github.com/nnyj/python-audio-separator-live#benchmark-results

See https://github.com/karaokenerds/python-audio-separator/issues/3

PRs very welcome if you're able to improve performance :)

yassinebelatar commented 8 months ago

I was thinking about using ffmpeg to segment the file into 1-minute segments (variable length), then processing each segment and combining them in a final stage. What do you think of this approach?

beveradb commented 8 months ago

It's certainly worth trying! Is the theory that you could process those segments in parallel?

The codebase already has PyDub as a dependency: https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L16

Which is a wrapper around ffmpeg and has an easy API for slicing audio into chunks: https://github.com/jiaaro/pydub#quickstart
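
For illustration only (this isn't code from the repo; the file name and chunk length are just examples), slicing an input file into 1-minute chunks with PyDub might look roughly like this:

    # Sketch: split audio.wav into 1-minute chunks with PyDub (requires ffmpeg).
    from pydub import AudioSegment

    CHUNK_MS = 60_000  # 1-minute segments, per the idea above

    audio = AudioSegment.from_file("audio.wav")
    chunk_paths = []
    for i, start in enumerate(range(0, len(audio), CHUNK_MS)):
        chunk = audio[start:start + CHUNK_MS]  # PyDub slices by milliseconds
        path = f"chunk_{i:03d}.wav"
        chunk.export(path, format="wav")       # write each chunk to disk
        chunk_paths.append(path)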

So you've got a bit of a headstart; though I'm not sure off the top of my head what the right approach to parallelizing would be.

Good luck, and feel free to email me if you want to schedule a pair programming call or any other knowledge transfer :)

yassinebelatar commented 8 months ago

Hi beveradb,

I tried to do it in parallel! I segmented the file into 5 sections, but your app doesn't seem to handle multiprocessing, especially with one loaded model! It works with batching but not in parallel.

beveradb commented 8 months ago

You'd need to modify the code... that's why this is open source: you can just fork it and work on it.

yassinebelatar commented 8 months ago

To be honest, I'm a newbie. If you could just let me know which file to dig into to fix this, I'll give it a try! Thanks for your responsiveness, really appreciate it!

beveradb commented 8 months ago

All of the separation logic is in the main Separator class:

https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py

Good luck! If you haven't written much Python code before you'll probably need to do quite a bit of learning in order to get to the point where you can contribute, but there's a lot of tutorials online :)

If you want to organize a pair programming call at some point, feel free to email me with a suitable date/time and I'm happy to try and help!

yassinebelatar commented 8 months ago

I was able to get it to work using threading.local, by updating def separate(self, audio_file_path). This way variables are stored locally for each thread, which avoids race conditions while running in parallel!
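
For context, the general pattern described here looks roughly like the sketch below. This is not the actual code from the fork linked in the next comment; the import path is assumed for the 0.13.x package, and the constructor arguments mirror the Separator example shared later in this thread.

    # Sketch: one Separator per worker thread via threading.local, so parallel
    # separate() calls don't share model/inference state.
    import logging
    import threading
    from concurrent.futures import ThreadPoolExecutor

    from audio_separator import Separator  # import path assumed for v0.13.x

    _thread_state = threading.local()

    def _get_separator():
        # Lazily create a Separator the first time each thread needs one.
        if not hasattr(_thread_state, "separator"):
            _thread_state.separator = Separator(
                model_file_dir="./mdx",
                output_single_stem="vocals",
                output_dir="vocals",
                log_level=logging.INFO,
            )
        return _thread_state.separator

    def separate_chunks(chunk_paths, workers=4):
        # Run separate() on each pre-split chunk file in its own thread.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda p: _get_separator().separate(p), chunk_paths))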

yassinebelatar commented 8 months ago

Check it out here : https://github.com/yassinebelatar/python-audio-separator/blob/main/audio_separator/separator/separator.py

yassinebelatar commented 8 months ago

This cut processing time to a third for now; I'll try to optimize and make it faster! By the way, for denoising, using either true or false doesn't seem to have any impact. Is that only for me, or for everyone? Or did I miss something in the call? Here is my code:

    separator = Separator(
        denoise_enabled=True,
        model_file_dir="./mdx",
        output_single_stem="vocals",
        output_dir="vocals",
        log_level=logging.INFO,
    )

beveradb commented 8 months ago

Nice work! Good to know it works as a proof of concept!

If you could add the actual segmenting / thread management functionality to audio-separator and allow users to enable that via an initialization parameter using the class or CLI (e.g. --threads=4 or something), I'd welcome that PR 😄

yassinebelatar commented 8 months ago

Once everything is optimized I'll ping you for an update! Just wanted to thank you for the quick response and help :) Good work!

beveradb commented 7 months ago

FYI @yassinebelatar you may want to check out the latest version of audio-separator (0.14.4 or above)

There's now support for newer models and VR arch models, some of which are much faster on my machine (e.g. 2_HP-UVR.pth), and there are more parameters exposed to you which let you control the speed of inferencing:

MDX Architecture Parameters:
  --mdx_segment_size MDX_SEGMENT_SIZE                    larger consumes more resources, but may give better results (default: 256). Example: --mdx_segment_size=256
  --mdx_overlap MDX_OVERLAP                              amount of overlap between prediction windows, 0.001-0.999. higher is better but slower (default: 0.25). Example: --mdx_overlap=0.25
  --mdx_batch_size MDX_BATCH_SIZE                        larger consumes more RAM but may process slightly faster (default: 1). Example: --mdx_batch_size=4
  --mdx_hop_length MDX_HOP_LENGTH                        usually called stride in neural networks, only change if you know what you're doing (default: 1024). Example: --mdx_hop_length=1024

VR Architecture Parameters:
  --vr_batch_size VR_BATCH_SIZE                          number of batches to process at a time. higher = more RAM, slightly faster processing (default: 4). Example: --vr_batch_size=16
  --vr_window_size VR_WINDOW_SIZE                        balance quality and speed. 1024 = fast but lower, 320 = slower but better quality. (default: 512). Example: --vr_window_size=320
  --vr_aggression VR_AGGRESSION                          intensity of primary stem extraction, -100 - 100. typically 5 for vocals & instrumentals (default: 5). Example: --vr_aggression=2
  --vr_enable_tta                                        enable Test-Time-Augmentation; slow but improves quality (default: False). Example: --vr_enable_tta
  --vr_high_end_process                                  mirror the missing frequency range of the output (default: False). Example: --vr_high_end_process
  --vr_enable_post_process                               identify leftover artifacts within vocal output; may improve separation for some songs (default: False). Example: --vr_enable_post_process
  --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD  threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1
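
For example (illustrative only: the input filename is a placeholder, this assumes the audio-separator CLI entry point, and the flag for choosing a specific model is omitted since its name depends on your installed version), a speed-oriented MDX run with the flags above might look like:

    audio-separator audio.wav --mdx_segment_size=256 --mdx_overlap=0.25 --mdx_batch_size=4

Per the descriptions above, lowering --mdx_overlap, or using --vr_window_size=1024 with a VR model, trades a little quality for speed.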

beveradb commented 7 months ago

FYI @yassinebelatar there's some sample code for splitting input audio into shorter segments, separating each, and then rejoining the separated parts afterwards in this comment: https://github.com/karaokenerds/python-audio-separator/issues/44#issuecomment-1962483718

You could potentially adapt something like that to launch multiple audio-separator processes in separate threads (or even perhaps in separate docker containers)
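
One rough way to do that (a sketch only: the chunk file names and worker count are illustrative, and it assumes the audio-separator CLI is on your PATH) would be to fan pre-split chunks out across worker threads that each spawn their own process:

    # Sketch: run one audio-separator process per pre-split chunk, a few at a time.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    chunk_files = ["chunk_000.wav", "chunk_001.wav", "chunk_002.wav"]

    def run_separator(path):
        # Each call is an independent process with its own model load and GPU usage.
        subprocess.run(["audio-separator", path], check=True)

    with ThreadPoolExecutor(max_workers=3) as pool:
        list(pool.map(run_separator, chunk_files))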

Also, I'd encourage you to try some of the VR arch (.pth) models, as I find they provide equally good results to some of the MDX models in about half of the compute time. For example, 2_HP-UVR.pth is my go-to for simple vocal/instrumental split.

I'm going to close this issue now as I think there are several options you can explore to make more efficient use of your resources, but feel free to reply in here if you want to share your progress or get any more support with this!