nomadkaraoke / python-audio-separator

Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License

VR HP models causing OOM with 12gb VRAM #109

Closed kikoferrer closed 2 months ago

kikoferrer commented 2 months ago

Hi,

Just wanting to know how much VRAM the VR HP models consume? I only get OOM from them. Other models range from 2-4GB VRAM usage, but if I use any VR HP 1-9 model I get OOM.

I have used HP5 models from the RVC webui and they never gave me OOM, so I was wondering if there was something wrong with my install? Or would it be better to use CPU RAM for these? Thanks

beveradb commented 2 months ago

That's normal I'm afraid! In my experience the VR models are very heavy on memory usage (but they're typically faster to run / lower CPU/GPU usage).

You can try playing with the parameters, e.g. lowering --vr_batch_size:

VR Architecture Parameters:
  --vr_batch_size VR_BATCH_SIZE                          number of batches to process at a time. higher = more RAM, slightly faster processing (default: 4). Example: --vr_batch_size=16
  --vr_window_size VR_WINDOW_SIZE                        balance quality and speed. 1024 = fast but lower, 320 = slower but better quality. (default: 512). Example: --vr_window_size=320
  --vr_aggression VR_AGGRESSION                          intensity of primary stem extraction, -100 - 100. typically 5 for vocals & instrumentals (default: 5). Example: --vr_aggression=2
  --vr_enable_tta                                        enable Test-Time-Augmentation; slow but improves quality (default: False). Example: --vr_enable_tta
  --vr_high_end_process                                  mirror the missing frequency range of the output (default: False). Example: --vr_high_end_process
  --vr_enable_post_process                               identify leftover artifacts within vocal output; may improve separation for some songs (default: False). Example: --vr_enable_post_process
  --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD  threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1
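
For example, something like this should reduce peak memory usage (the model filename here is just an example, swap in whichever HP model you're using, and I'm assuming you select it with the usual --model_filename option):

  audio-separator input.wav --model_filename 5_HP-Karaoke-UVR.pth --vr_batch_size=1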

Or just use another model - there are better models for most use cases :)

kikoferrer commented 2 months ago

Appreciate it mate. I am fond of using HP5 to separate main vocals from other vocals, and other models can't seem to do it. In your experience, what are the VRAM requirements for the VR models?

kikoferrer commented 2 months ago

Oh I think I get it now. I am using this as a Python dependency and noticed the VR batch size defaults to 16 when used as a dependency. That must be why. The CLI default is batch size 4.
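
For anyone else hitting this, passing a smaller batch size through vr_params when constructing the Separator should bring it in line with the CLI default. A rough sketch based on the README's description of vr_params; the model filename and input path are just placeholders:

    from audio_separator.separator import Separator

    # Match the CLI default batch size of 4 instead of the library default of 16.
    # The other vr_params keys mirror the documented defaults.
    separator = Separator(
        vr_params={
            "batch_size": 4,
            "window_size": 512,
            "aggression": 5,
            "enable_tta": False,
            "enable_post_process": False,
            "post_process_threshold": 0.2,
            "high_end_process": False,
        }
    )
    separator.load_model(model_filename="5_HP-Karaoke-UVR.pth")
    output_files = separator.separate("input.wav")
    print(output_files)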

Maybe if you could change the dependency default to 4 so it matches the CLI, it would make it easier for others too. Anyway, thank you for the good work on this project!

beveradb commented 2 months ago

PRs welcome! 😉