nomadkaraoke / python-audio-separator

Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License

Best Settings #123

Closed Bebra777228 closed 1 month ago

Bebra777228 commented 1 month ago

What settings should be used to achieve the highest-quality separation of a song?

Perhaps you have already had experience solving similar tasks and know which parameters will provide the best result? I would be very grateful for your help!

I am mainly interested in VR and MDXC.

mdx_params={
    "hop_length": 1024,
    "segment_size": 256,
    "overlap": 0.25,
    "batch_size": 1,
    "enable_denoise": False,
},
vr_params={
    "batch_size": 1,
    "window_size": 512,
    "aggression": 5,
    "enable_tta": False,
    "enable_post_process": False,
    "post_process_threshold": 0.2,
    "high_end_process": False,
},
demucs_params={
    "segment_size": "Default",
    "shifts": 2,
    "overlap": 0.25,
    "segments_enabled": True,
},
mdxc_params={
    "segment_size": 256,
    "batch_size": 1,
    "overlap": 8,
},
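For anyone who wants to change only a couple of these values while keeping the rest, a minimal sketch of merging overrides into one of the dictionaries above might look like this. The helper name `with_overrides` is hypothetical, and the override values (`enable_tta=True`, `window_size=320`) are purely illustrative, not a recommendation.

```python
from copy import deepcopy

# VR settings copied from the values listed above.
VR_DEFAULTS = {
    "batch_size": 1,
    "window_size": 512,
    "aggression": 5,
    "enable_tta": False,
    "enable_post_process": False,
    "post_process_threshold": 0.2,
    "high_end_process": False,
}


def with_overrides(defaults, **changes):
    """Return a copy of `defaults` with `changes` applied (hypothetical helper).

    Unknown keys are rejected so that a typo in a parameter name fails
    loudly instead of being silently ignored.
    """
    unknown = set(changes) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    merged = deepcopy(defaults)
    merged.update(changes)
    return merged


# Illustrative overrides only; the original dictionary is left untouched.
vr_params = with_overrides(VR_DEFAULTS, enable_tta=True, window_size=320)
print(vr_params["window_size"], vr_params["enable_tta"])
```

The resulting `vr_params` dictionary has the same shape as the one quoted above, so it could presumably be passed wherever that dictionary is accepted.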
beveradb commented 1 month ago

There is no such thing as "best" for all tracks and use cases.

The default settings passed in by the CLI are already designed to provide the "best" compromise between performance and resource usage for most inputs. Of course, anyone can experiment with the settings and possibly get better results for a specific input track.
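As a rough sketch of what "playing around with the settings" for one track could look like, the snippet below enumerates a small grid of MDX parameter combinations to try one by one. The base values are the ones quoted earlier in this thread; the candidate values and the helper name `candidate_settings` are assumptions made for illustration, not tested recommendations.

```python
from itertools import product

# MDX settings as quoted earlier in this thread.
BASE_MDX_PARAMS = {
    "hop_length": 1024,
    "segment_size": 256,
    "overlap": 0.25,
    "batch_size": 1,
    "enable_denoise": False,
}

# Illustrative candidate values only; not known-good settings.
CANDIDATES = {
    "segment_size": [256, 512],
    "overlap": [0.25, 0.5],
    "enable_denoise": [False, True],
}


def candidate_settings(base, candidates):
    """Yield one full parameter dict per combination of candidate values."""
    keys = list(candidates)
    for values in product(*(candidates[key] for key in keys)):
        # Start from the base settings and overwrite only the swept keys.
        yield {**base, **dict(zip(keys, values))}


settings = list(candidate_settings(BASE_MDX_PARAMS, CANDIDATES))
print(len(settings))  # 2 * 2 * 2 = 8 combinations
```

Each dictionary in `settings` could then be used for one separation run on the same input, with the outputs compared by ear, since the result is subjective.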

In my opinion, the type of model and how it was trained have a much more significant impact than any of these parameters.

> I am mainly interested in VR and MDXC.

Why? If you want the best separation, these days the RoFormer models (e.g. model_bs_roformer_ep_317_sdr_12.9755.ckpt) will give a much better result.

It's all subjective though!
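For reference, a minimal sketch of trying the RoFormer model named above from Python could look like the following. It assumes the package's documented `Separator` class with `load_model(model_filename=...)` and `separate(...)`; the wrapper function `separate_with_roformer` is a hypothetical name used only for this example, and the exact return value may differ between package versions.

```python
# Model file named in the comment above.
ROFORMER_MODEL = "model_bs_roformer_ep_317_sdr_12.9755.ckpt"


def separate_with_roformer(audio_path, model_filename=ROFORMER_MODEL):
    """Run a separation with default settings and return whatever
    `Separator.separate` returns (assumed: the output file paths)."""
    # Imported lazily so this snippet can be loaded even where the
    # audio-separator package is not installed.
    from audio_separator.separator import Separator

    separator = Separator()  # default settings, as discussed above
    separator.load_model(model_filename=model_filename)
    return separator.separate(audio_path)
```

Calling `separate_with_roformer("song.wav")` (with a real file path) should then produce the separated stems using the default parameters.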

Bebra777228 commented 1 month ago

In demucs_params, segment_size is set to 'Default'. I would like to understand what this means and what specific value 'Default' implies.

Additionally, it would be helpful to know at least the approximate minimum and maximum values for each parameter across different architectures.