nomadkaraoke / python-audio-separator

Easy to use vocal separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
MIT License
403 stars 65 forks source link

MP3 output defaults to 128kbps #103

Closed empz closed 14 hours ago

empz commented 3 weeks ago

When using output_format="MP3" while instantiating a Separator class, the output resuls in files with a bitrate of 128kbps. I belive this should default to 320kbps, but if you don't agree, an extra option to define this would be ideal.

beveradb commented 3 weeks ago

I agree! Please raise a PR 😊

empz commented 3 weeks ago

I agree! Please raise a PR 😊

I've looked into the code and couldn't find any bitrate settings anywhere. Could you explain how the mp3 encoding works? I assume the UVR-based separation models do not output in mp3 natively, is that right? If so, I assumed there was some ffmpeg-based post processing to encode the outputs into mp3, but again couldn't find anything :/

beveradb commented 3 weeks ago

Yeah there isn't currently anything specifically implemented for this, I don't use lossy output from audio-separator personally (I use FLAC for both input and output format so there's no lossy compression as part of the separation process) so I haven't really spent any time on it.

You're right, it is essentially ffmpeg doing the output encoding, but it uses the PyDub Python wrapper (https://github.com/jiaaro/pydub) for interacting with ffmpeg. Here's the line you'll probably want to add functionality to:

https://github.com/nomadkaraoke/python-audio-separator/blob/main/audio_separator/separator/common_separator.py#L255

From their README it looks like there's a simple bitrate parameter for the export method, but you'll need to add a tiny bit of logic to only add that bitrate parameter for certain output formats (e.g. MP3 or other lossy formats) and make sure you only pass it in if it's a valid value for that format (e.g. only set it to 320k if it's MP3, some other lossy formats probably won't support that bitrate)

Hope that makes sense, and good luck! 🙇

empz commented 3 weeks ago

Thanks, I was able to find it and I'm already working on a PR. My plan is to add an optional argument output_mp3_bitrate that only gets applied if the output_format="MP3". Does that sound good enoufh to you?

EDIT: Also making 320k the default for mp3, unless you want to keep the current 128k as default.

beveradb commented 3 weeks ago

Great to hear!

Hmm, ideally I'd probably pass through output_bitrate as an additional class parameter and CLI parameter, alongside the existing output_format one, and only set the bitrate parameter on export() if that output_bitrate value is set.

That way you're leaving it up to the user to decide what they want ffmpeg to do (and that way if some user is using some less common lossy format like m4a, they can still try to specify bitrate themselves without you needing to test compatibility)

I do think defaulting to 320k for MP3 is a good idea though!

beveradb commented 3 weeks ago

(I hate lossy audio compression generally so would be happy for it to always default to the maximum possible bitrate for any given lossy format by the way, but I'm just thinking of what's the easiest to implement & most flexible for users which is why I'm not wild about an MP3-specific PR just to add output_mp3_bitrate)

I feel like a generic output_bitrate would be more versatile but I don't have super strong opinions here - if you raise a PR doing what you think is best and it adds functionality without breaking folks' existing workflows, I'm cool with it!

empz commented 3 weeks ago

(I hate lossy audio compression generally so would be happy for it to always default to the maximum possible bitrate for any given lossy format by the way, but I'm just thinking of what's the easiest to implement & most flexible for users which is why I'm not wild about an MP3-specific PR just to add output_mp3_bitrate)

I feel like a generic output_bitrate would be more versatile but I don't have super strong opinions here - if you raise a PR doing what you think is best and it adds functionality without breaking folks' existing workflows, I'm cool with it!

https://github.com/nomadkaraoke/python-audio-separator/pull/104

empz commented 14 hours ago

https://github.com/nomadkaraoke/python-audio-separator/pull/104