yukara-ikemiya / friendly-stable-audio-tools

Refactored and updated version of `stable-audio-tools`, an open-source codebase for audio/music generative models originally released by Stability AI.
MIT License

melody conditioning #2

Closed · betterftr closed this 5 months ago

betterftr commented 5 months ago

Hello!

Is there a way to train/inference using a melody input on top of the text input, like on the Stable Audio 2.0 website?

Thanks!

yukara-ikemiya commented 5 months ago

Hi, by "melody input" do you mean demos like this one? https://stability-ai.github.io/stable-audio-2-demo/#additional-creative-capabilities

betterftr commented 5 months ago

yeap

yukara-ikemiya commented 5 months ago

The answer to your question is partially yes.

[Training part] First of all, Stable Audio 2.0 is NOT trained with melody conditioning. The only conditioning signals it uses are the text prompt and timing information.

[Inference part] Melody-like conditioning is possible at inference time because the noise of the generation process is initialized from a reference audio signal, so the generated audio stays close to that reference. The reference is passed via the init_audio argument, which is currently only exposed through the Gradio interface. https://github.com/yukara-ikemiya/friendly-stable-audio-tools/blob/c8e95d2aa017a3b7630270a4ff5841dbd7533917/stable_audio_tools/inference/generation.py#L106
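Although only the Gradio front-end exposes this among the bundled interfaces, the underlying `generate_diffusion_cond` function accepts `init_audio` directly. Below is a minimal sketch of melody-guided generation, assuming the upstream `stable-audio-tools` API; the checkpoint ID, file names, prompt, and parameter values are placeholders, and argument names may differ between versions, so check the linked `generation.py` for the exact signature.

```python
# Sketch: text-to-audio generation whose sampling noise is initialized from a
# reference clip, which is how the "melody conditioning" effect is obtained at
# inference time. Checkpoint ID and file names are placeholders.
import torch
import torchaudio
from einops import rearrange
from stable_audio_tools import get_pretrained_model
from stable_audio_tools.inference.generation import generate_diffusion_cond

device = "cuda" if torch.cuda.is_available() else "cpu"

model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
model = model.to(device)

# Reference clip whose melody/structure should guide the generation.
ref_audio, ref_sr = torchaudio.load("reference_melody.wav")  # (channels, samples)

# Stable Audio conditioning: text prompt plus timing information only.
conditioning = [{
    "prompt": "warm analog synth lead, 120 BPM",
    "seconds_start": 0,
    "seconds_total": 30,
}]

output = generate_diffusion_cond(
    model,
    conditioning=conditioning,
    sample_size=model_config["sample_size"],
    steps=100,
    cfg_scale=7.0,
    sampler_type="dpmpp-3m-sde",
    device=device,
    # The reference signal initializes the sampling noise instead of pure Gaussian noise.
    init_audio=(ref_sr, ref_audio),
    # Roughly: lower values stay closer to the reference, higher values allow more deviation.
    init_noise_level=1.0,
)

# Collapse the batch, peak-normalize, and save (as in the upstream README example).
output = rearrange(output, "b d n -> d (b n)")
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1).mul(32767).to(torch.int16).cpu()
torchaudio.save("melody_guided_output.wav", output, model_config["sample_rate"])
```

Since the reference only seeds the noise, this is audio-to-audio variation rather than true melody conditioning, which matches the conclusion below.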

In conclusion, although Stable Audio does not explicitly perform melody conditioning, it achieves a similar outcome by generating audio signals that resemble the reference.