Hi, by "melody input", are you referring to demos like this one?
https://stability-ai.github.io/stable-audio-2-demo/#additional-creative-capabilities
Yep.
The answer to your question is partially yes.
[Training part] First of all, Stable Audio 2.0 is NOT trained with melody conditioning. The only conditions used during training are the text prompt and timing information.
[Inference part]
Melody conditioning is still possible at inference time because the noise that the generation process starts from can be initialized from a reference audio signal instead of being pure random noise. As a result, the generated audio stays close to the reference.
The reference input is passed through the `init_audio` argument, which for now can only be set from the Gradio interface.
https://github.com/yukara-ikemiya/friendly-stable-audio-tools/blob/c8e95d2aa017a3b7630270a4ff5841dbd7533917/stable_audio_tools/inference/generation.py#L106
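To make the idea concrete, here is a minimal sketch of the initialization step only, conceptually similar to SDEdit-style editing. It is not the repository's actual code: `init_from_reference`, `mean_abs_diff` and `sigma` are hypothetical names, the reference is a plain list of floats standing in for the encoded audio, and no diffusion model or denoising loop is included. The sketch assumes a variance-exploding convention where the starting point is `x = reference + sigma * noise`. The sampler would then denoise from that point, so a smaller `sigma` keeps more of the reference's structure (e.g. its melody) and a larger one gives the text prompt more freedom.

```python
import math
import random
from typing import List


def init_from_reference(reference: List[float], sigma: float, seed: int = 0) -> List[float]:
    """Hypothetical helper: build a diffusion starting point from a reference.

    Instead of starting from pure noise, add Gaussian noise with standard
    deviation `sigma` to the reference (assumed convention: x = ref + sigma * eps).
    """
    rng = random.Random(seed)
    return [r + sigma * rng.gauss(0.0, 1.0) for r in reference]


def mean_abs_diff(a: List[float], b: List[float]) -> float:
    """Average absolute difference between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Toy "melody": a 440 Hz sine at 8 kHz, standing in for reference audio.
    ref = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
    for sigma in (0.1, 1.0, 10.0):
        x0 = init_from_reference(ref, sigma)
        # Lower sigma -> starting point (and, after denoising, output) closer to ref.
        print(f"sigma={sigma}: distance from reference = {mean_abs_diff(x0, ref):.3f}")
```

Because the same seed is reused, the printed distance grows in proportion to `sigma`, which is the knob that trades off closeness to the reference against freedom for the text prompt.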
In conclusion, although Stable Audio does not explicitly perform melody conditioning, it achieves a similar outcome by generating audio signals that resemble the reference.
Hello!
Is there a way to train or run inference with a melody input in addition to the text prompt, like on the Stable Audio 2.0 website?
Thanks!