Open mykeehu opened 2 weeks ago
Hi, thanks for requesting this! I have been procrastinating with it actually. One question - such a model would require a huggingface account and a login to be used, since this https://huggingface.co/stabilityai/stable-audio-open-1.0 cannot be automatically downloaded. Would you be ok with that?
Please respond as this is a matter that could really determine whether or not people use it.
I don't have a problem downloading the model this way, maybe you could ask for the login to download it? So those who have it can use it, those who don't can't. I don't know why it's tied to a license, but I've seen a video of it making quite good sound effects, so after the login the model would be downloaded.
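For anyone wondering what "ask for the login, then download" could look like, here is a minimal sketch. It assumes the `huggingface_hub` package is installed and that the license has already been accepted on the model page. The helper names and the two file names are my assumptions for illustration, not what the webui actually does.

```python
import os

REPO_ID = "stabilityai/stable-audio-open-1.0"


def resolve_token(env=None, prompt=None):
    """Return a Hugging Face token from HF_TOKEN, else ask via `prompt`.

    Hypothetical helper: `prompt` is any callable that takes a message and
    returns a string (for example getpass.getpass). Returns None if no
    token could be obtained.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if token:
        return token
    if prompt is None:
        return None
    token = (prompt("Hugging Face access token: ") or "").strip()
    return token or None


def download_checkpoint(token, filenames=("model_config.json", "model.safetensors")):
    """Download the gated files and return their local cache paths.

    The file names are assumptions about the repository layout; adjust
    them if the repo uses different names.
    """
    if not token:
        raise PermissionError(
            "This model is gated: log in to Hugging Face and accept the license first."
        )
    # Imported lazily so users without an account never need the package.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, token=token)
        for name in filenames
    ]
```

Calling `download_checkpoint(resolve_token(prompt=getpass.getpass))` would then only prompt when `HF_TOKEN` is not set, so those who have an account can use the model and everyone else gets a clear error instead of a failed download.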
I'd be interested in trying this out too, please.
For instance, I'm ok with it. Thanks!
a hearty same from I
I've already downloaded the checkpoint. I presume that those who are enjoying your interface are the sort of people who already have a huggingface account.
Stable audio has been added but is causing some problems so it might be added-removed a few times until it's 'stable'.
Also, I just want to clarify: after extensive research, I don't think Stable Audio is a 'Stable Diffusion 1.5' moment. It has a restrictive, potentially dangerous license (the very same infamous SD3 license, which might be legally unenforceable or impossible to defend in court), and I saw comments saying that Facebook's AudioGen/MusicGen (notably also non-commercially licensed) perform similarly.
My biggest issue so far is that running the 'official' inference code results in ~14 GB of RAM usage, and due to memory management my system with 24 GB of RAM and 24 GB of VRAM would often just fail.
That being said, I really appreciate receiving information about what people want to try and see.
I concur on the VRAM issue. I often saturate the 24 GB of VRAM on my RTX 3090 using MusicGen. I have not been able to test MultiBandDiffusion due to VRAM saturation. I have seen that Python will not release the VRAM it takes up, so it blocks the GPU. I have to restart the machine to free the VRAM. If Stable Audio is even worse than MusicGen, that makes it problematic for me to test.
Restarting the webui should be enough. Additionally, after I fix the bugs arising from adding this new model, I can spend more time on 'unload model' buttons throughout the UI; however, there will always be some leftovers that aren't unloaded. As for Stable Audio, generating a 47-second or a 1-second clip seems to use the same amount of VRAM, unless they somehow fix that themselves. Honestly, there are multiple improvements to the model itself that are waiting to be done by somebody; perhaps they are hoping the community will do it.
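To illustrate what an 'unload model' button could do under the hood, here is a rough sketch, assuming a PyTorch-based model. `ModelSlot` is a hypothetical name for illustration, not code from this project. It drops the reference, runs the garbage collector, and asks PyTorch to return its cached GPU memory; the CUDA context itself stays allocated until the process exits, which is one source of the "leftovers" mentioned above.

```python
import gc


class ModelSlot:
    """Hypothetical holder for one loaded model (not the webui's real code)."""

    def __init__(self):
        self.model = None

    def unload(self):
        """Drop the model and release as much memory as the process can."""
        # Remove our reference so the weights become garbage-collectable.
        # Any other references elsewhere (caches, closures) keep them alive.
        self.model = None
        gc.collect()
        try:
            import torch
        except ImportError:
            return  # No PyTorch installed, nothing GPU-side to release.
        if torch.cuda.is_available():
            # Return cached, unused blocks to the driver so other
            # processes can see the freed VRAM.
            torch.cuda.empty_cache()
```

This only frees memory that nothing else still references, so a full restart of the webui remains the reliable way to get all VRAM back.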
And as we are talking of other models ... maybe people are interested in ... Toucan TTS with 7000 languages https://github.com/DigitalPhonetics/IMS-Toucan
For this project it seems decent but could be hard to handle if it means everyone has to install espeak.
Please add Stable Audio to the options, if you please! Thank you very much in advance!
https://github.com/Stability-AI/stable-audio-tools
And model here: https://huggingface.co/stabilityai/stable-audio-open-1.0