rsxdalv / tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)
https://rsxdalv.github.io/tts-generation-webui/
MIT License
1.46k stars, 160 forks

Feature request: Add batch generation to MusicGen #278

Closed: Aamir3d closed 1 week ago

Aamir3d commented 4 months ago

Issue: Currently, we have to click Generate every time we want a new sound in the MusicGen tab (for all the Facebook models). This gets tedious when one wants to generate several sounds and pick one that is good enough to use.

Feature request: It would be good to have a "Batch number" setting where one can choose how many generations the application runs. For example, one can select "5" and have the system output 5 tunes.

Additional nice-to-have: a "Ctrl+Enter" shortcut key to start generation (as in A1111, Fooocus, and other applications).

rsxdalv commented 4 months ago

Is it just sequential batching (i.e., run 1-1-1-1-1 automatically, then return all 5), or are you looking for a 5-at-once scenario? Also, do you want the same prompt each time, like A-A-A-A-A, or the ability to give many prompts at once, one per line, like:

water
water 320Kbps
loud water 320Kbps
quiet water
water stream
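
To make the "one prompt per line" idea concrete, here is a minimal sketch of sequential batching over a multi-line prompt box. It is only an illustration, not the project's implementation: `expand_prompts`, `run_sequential_batch`, and `fake_generate` are hypothetical names, and `fake_generate` stands in for whatever actually calls the model.

```python
from typing import Callable, List


def expand_prompts(text: str) -> List[str]:
    """Split a multi-line prompt box into one prompt per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_sequential_batch(text: str, generate: Callable[[str], str]) -> List[str]:
    """Run `generate` once per prompt, in order, and collect the results."""
    return [generate(prompt) for prompt in expand_prompts(text)]


# Hypothetical stand-in for the real generator; it only echoes the prompt.
def fake_generate(prompt: str) -> str:
    return f"audio for: {prompt}"


results = run_sequential_batch("water\nwater 320Kbps\n\nquiet water\n", fake_generate)
```

Blank lines are skipped, so an empty line in the prompt box does not trigger an extra generation.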

Aamir3d commented 3 months ago

You have two great ideas here! The ideal would be to start a batch, let it finish, and show the generations below in a row (like in your outputs section). In this case, Audio1-Audio2-Audio3...Audio7 would work (giving 7 files with random seeds but the same parameters).

The idea for multiple prompts is even better! One could do variations for each batch and choose the best output.

And while we're discussing this, the same approach could potentially be applied to the first screen, where you have several buttons that say "Generate 1", "Generate 2", etc. Simplifying the interface to a field for the number of audio files plus a single Generate button would make it more consistent.
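
The "same parameters, random seeds" batch described above could be sketched roughly as follows. This is an assumption-laden illustration rather than the actual code: `build_batch` is a hypothetical helper, and the parameter names `text`, `duration`, and `seed` are placeholders, not the application's real field names.

```python
import random
from typing import Any, Dict, List


def build_batch(params: Dict[str, Any], count: int, rng: random.Random) -> List[Dict[str, Any]]:
    """Return `count` copies of `params`, each with its own random seed."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Copy the shared parameters and add a fresh 32-bit seed to each job.
    return [{**params, "seed": rng.randrange(2**32)} for _ in range(count)]


base = {"text": "water stream", "duration": 10}  # hypothetical parameter names
jobs = build_batch(base, 7, random.Random(0))
```

Each job could then be handed to the generator one at a time, and recording the seed with each output would let a good result be reproduced later.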

rsxdalv commented 3 months ago

That sounds good! I am thinking about doing it for the React UI, since it's a lot easier there than in Gradio. Would you be OK with using the React UI, at least for this use case?

Aamir3d commented 3 months ago

From the end user's perspective, as long as there's a good GUI, the back-end development should not be an issue. Whatever is easier for you works; I know you're improving the GUI constantly.

rsxdalv commented 3 months ago

Hopefully this resolves it: https://github.com/rsxdalv/tts-generation-webui/pull/281

Aamir3d commented 3 months ago

Thank you! I'm looking forward to testing this out over the weekend and will share updates.