rsxdalv / tts-generation-webui

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)
https://rsxdalv.github.io/tts-generation-webui/
MIT License

http query #28

Closed · magokeanu closed this issue 1 year ago

magokeanu commented 1 year ago

Hi guys, I love the project. This is not an issue, just a question: how can I query it over HTTP? Is there an API flag or something? Thanks for taking the time :3

rsxdalv commented 1 year ago

Thanks for the request! Gradio automatically generates an API, but it's worth understanding the exact use case, i.e., if you are trying to use it as a TTS endpoint, how would you expect it to work?

I haven't explored it yet, but there's probably a way to do HTTP request -> wav filename returned, with the current setup alone.
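
For example, something along these lines might already work against the auto-generated HTTP API (untested; the port and the endpoint name below are placeholders, the real ones are listed on the "Use via API" page at the bottom of the web UI):

// Sketch: call one of the auto-generated Gradio REST endpoints directly.
// "SOME_ENDPOINT" and port 7865 are placeholders; check "Use via API".
async function generate() {
    const res = await fetch("http://localhost:7865/run/SOME_ENDPOINT", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ data: [/* positional inputs, per the API docs */] }),
    });
    const json = await res.json();
    console.log(json.data); // should include the generated wav as a file path/URL
}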

benedlore commented 1 year ago

@rsxdalv I'm also really impressed with the ease of use and with bringing these models together in such a convenient way, but I also need to access it via the API. The auto-generated docs say there are 263 endpoints open by default. It would be helpful if the number of endpoints were reduced and made more coherent, so we could also set all the appropriate optional settings for whichever model we are targeting.

Right now, it seems difficult to properly target each model through the API and modify all the variables that are visible for it in the web UI.

Primarily, I'd be interested in running inference with Bark, Tortoise, MusicGen, and AudioGen, supplying each with its appropriate settings and getting the final audio file back as base64 or something similar. Sadly, I am not proficient enough in Python to know how to edit this API to expose more of those settings and make a good endpoint for each main model.

rsxdalv commented 1 year ago

If I understand the auto-generated endpoints correctly, there should be inference endpoints that need only one call and return audio or a filename as output. If I may ask, what are you building? A web server for hosted inference, or something else?

rsxdalv commented 1 year ago

Here's what usage of MusicGen via the Gradio API could look like:

import { client } from "@gradio/client";

async function run() {
    // Fetch an example melody clip to condition MusicGen on.
    const response_0 = await fetch("https://github.com/gradio-app/gradio/raw/main/test/test_files/audio_sample.wav");
    const exampleAudio = await response_0.blob();

    // Connect to the locally running web UI.
    const app = await client("http://localhost:7865/");
    const result = await app.predict("/MusicGen", [
        "null",       // any (any valid json) in 'parameter_176' Json component
        exampleAudio, // blob in 'Melody (optional)' Audio component
    ]);

    console.log(result?.data);
}

run();
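
If this works, result.data should contain the generated audio, most likely as a file path or URL served by the Gradio app that can then be downloaded.
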

benedlore commented 1 year ago

What I would be interested in is running it on a server, hosting these services not for a webpage but for a Slack/Discord bot, so users can query it to generate audio using the variety of models you have conveniently set up here. I see that endpoint in the documentation, but I am confused about how to pass more parameters to it. For example, with MusicGen it would be nice to expose optional parameters to users like top-k, top-p, temperature, duration, etc., and likewise for Bark it would be nice to expose Text Temp and Waveform Temp, as a couple of examples. I am not sure how to do that; is it possible by including a bunch of JSON data with the way the API is currently set up (perhaps the "any (any valid json) in 'parameter_176' Json component" parameter)?

rsxdalv commented 1 year ago

Yes, that should be the JSON. I was trying to get it to work but ran into some Gradio library issue. I'm one step closer to abandoning Gradio.
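
Untested, but the call would presumably look something like this, with the first argument becoming a real object instead of the "null" placeholder (the field names below are hypothetical; the actual schema comes from the auto-generated API docs, and app / exampleAudio are reused from the earlier snippet):

// Hypothetical settings object for the Json component; check the
// "Use via API" page for the real field names.
const params = {
    text: "upbeat acoustic folk",
    duration: 10,
    topk: 250,
    topp: 0,
    temperature: 1.0,
};

const result = await app.predict("/MusicGen", [
    params,       // generation settings in the Json component
    exampleAudio, // blob in 'Melody (optional)' Audio component
]);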