zkenshin / godot-ai-toolkit

An interaction layer between Godot and AI APIs/tools to accelerate developing games with AI. Our entry for the 2022 Assembly AI Hackathon.
GNU General Public License v3.0

regarding future Uberduck integration #1

Open ev3ntHandler opened 2 weeks ago

ev3ntHandler commented 2 weeks ago

Hello! I was wondering if this is still being worked on? I'm looking into passing text to the Uberduck API and receiving the TTS audio back. If this project is no longer supported, would you mind leaving a few hints on how to integrate Uberduck into Godot?

zkenshin commented 2 weeks ago

Hello @ev3ntHandler!

Thank you for your interest in our plugin. We aren't currently working on this project.

Let me take a few days to reacquaint myself with the project so I can better identify how you would go about adding or supporting Uberduck.

In the meantime, here are some quick resources. We used Godot's HTTPRequest node to make the actual HTTP requests: https://docs.godotengine.org/en/stable/tutorials/networking/http_request_class.html

We also use the JSON class for organizing/sending data and parsing the returned values: https://docs.godotengine.org/en/stable/classes/class_json.html

And you can see how those are used in https://github.com/zkenshin/godot-ai-toolkit/blob/main/godot-ai-toolkit/addons/ai_toolkit/scripts/openai_api.gd

It looks like Uberduck has their API reference for generating TTS here: https://docs.uberduck.ai/reference/generate-speech

I don't currently know how Godot handles receiving something like audio in a response, though.

ev3ntHandler commented 2 weeks ago

Thanks for replying! The problem for me is twofold: finding out whether the API is completely free of charge, without drawbacks such as a free trial that lasts a few days and then stops working, and communicating with Uberduck's API. Taking a peek at their API, it looks like I'll have to send an HTTP request, get the output (some sort of raw audio data? a WAV, MP3, or OGG file? something else?), figure out how to deal with it, and then put it in an audio player's stream.

I'm aiming to create an ambitious yet kind of stupid project where I have multiple AIs talking to each other (if you've seen AI-generated SpongeBob episodes, it's similar to that but with a few additions of my own). I can deal with the dialogue part; however, generating speech is the more difficult side of things for me right now...

I'll have to make sure I receive the speech audio in close to real time, with a delay of about 3-4 seconds at most. If that doesn't work, I'll have to download all of the audio from Uberduck up front (or part of it, loading the rest during scene playback) and show a nice loading screen before playback begins. I hope I'm not troubling you too much. If anything, take your time!
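
On the open question of what the response body actually contains (WAV, MP3, OGG, or something else), one way to find out is to look at the first few bytes of whatever comes back. The following is a minimal Python sketch of that idea, not anything taken from the toolkit or from Uberduck's documentation. The function name `sniff_audio_format` is made up for illustration, and the MP3 frame-sync check is only a heuristic (some other formats share the same leading bits).

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the audio container of a payload from its leading bytes.

    Hypothetical helper for illustration; returns "wav", "ogg", "mp3",
    or "unknown".
    """
    # WAV files are RIFF containers: "RIFF", a 4-byte size, then "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Ogg streams (Vorbis, Opus, ...) begin with the capture pattern "OggS".
    if data[:4] == b"OggS":
        return "ogg"
    # MP3 files often begin with an ID3v2 metadata tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise an MP3 starts with an MPEG frame sync: 11 set bits.
    # Heuristic only: other MPEG-style streams can match this too.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    # Anything else (for example a JSON error body) is reported as unknown.
    return "unknown"


if __name__ == "__main__":
    print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))
    print(sniff_audio_format(b'{"detail": "not audio"}'))
```

Knowing the format matters because Godot uses different audio stream resource types for WAV, Ogg Vorbis, and MP3 data, so the bytes have to be handed to the matching one before they can be assigned to a player's stream.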

zkenshin commented 1 week ago

Ah, I don't think Uberduck offers a perpetual free pricing option. You can see their pricing here: https://www.uberduck.ai/pricing

Unfortunately, I doubt any of the main providers of text-to-speech APIs will offer a perpetually free option.

If that is a requirement, you may be better served by investigating a text-to-speech model that runs locally, though it will likely require a decent GPU for fast audio generation.

I've heard of some, like https://github.com/neonbjb/tortoise-tts, but have no experience with any of them.

You should probably build a proof of concept and test a system like that on its own, to ensure it meets your needs, before trying to integrate it with Godot.

If you stick with an API then perhaps try it out with something like https://insomnia.rest/ first to figure out what you need to do with it.
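
If you'd rather script that kind of experiment than use a GUI client, here is a minimal Python sketch of building the request outside Godot. Everything specific in it is an assumption for illustration: the endpoint is a placeholder, the `speech` and `voice` field names are made up rather than taken from Uberduck's reference, and a real API will also need credentials, which are not shown. The sketch only constructs the request so you can inspect it; it does not send anything.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the endpoint from the provider's API reference.
TTS_ENDPOINT = "https://api.example.invalid/speak"


def build_tts_request(text: str, voice: str,
                      endpoint: str = TTS_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a TTS API."""
    # "speech" and "voice" are assumed field names, not a documented schema.
    body = json.dumps({"speech": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_tts_request("Hello from Godot!", "example-voice")
    # Print what would go over the wire so it can be compared with the API docs.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Once the request shape is confirmed against the real documentation, the same method, headers, and JSON body carry over to Godot's HTTPRequest node.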

Hopefully that gives you some directions to explore!