mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License
25.98k stars, 1.95k forks

New text-to-speech API #1276

Closed mudler closed 8 months ago

mudler commented 1 year ago

This is just to figure out whether we are compatible with the client. From a quick look it seems we already are, but I haven't tried it myself yet.

https://platform.openai.com/docs/guides/text-to-speech

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler). Please beware that I might hallucinate sometimes!

But... I can also be funny or helpful :smile_cat: and I can generally provide good tips or point to places to look in the documentation or in the code, based on what you wrote in the issue.

Don't engage in conversation with me, I don't support replying (yet)!

:warning::warning::warning::warning::warning:

To check compatibility, you can try sending a POST request to the /tts endpoint with some text as input and see if it returns an audio file. If it does, then we are compatible with the client.
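A minimal sketch of that check in Python, using only the standard library. The base URL `http://localhost:8080`, the `model` and `input` field names, and the model name passed in are assumptions for illustration, not confirmed by this thread; adjust them to your own setup.

```python
import json
import urllib.request

# Assumption: a LocalAI instance is listening here; change as needed.
BASE_URL = "http://localhost:8080"


def build_tts_request(text, model, base_url=BASE_URL):
    """Build a POST request for the /tts endpoint.

    The "model" and "input" field names are assumed, not taken from this thread.
    """
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_audio(content_type):
    """Return True if a Content-Type header value names an audio media type."""
    return content_type.split(";")[0].strip().lower().startswith("audio/")


def check_tts(text, model):
    """Send the request and report whether the response looks like audio."""
    request = build_tts_request(text, model)
    with urllib.request.urlopen(request, timeout=60) as response:
        return looks_like_audio(response.headers.get("Content-Type", ""))
```

Calling `check_tts("Hello", "<your-model-name>")` against a running server returns `True` when the response carries an audio content type. Checking only the header is a rough heuristic; a stricter check would also inspect the returned bytes.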


p0n1 commented 1 year ago

Hi @mudler. I'm about to add support for LocalAI in https://github.com/p0n1/epub_to_audiobook/issues/17. But I think the LocalAI TTS API is quite similar to, but not fully compatible with, OpenAI's. OpenAI has a voice field and LocalAI has a backend field.

mudler commented 1 year ago

> Hi @mudler. I'm about to add support for LocalAI in p0n1/epub_to_audiobook#17. But I think the LocalAI TTS API is quite similar to, but not fully compatible with, OpenAI's. OpenAI has a voice field and LocalAI has a backend field.

Gotta check it out then. The backend is optional in LocalAI calls, and voice would be silently ignored. It can definitely be wired up, but I think for now it should at least just work by configuring rhasspy as the TTS backend, which is currently the default when no backend is passed.
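To illustrate the field difference discussed above, here is a rough client-side sketch that turns an OpenAI-style TTS request body into one for LocalAI. The helper name `openai_to_localai_tts` is hypothetical, and the `model` and `input` field names are assumed to be shared by both APIs; only the `voice` and `backend` fields come from this thread.

```python
def openai_to_localai_tts(openai_body, backend=None):
    """Map an OpenAI-style TTS body to a LocalAI-style one (hypothetical helper).

    "voice" is dropped, since the server would ignore it anyway.
    "backend" is optional; when omitted, the server default applies.
    """
    body = {
        "model": openai_body["model"],  # assumed to be common to both APIs
        "input": openai_body["input"],  # assumed to be common to both APIs
    }
    if backend is not None:
        body["backend"] = backend
    return body
```

Dropping `voice` explicitly just makes visible on the client what the server would do silently, so a caller is not surprised that the voice setting has no effect.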