matatonic / openedai-speech

An OpenAI API compatible text to speech server using Coqui AI's xtts_v2 and/or piper tts as the backend.
GNU Affero General Public License v3.0

Feature: xtts generation parameters like temperature, top_k, etc #22

Closed: matatonic closed this issue 3 months ago

matatonic commented 3 months ago

These parameters should be settable on a per-voice basis.

Proposed format:

sky:
  model: xtts
  speaker: voices/sky.wav
  generation_params:
    temperature: 0.7
    repetition_penalty: 5.0

matatonic commented 3 months ago

I'm adjusting this, just for simplicity. If anyone has any suggestions or comments, please let me know.

Sample of new options:

sky:
  model: xtts
  speaker: voices/sky.wav
  enable_text_splitting: True
  length_penalty: 1.0
  repetition_penalty: 10
  speed: 1.0
  temperature: 0.75
  top_k: 50
  top_p: 0.85
  comment: You can add a comment here also, which will be persistent and otherwise ignored.
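
As a rough illustration of how a flat voice entry like the one above could be consumed, here is a minimal Python sketch. It is not the project's actual loader: `GENERATION_KEYS` and `split_voice_entry` are hypothetical names, the key list is simply copied from the sample, and the `sky` dict stands in for the already-parsed YAML entry.

```python
from typing import Any

# Assumption: the tunable keys are exactly those shown in the sample entry.
GENERATION_KEYS = (
    "enable_text_splitting",
    "length_penalty",
    "repetition_penalty",
    "speed",
    "temperature",
    "top_k",
    "top_p",
)


def split_voice_entry(entry: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one voice entry into (generation params, remaining fields).

    The free-form `comment` key is dropped, since it is meant to be ignored.
    """
    params = {key: entry[key] for key in GENERATION_KEYS if key in entry}
    other = {
        key: value
        for key, value in entry.items()
        if key not in GENERATION_KEYS and key != "comment"
    }
    return params, other


# Stand-in for the parsed `sky` entry from the sample above.
sky = {
    "model": "xtts",
    "speaker": "voices/sky.wav",
    "enable_text_splitting": True,
    "length_penalty": 1.0,
    "repetition_penalty": 10,
    "speed": 1.0,
    "temperature": 0.75,
    "top_k": 50,
    "top_p": 0.85,
    "comment": "You can add a comment here also.",
}

params, other = split_voice_entry(sky)
```

With this split, `params` holds only the per-voice generation options and `other` holds `model` and `speaker`, so a voice that omits an option simply falls back to whatever default the backend uses.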

roman-ta27 commented 3 months ago

Hey, I'm really enjoying the project!! One quick question: In the README "Streamed output while generating" is mentioned for the tts-1-hd model. I can't seem to find a way to enable it. Could you point me in the right direction?

matatonic commented 3 months ago

> Hey, I'm really enjoying the project!! One quick question: In the README "Streamed output while generating" is mentioned for the tts-1-hd model. I can't seem to find a way to enable it. Could you point me in the right direction?

It's not optional; it's always enabled. If you're not seeing streamed audio, your client may not be playing it as a stream.
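
To check whether a client is actually consuming the response as a stream, a small Python sketch along these lines may help. It assumes the server is reachable at `http://localhost:8000/v1` and accepts the OpenAI-style `POST /audio/speech` JSON body; the base URL, the `sky` voice name, and the helper names `iter_chunks` and `stream_speech` are illustrative assumptions, not taken from the project.

```python
import json
import urllib.request
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield data from a file-like object in pieces until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_speech(
    text: str,
    voice: str = "sky",  # assumption: a voice with this name is configured
    base_url: str = "http://localhost:8000/v1",  # assumption: local server address
) -> Iterator[bytes]:
    """POST an OpenAI-style speech request and yield audio bytes as they arrive."""
    payload = json.dumps(
        {"model": "tts-1-hd", "input": text, "voice": voice}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The response body is read incrementally instead of all at once.
    with urllib.request.urlopen(request) as response:
        yield from iter_chunks(response)
```

Iterating over `stream_speech("Hello there")` and printing the size of each chunk as it arrives should show output before generation has finished. A client that waits for the full body before playing will not appear to stream, even though the server is sending audio incrementally.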