neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0
13.23k stars 1.83k forks

How can I get tortoise-tts to pronounce acronyms correctly? #392

Open Franck-Dernoncourt opened 1 year ago

Franck-Dernoncourt commented 1 year ago

I'm trying to get tortoise-tts to pronounce acronyms correctly. Example of text that I'd like tortoise-tts to generate an audio file for: OpenAI ChatGPT is a new language model.

The audio file generated by tortoise-tts is: OpenAI Chat is a new language model (GPT is missing from the audio file).

I can replace ChatGPT with "Chat gee pee tee", but I've had a case where the added "jee pee tee" changed the tone of the generated audio.

Questions:

  1. Is replacing ChatGPT with Chat gee pee tee the best solution?
  2. If it is, is there a convenient script for replacing acronyms with their pronounced versions (GPT -> jee pee tee)?
  3. If it is not, what is the best solution?

Andiami-Yusaka commented 1 year ago

This is my question as well. Should we re-train or tune the model?

neonbjb commented 1 year ago

Rewriting the text is a good solution that doesn't require training the model. I'm guessing you could even ask ChatGPT, via a system prompt, to spell out acronyms with letter sounds; that's how I'd try to tackle this.
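
For anyone who wants a scripted version of the text rewrite, here is a minimal sketch of a preprocessing step that spells out acronyms before the text is handed to the TTS model. It is not part of tortoise-tts: the function name `spell_out_acronyms`, the `keep` parameter, and the letter spellings in `LETTER_NAMES` are all assumptions made for illustration, and the spellings will likely need tuning by ear for a given voice.

```python
import re

# Hypothetical phonetic spellings for letter names; adjust to taste.
LETTER_NAMES = {
    "A": "ay", "B": "bee", "C": "see", "D": "dee", "E": "ee", "F": "eff",
    "G": "jee", "H": "aitch", "I": "eye", "J": "jay", "K": "kay", "L": "el",
    "M": "em", "N": "en", "O": "oh", "P": "pee", "Q": "cue", "R": "ar",
    "S": "ess", "T": "tee", "U": "you", "V": "vee", "W": "double you",
    "X": "ex", "Y": "why", "Z": "zee",
}

# Assumption: any run of two or more capital letters is an acronym.
ACRONYM = re.compile(r"[A-Z]{2,}")


def spell_out_acronyms(text, keep=frozenset()):
    """Replace runs of capitals with spelled-out letter names.

    `keep` lists acronyms that are read as ordinary words (e.g. "NASA")
    and should be left untouched.
    """
    def replace(match):
        word = match.group(0)
        if word in keep:
            return word
        spoken = " ".join(LETTER_NAMES[ch] for ch in word)
        # Add a space when the acronym is glued to a preceding word,
        # as in "ChatGPT" -> "Chat jee pee tee".
        start = match.start()
        if start > 0 and text[start - 1].isalnum():
            spoken = " " + spoken
        return spoken

    return ACRONYM.sub(replace, text)


print(spell_out_acronyms("OpenAI ChatGPT is a new language model."))
```

The returned string would then be passed to the TTS call in place of the original text. This heuristic is deliberately crude: it will also spell out text written in all caps for emphasis, and it does not handle digits glued to an acronym or CamelCase names such as `HTTPServer`, so a `keep` list or manual review is probably still needed.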

Training larger models, or training for longer, also seems to solve this for TTS. It isn't likely something you could fine-tune in without losing quality; I could imagine the model starting to pronounce all words as acronyms, for example. The model needs to understand context to get this right, and that would only come with pretraining.