Open Franck-Dernoncourt opened 1 year ago
This is my question as well. Should we re-train or tune the model?
Rewriting the text is a good solution that doesn't require training the model. I'm guessing you could even ask ChatGPT, via a system prompt, to spell out acronyms with letter sounds; that's how I'd try to tackle this.
Training larger models, or training for longer, also seems to solve this for TTS. This isn't likely something that you could fine-tune in without losing quality. I could imagine that the model might start pronouncing all words as acronyms, for example. It needs to understand context to do this right, and that'd only come with pretraining.
I'm trying to get tortoise-tts to pronounce acronyms correctly. Example of text that I'd like tortoise-tts to generate an audio file for: "OpenAI ChatGPT is a new language model". The audio file generated by tortoise-tts is: "OpenAI Chat is a new language model" (GPT is missing from the audio file).
I can replace "ChatGPT" with "Chat gee pee tee", but I've had a case where the "jee pee tee" changes the tone of the audio file.
Questions:
Is replacing "ChatGPT" with "Chat gee pee tee" the most optimal solution?
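For anyone who wants to automate the replacement workaround described above, here is a minimal sketch of a text pre-processing step that runs before the text is handed to the TTS model. It is not part of tortoise-tts: `LETTER_SOUNDS`, `spell_out`, and `rewrite_for_tts` are hypothetical names, and apart from "gee pee tee" the letter spellings are guesses that may need tuning by ear. It only rewrites acronyms you list explicitly (uppercase letters only), so "OpenAI" is left alone unless you add "AI".

```python
import re

# Hypothetical letter-to-sound table. Only "gee", "pee", "tee" come from the
# issue text; the other spellings are illustrative assumptions.
LETTER_SOUNDS = {
    "A": "ay", "B": "bee", "C": "see", "D": "dee", "E": "ee", "F": "eff",
    "G": "gee", "H": "aitch", "I": "eye", "J": "jay", "K": "kay", "L": "el",
    "M": "em", "N": "en", "O": "oh", "P": "pee", "Q": "cue", "R": "ar",
    "S": "ess", "T": "tee", "U": "you", "V": "vee", "W": "double you",
    "X": "ex", "Y": "why", "Z": "zee",
}


def spell_out(acronym):
    """Turn an uppercase acronym such as 'GPT' into 'gee pee tee'."""
    return " ".join(LETTER_SOUNDS[ch] for ch in acronym)


def rewrite_for_tts(text, acronyms):
    """Replace each listed acronym in `text` with its spelled-out form."""
    # Longest acronyms first, so a short one cannot split a longer one.
    for acronym in sorted(acronyms, key=len, reverse=True):
        spoken = spell_out(acronym)
        # Match the acronym when it is not preceded by another capital
        # and not followed by any letter (so "GPTs" is left untouched).
        pattern = re.compile(
            r"(?<![A-Z])" + re.escape(acronym) + r"(?![A-Za-z])"
        )

        def expand(match, spoken=spoken):
            i = match.start()
            # Insert a space when the acronym is glued to a word ("ChatGPT").
            glued = i > 0 and match.string[i - 1].isalpha()
            return (" " if glued else "") + spoken

        text = pattern.sub(expand, text)
    return text


if __name__ == "__main__":
    print(rewrite_for_tts("OpenAI ChatGPT is a new language model", {"GPT"}))
```

This only automates the text rewrite; it does nothing about the change in tone reported above, which would still have to be checked by listening to the generated audio.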