okulovsky / kaia

GNU General Public License v3.0

Improvement of voice training #4

Open okulovsky opened 1 week ago

okulovsky commented 1 week ago

Ideas, links, and reviews on everything related to voice training.

okulovsky commented 1 week ago

StyleTTS may replace TortoiseTTS.

https://github.com/yl4579/StyleTTS2 https://huggingface.co/spaces/styletts2/styletts2

The voice quality is very good, and it is less resource-intensive and more stable than TortoiseTTS.

It also supports emotions, so voice samples can be generated with different emotions, and VITS can then be trained on them as if they were different voices (speakers).
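
A minimal sketch of the emotion-as-speaker idea. It assumes the generated clips are available as (emotion, wav path, transcript) triples and that the VITS training script reads a multi-speaker filelist with `path|speaker_id|text` lines, the layout used by the original VITS repository's VCTK filelists. The function name `build_emotion_filelist` and the input layout are hypothetical, not part of StyleTTS, VITS, or BrainBox.

```python
from typing import Dict, Iterable, List, Tuple


def build_emotion_filelist(
    samples: Iterable[Tuple[str, str, str]],
) -> Tuple[List[str], Dict[str, int]]:
    """Give each emotion its own speaker id and emit VITS-style filelist lines.

    `samples` yields (emotion, wav_path, transcript) triples; this layout is an
    assumption made for illustration only.
    """
    speaker_ids: Dict[str, int] = {}
    lines: List[str] = []
    for emotion, wav_path, text in samples:
        # Assign ids in order of first appearance, so the mapping is deterministic.
        if emotion not in speaker_ids:
            speaker_ids[emotion] = len(speaker_ids)
        # '|' is the field separator, so it must not appear inside the transcript.
        clean_text = text.replace("|", " ").strip()
        lines.append(f"{wav_path}|{speaker_ids[emotion]}|{clean_text}")
    return lines, speaker_ids


if __name__ == "__main__":
    demo = [
        ("neutral", "wavs/neutral_0001.wav", "Hello there."),
        ("happy", "wavs/happy_0001.wav", "Hello there!"),
        ("neutral", "wavs/neutral_0002.wav", "How are you?"),
    ]
    lines, ids = build_emotion_filelist(demo)
    print(ids)  # {'neutral': 0, 'happy': 1}
    print("\n".join(lines))
```

Treating emotions as speakers keeps the training setup identical to ordinary multi-speaker VITS; at inference time, picking a speaker id then amounts to picking an emotion.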

To proceed, StyleTTS needs to be integrated into BrainBox.

okulovsky commented 1 week ago

To clean up voice recordings from imperfect sources, Resemble Enhance might be used: https://huggingface.co/spaces/ResembleAI/resemble-enhance

okulovsky commented 1 week ago

To train a VITS model of a character in another language:

  1. https://github.com/rhasspy/piper has several VITS models for different languages and a training recipe.
  2. There is no known tool for upsampling (i.e., no TortoiseTTS/StyleTTS analogue for German/Russian).
  3. As for voice transfer: the problem is to generate some voice samples in language X, having only English samples. Since the required amount is really small, anything would work, including paid solutions.
    • ElevenLabs does not really capture voice peculiarities.
    • OpenVoice (https://github.com/myshell-ai/OpenVoice) captures tone, but not other features such as tempo. One possible solution is to record the voices manually (i.e., with one's own voice) and then use OpenVoice to improve the tone. Integration of OpenVoice into BrainBox is needed.
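
Since only a small number of samples in language X is needed, it may help to check how much audio has actually been collected before training. A minimal sketch, assuming the clips are mono WAV files at a single sample rate, with one transcript per clip, and that the training recipe accepts an LJSpeech-style `metadata.csv` of `id|text` lines (Piper's preprocessing has an LJSpeech dataset format, but its training docs should be checked for the exact layout). The function name `summarize_and_write_metadata` and the 22050 Hz default are assumptions.

```python
import wave
from pathlib import Path
from typing import Dict


def summarize_and_write_metadata(
    wav_dir: Path,
    transcripts: Dict[str, str],
    expected_rate: int = 22050,
) -> float:
    """Validate clips, write metadata.csv next to wav_dir, return total seconds.

    `transcripts` maps a clip id (WAV filename without extension) to its text.
    """
    total_seconds = 0.0
    rows = []
    for clip_id, text in sorted(transcripts.items()):
        path = wav_dir / f"{clip_id}.wav"
        with wave.open(str(path), "rb") as wf:
            # Reject clips that would need resampling or downmixing first.
            if wf.getnchannels() != 1:
                raise ValueError(f"{path.name}: expected mono audio")
            rate = wf.getframerate()
            if rate != expected_rate:
                raise ValueError(f"{path.name}: {rate} Hz, expected {expected_rate} Hz")
            total_seconds += wf.getnframes() / rate
        # '|' separates fields, so strip it from the transcript.
        rows.append(f"{clip_id}|{text.replace('|', ' ').strip()}")
    # LJSpeech-style metadata: one "id|text" line per clip, no header row.
    (wav_dir.parent / "metadata.csv").write_text("\n".join(rows) + "\n", encoding="utf-8")
    return total_seconds


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        wav_dir = Path(tmp) / "wav"
        wav_dir.mkdir()
        # A 1-second silent clip stands in for a real recording.
        with wave.open(str(wav_dir / "clip_0001.wav"), "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(22050)
            wf.writeframes(b"\x00\x00" * 22050)
        print(summarize_and_write_metadata(wav_dir, {"clip_0001": "Guten Tag."}))  # 1.0
        print((Path(tmp) / "metadata.csv").read_text(encoding="utf-8"))
```

Failing early on a wrong sample rate or channel count is cheaper than discovering it after preprocessing, and the returned total makes it easy to see whether the manually recorded (or OpenVoice-processed) set is large enough.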