using --voice field in do_tts.py always yields random

neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Apache License 2.0


using --voice field in do_tts.py always yields random #388

Open Coolroo opened 1 year ago

Coolroo commented 1 year ago

Hello, I've tried creating a new voice. I've put 3 wav files in a subdirectory of tortoise/voices, but whenever I run python3 tortoise/do_tts.py --text "testing" --voice *nameOfVoice* --preset fast it always uses random voices. I do not get any errors, but the clips I've provided are of a man, and I am getting a woman's voice in response.

heffnercr commented 1 year ago

I've run into the same issue. I've tried changing some minor things (--voice=voice vs --voice voice), yet I'm still getting three different voices as the output. The text used is always correct however, so that's been neat.

heffnercr commented 1 year ago

I believe I've figured it out!

/home/christian/Documents/tortoise-tts/tortoise/utils/audio.py:18: WavFileWarning: Chunk (non-data) not understood, skipping it.

was being printed when I ran do_tts.py, and I initially ignored it. It turns out Audacity was adding headers/metadata that were throwing Python off. Re-exporting the .wav files and clearing any metadata did the trick, and my custom voices are working.
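For anyone who would rather not re-export by hand, here is a minimal sketch of the same idea in plain Python. It assumes the offending metadata lives in extra RIFF chunks, which is what the warning above suggests, and rewrites the file keeping only the 'fmt ', 'fact' and 'data' chunks. The function name strip_wav_metadata is made up for illustration and is not part of tortoise-tts.

```python
import struct

# Chunks that carry the audio itself; everything else is treated as metadata.
# (Assumption: dropping the other chunks is safe for ordinary voice clips.)
KEEP = (b"fmt ", b"fact", b"data")


def strip_wav_metadata(raw: bytes) -> bytes:
    """Return a copy of a RIFF/WAVE byte string with non-audio chunks removed."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    kept = []
    pos = 12  # first chunk starts right after 'RIFF' <size> 'WAVE'
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack_from("<4sI", raw, pos)
        body = raw[pos + 8 : pos + 8 + size]
        if chunk_id in KEEP:
            pad = b"\x00" if len(body) % 2 else b""  # chunks are word-aligned
            kept.append(struct.pack("<4sI", chunk_id, len(body)) + body + pad)
        pos += 8 + size + (size % 2)

    payload = b"WAVE" + b"".join(kept)
    return b"RIFF" + struct.pack("<I", len(payload)) + payload
```

To use it, read a clip with open(path, "rb"), pass the bytes through strip_wav_metadata, and write the result to a new file in the voice directory. Keep the original until you have confirmed the cleaned clip still plays.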

sgb-io commented 1 year ago

I experience the same issue, except I've explicitly cleared the metadata from the relevant .wav files and the issue remains.

I have 10 clips, ranging from 9s - 31s in length. All 22050Hz, 32-bit float .wav format with wiped metadata.

The issue seems to be the same for all 10 clips. I have some other clips from a different original source that do not run into the same problem.

Does anyone know how I can ensure my clips satisfy scipy.io.wavfile? The docs do not seem to contain the answer I seek.
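One way to narrow this down is to compare the structure of a failing clip with one that works. The sketch below is only a diagnostic aid written with the standard library. It lists the RIFF chunks in a file and the basic fields from the 'fmt ' chunk. The name describe_wav is hypothetical, and this does not reproduce whatever checks scipy or tortoise-tts perform internally.

```python
import struct


def describe_wav(raw: bytes) -> dict:
    """List the chunk IDs and basic format fields of a RIFF/WAVE byte string."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    info = {"chunks": []}
    pos = 12  # first chunk starts right after 'RIFF' <size> 'WAVE'
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack_from("<4sI", raw, pos)
        info["chunks"].append(chunk_id.decode("ascii", "replace"))
        if chunk_id == b"fmt " and size >= 16:
            tag, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", raw, pos + 8
            )
            # format_tag 1 is integer PCM, 3 is IEEE float
            info.update(
                format_tag=tag,
                channels=channels,
                sample_rate=rate,
                bits_per_sample=bits,
            )
        pos += 8 + size + (size % 2)  # chunks are padded to an even length
    return info
```

Running it on the bytes of a working clip and of a failing clip, then diffing the two dictionaries, should show whether they differ in chunk layout, sample format, or sample rate.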

heffnercr commented 1 year ago

Just a shot in the dark, but I know the docs mention being tested with 5 clips as the source. Have you tried running it with only 3-5 clips in the custom voice directory?
Does a similar error to my post above print out at the very beginning of your run?

If there's not a similar error, I'd recommend adding some print()'s to utils/audio.py and do_tts.py. The custom voice ingestion is fairly simple to follow in the code, so add some prints checking what's being imported and make sure everything is working as intended. Run the same checks on a custom voice that does work to compare; this is how I figured out the metadata malarkey.