Open Coolroo opened 1 year ago
I've run into the same issue. I've messed around with some minor things (--voice=voice vs --voice voice), yet I'm still getting three different voices as the output. The text used is always correct, however, so that's been neat.
I believe I've figured it out!
/home/christian/Documents/tortoise-tts/tortoise/utils/audio.py:18: WavFileWarning: Chunk (non-data) not understood, skipping it.
was being printed when I ran do_tts.py, and I initially ignored it. Turns out, Audacity was adding headers/metadata that were throwing Python off. Re-exporting the .wav files and clearing any metadata did the trick, and my custom voices are working.
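For anyone who would rather not re-export by hand, here is a rough sketch of the same idea in Python: walk the RIFF chunks and keep only the audio-related ones. This is not part of tortoise-tts or scipy; `strip_extra_chunks` and the `KEEP` set are names I made up, and keeping only "fmt ", "fact" and "data" is my assumption about what counts as non-metadata. Try it on a copy of your clips.

```python
import struct

# Assumption: these are the only chunks worth keeping; everything else
# (LIST, bext, id3, ...) is treated as metadata and dropped.
KEEP = {b"fmt ", b"fact", b"data"}

def strip_extra_chunks(wav_bytes: bytes) -> bytes:
    """Return a copy of a RIFF/WAVE blob containing only the chunks in KEEP."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    kept = []
    pos = 12  # first chunk starts right after 'RIFF' <size> 'WAVE'
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])
        body = wav_bytes[pos + 8:pos + 8 + size]
        if chunk_id in KEEP:
            pad = b"\x00" if len(body) % 2 else b""  # chunks are word-aligned
            kept.append(chunk_id + struct.pack("<I", len(body)) + body + pad)
        pos += 8 + size + (size % 2)  # skip the pad byte of odd-sized chunks
    payload = b"WAVE" + b"".join(kept)
    return b"RIFF" + struct.pack("<I", len(payload)) + payload
```

Usage would be something like `Path("clean.wav").write_bytes(strip_extra_chunks(Path("clip.wav").read_bytes()))`, with `pathlib.Path` imported and the file names swapped for your own.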
I experience the same issue, except I've explicitly cleared the metadata from the relevant .wav files and the issue remains.
I have 10 clips, ranging from 9s to 31s in length. All are 22050Hz, 32-bit float .wav files with wiped metadata.
The issue seems to be the same for all 10 clips. I have some other clips from a different original source that do not run into the same problem.
Does anyone know how I can ensure my clips satisfy scipy.io.wavfile? The docs do not seem to contain the answer I seek.
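One way to see what is actually inside a clip is to dump its chunk list and format header. The sketch below uses only the standard library and does not reproduce scipy's own parsing; `describe_wav` is a hypothetical helper name. A format tag of 1 means integer PCM, 3 means IEEE float, and 0xFFFE means WAVE_FORMAT_EXTENSIBLE. Comparing the output for a clip that works against one that doesn't should at least show where they differ.

```python
import struct

def describe_wav(wav_bytes: bytes) -> dict:
    """Return the fmt header fields and the chunk IDs found in a RIFF/WAVE blob."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info = {"chunks": []}
    pos = 12  # first chunk starts right after 'RIFF' <size> 'WAVE'
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])
        info["chunks"].append(chunk_id.decode("ascii", errors="replace"))
        if chunk_id == b"fmt " and size >= 16:
            # First 16 bytes of fmt: tag, channels, rate, byte rate, block align, bits
            tag, channels, rate, _, _, bits = struct.unpack(
                "<HHIIHH", wav_bytes[pos + 8:pos + 24]
            )
            info.update(format_tag=tag, channels=channels,
                        sample_rate=rate, bits_per_sample=bits)
        pos += 8 + size + (size % 2)  # chunks are padded to an even length
    return info
```

Call it as `describe_wav(Path("clip.wav").read_bytes())` (with `pathlib.Path` imported) for each clip in the voice directory.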
Just a shot in the dark, but I know the docs mention being tested with 5 clips as the source. Have you tried running it with only 3-5 clips in the custom voice directory?
Does a similar error to my post above print out at the very beginning of your run?
If there's not a similar error, I'd recommend adding some print()'s to utils/audio.py and do_tts.py. The custom voice ingestion is fairly simple to follow in the code; add some prints checking what's being imported and make sure everything is working as intended. Use a custom voice that is working to double check. This is how I figured out the metadata malarkey.
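Before touching the tortoise code, a quick standalone check can confirm that the voice directory holds what you think it does. This is only a sketch: `list_voice_clips` is a made-up helper, not a tortoise-tts function, and it assumes voices live in subdirectories of a voices folder as described in this thread.

```python
from pathlib import Path

def list_voice_clips(voices_root, voice_name):
    """Return the .wav files found in <voices_root>/<voice_name>, sorted by name."""
    voice_dir = Path(voices_root) / voice_name
    if not voice_dir.is_dir():
        # An empty result here suggests the --voice name does not match a directory.
        return []
    return sorted(p for p in voice_dir.iterdir() if p.suffix.lower() == ".wav")
```

For example, `for clip in list_voice_clips("tortoise/voices", "myvoice"): print(clip, clip.stat().st_size)` prints each clip and its size, where "myvoice" stands in for your own voice name. If nothing is printed, the name or path is the first thing to check.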
Hello, I've tried creating a new voice. I've put 3 wav files in a subdirectory of tortoise/voices, but whenever I run
python3 tortoise/do_tts.py --text "testing" --voice *nameOfVoice* --preset fast
it always uses random voices. I do not get any errors, but the clips I've provided are of a man, and I am getting a woman's voice in response.