Open crypticsymmetry opened 1 year ago
Hi,
1) The file type doesn't matter.
2) You would have to add the missing characters/phonemes used in your dataset to this file: word_index_dict_new.txt,
or create your own file and reference it in the code.
https://github.com/lexkoro/StyleTTS/blob/243b54ff398cc82f97b2526715aec048d61d1b85/AuxiliaryASR/meldataset.py#L29
The log prints every character that it can't find in word_index_dict_new.txt, so you can use it to see what is missing. The lookup happens here: https://github.com/lexkoro/StyleTTS/blob/243b54ff398cc82f97b2526715aec048d61d1b85/AuxiliaryASR/text_utils.py#L9
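If it helps, here is a rough sketch for listing the missing characters up front instead of reading them out of the training log. It is not code from the repo: `load_symbols` and `missing_symbols` are made-up helper names, and it assumes the dictionary file is a header-less two-column CSV (`symbol,index`), so adjust it if your file looks different. The in-memory sample stands in for the real dictionary and transcripts.

```python
import csv
import io
from collections import Counter


def load_symbols(dict_file):
    """Read a header-less `symbol,index` CSV (assumed layout) and return the symbols."""
    return {row[0] for row in csv.reader(dict_file) if row}


def missing_symbols(texts, known):
    """Count every character in `texts` that has no entry in `known`."""
    counts = Counter()
    for text in texts:
        counts.update(ch for ch in text if ch not in known)
    return counts


# Tiny in-memory stand-ins; replace with open("word_index_dict_new.txt", encoding="utf-8")
# and the transcript column of your own metadata file.
sample_dict = io.StringIO('"a",1\n"b",2\n" ",3\n')
known = load_symbols(sample_dict)
report = missing_symbols(["ab ba", "abɛ ɛ!"], known)

for symbol, count in report.most_common():
    print(repr(symbol), count)
```

Anything this reports would need an entry in the dictionary file (or be cleaned out of the transcripts) before training.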
Also, I removed the phonemization step, since I precomputed the phonemes directly into the metadata file:
/SqNarrator/wavs/a0jm2r00.171.wav|SqNarrator_EN|5|1|jɛp, teɪsts d͡ʒʌst laɪ̯k jʌd ɛkspɛkt.|yep, tastes just like you'd expect.
/SqNarrator/wavs/a0ji2r00.0z1.wav|SqNarrator_EN|5|1|ju siː sʌm dɹɪpɪŋ, uzɪŋ stʌf.|you see some dripping, oozing stuff.
So you might want to add the phonemization step back.
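For reference, a line in that metadata layout can be split roughly like this. This is only a sketch: `Entry` and `parse_line` are hypothetical names, not part of the repo, and it assumes six pipe-separated fields with the phonemes in the fifth and the plain text in the sixth. The two numeric fields in the middle are skipped.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    wav_path: str
    speaker: str
    phonemes: str
    text: str


def parse_line(line):
    """Split one pipe-delimited metadata line (assumed to have six fields)."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    # fields[2] and fields[3] are the two numeric columns; they are not used here.
    return Entry(fields[0], fields[1], fields[4], fields[5])


sample = (
    "/SqNarrator/wavs/a0jm2r00.171.wav|SqNarrator_EN|5|1|"
    "jɛp, teɪsts d͡ʒʌst laɪ̯k jʌd ɛkspɛkt.|yep, tastes just like you'd expect."
)
entry = parse_line(sample)
print(entry.speaker, "->", entry.phonemes)
```

If your file only has the plain text, the `text` field is what you would pass through the phonemizer.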
I am trying to train this repo on the LibriTTS dataset, starting with ASR training.
Question 1: Is the data format the same, "path|transcription|speaker#"? I also see that your config now uses CSVs; do I have to convert to a CSV as well?
Question 2: Does this training log look correct? The text that it prints doesn't make any sense.
I changed the code to use a single string as train_data and val_data instead of a list. config.yml:
meldataset.py:
utils.py:
Example dataset format "train_data_test.txt".
training logs.