How to create dataset?
Open · peanut1101 opened this issue 2 years ago
First of all, you need to collect audio data in wav/mp3 format, then use this or a similar tool to annotate your audio.
Step 2 – export your annotations as a .csv file and a .zip of wavs (unzip it).
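Not required, but a quick sanity check helps here. A minimal sketch, assuming an LJSpeech-style `metadata.csv` with `utt_id|transcript` rows and the unzipped audio in `wavs/` (both file names are assumptions):

```python
# Sketch: sanity-check that every annotated utterance has a wav file.
# Assumes "utt_id|transcript" rows and an unzipped wavs/ folder.
from pathlib import Path

wavs = {p.stem for p in Path("wavs").glob("*.wav")}
for line in Path("metadata.csv").read_text(encoding="utf-8").splitlines():
    utt_id = line.split("|")[0]
    if utt_id not in wavs:
        print("missing audio for", utt_id)
```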
Step 3 – create configs for your data and run `python3 prepare_align.py config/LJSpeech/preprocess.yaml`. This creates a `raw_data` dir with `.wav` and `.lab` files; each `.lab` file contains the text of the corresponding wav.
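For reference, a rough sketch of what this step produces, again assuming an LJSpeech-style `metadata.csv` with `utt_id|transcript` rows and a single speaker (the script derives the actual layout from your config):

```python
# Sketch of prepare_align.py's output: one .lab transcript per .wav.
# Assumes "utt_id|transcript" rows and a single speaker (both assumptions).
import shutil
from pathlib import Path

out = Path("raw_data/your_dataset")
out.mkdir(parents=True, exist_ok=True)
for line in Path("metadata.csv").read_text(encoding="utf-8").splitlines():
    utt_id, text = line.split("|")[:2]
    shutil.copy(f"wavs/{utt_id}.wav", out / f"{utt_id}.wav")
    (out / f"{utt_id}.lab").write_text(text.strip(), encoding="utf-8")
```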
Step 4 – install MFA or another aligner, then:
- create a pronunciation dictionary for your data using MFA or another tool (see the sketch after this list)
- train the MFA aligner, for example: `mfa train raw_data/your_dataset/ lexicon/your_dataset.txt out_dir/aligner_model.zip preprocessed_data/your_dataset/TextGrid`
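A minimal sketch of the dictionary step, assuming English data and the `g2p_en` package (both assumptions; substitute whatever G2P tool fits your language). It collects the vocabulary from the `.lab` files and writes a `word PHONE PHONE ...` lexicon that `mfa train` can consume:

```python
# Sketch: build lexicon/your_dataset.txt from the .lab transcripts.
# Assumes English text and the g2p_en package; any G2P tool works here.
from pathlib import Path
from g2p_en import G2p

g2p = G2p()
words = set()
for lab in Path("raw_data/your_dataset").rglob("*.lab"):
    for token in lab.read_text(encoding="utf-8").lower().split():
        word = token.strip(".,!?;:\"'")
        if word:
            words.add(word)

Path("lexicon").mkdir(exist_ok=True)
with open("lexicon/your_dataset.txt", "w", encoding="utf-8") as f:
    for word in sorted(words):
        phones = [p for p in g2p(word) if p.strip()]  # drop word-boundary spaces
        f.write(f"{word} {' '.join(phones)}\n")
```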
Step 5 – run `python3 preprocess.py config/LJSpeech/preprocess.yaml`. This creates the `preprocessed_data` dir with everything needed for training.
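In ming024's FastSpeech2, this step also writes `speakers.json`, `stats.json`, and the phoneme-level `train.txt`/`val.txt` splits. If you ever need to rebuild `stats.json` by hand, here is a rough sketch, assuming per-utterance pitch/energy `.npy` files under `preprocessed_data/your_dataset/` and the `[min, max, mean, std]` layout that repo uses (verify both against your version):

```python
# Sketch: rebuild stats.json from the per-utterance pitch/energy .npy files
# that preprocess.py saves. Layout assumed to be [min, max, mean, std];
# zero frames are treated as unvoiced/padding and skipped (an assumption).
import json
from pathlib import Path

import numpy as np

def feature_stats(feature_dir):
    values = np.concatenate(
        [np.load(f) for f in Path(feature_dir).glob("*.npy")]
    )
    values = values[values != 0.0]
    return [float(values.min()), float(values.max()),
            float(values.mean()), float(values.std())]

stats = {
    "pitch": feature_stats("preprocessed_data/your_dataset/pitch"),
    "energy": feature_stats("preprocessed_data/your_dataset/energy"),
}
with open("preprocessed_data/your_dataset/stats.json", "w") as f:
    json.dump(stats, f)
```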
Step 6 – run model training.
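(In ming024's FastSpeech2 this is `python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml`; the exact flags may differ in other forks, so check the repo's README.)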
Thanks a lot for your contribution, @ruslantau. I have a question: in order to train, I also need speakers.json (that one is fine, it is just the mapping from each speaker in the dataset to an id) and stats.json; how do I compute that? Moreover, I need the new transcription lists (val.txt and train.txt). How do I generate those? I already have the lists for train and val, but their text is not encoded with the phonemes extracted by MFA. How can I do it?
Hi @alessandropec,
Have you found a way to create stats.json?