ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
MIT License
1.77k stars · 527 forks

Making dataset #166

Open peanut1101 opened 2 years ago

peanut1101 commented 2 years ago

How do I create a dataset?

ruslantau commented 2 years ago

1. First of all, you need to collect audio data in wav/mp3 format, then use this tool (or an analogous one) to annotate your audio.
2. Export your annotations as a CSV file and a .zip of the wavs (then unzip it).
3. Create configs for your data and run `python3 prepare_align.py config/LJSpeech/preprocess.yaml`. This creates a raw data dir with .wav files and .lab files containing the text of each corresponding wav.
4. Install MFA (or another aligner) and run the alignment to produce TextGrids.
5. Run `python3 preprocess.py config/LJSpeech/preprocess.yaml`. This creates the preprocessed_data dir with the data for training.
6. Run model training.
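Step 3's raw-data layout (one .lab transcript per .wav) can be sketched as follows. This is only an illustration, assuming a hypothetical pipe-delimited annotation CSV (`basename|transcript` per line); the function name is made up and the actual format of your exported annotations may differ:

```python
import csv
import os

def write_lab_files(csv_path, out_dir):
    """For each 'basename|transcript' row, write <basename>.lab containing
    the transcript, matching the raw_data layout prepare_align.py expects
    next to each <basename>.wav. (Hypothetical helper, not repo code.)"""
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for base, text in csv.reader(f, delimiter="|"):
            lab_path = os.path.join(out_dir, base + ".lab")
            with open(lab_path, "w", encoding="utf-8") as lab:
                lab.write(text.strip())
```

After this, each utterance directory holds matching `utt0001.wav` / `utt0001.lab` pairs, which is the layout the aligner consumes in step 4.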

alessandropec commented 1 year ago

Thanks a lot for your contribution, @ruslantau. I have a question: in order to train I also need speakers.json (that one is fine, it is just the mapping of each speaker in the dataset to an id) and stats.json; how do I compute the latter? Moreover, I need the transcription lists (val.txt and train.txt): how do I generate those? I already have the train and val lists, but the text is not encoded into the phonemes extracted by MFA. How can I do that?
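As noted above, speakers.json is just a speaker-name-to-integer-id mapping, so it can be generated directly from the list of speakers. A minimal sketch (function name and sorting choice are assumptions, not repo code):

```python
import json

def build_speaker_map(speaker_names):
    """Map each unique speaker name to an integer id (sorted for
    reproducibility), in the shape speakers.json expects."""
    return {name: i for i, name in enumerate(sorted(set(speaker_names)))}

# Example: derive the mapping from per-utterance speaker labels.
speakers = build_speaker_map(["spk_b", "spk_a", "spk_a"])
with open("speakers.json", "w", encoding="utf-8") as f:
    json.dump(speakers, f)
```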

phamkhactu commented 1 year ago

Hi @alessandropec,

Have you found a way to build stats.json?

djanibekov commented 8 months ago

@phamkhactu read this part https://github.com/ming024/FastSpeech2/blob/d4e79eb52e8b01d24703b2dfc0385544092958f3/preprocessor/preprocessor.py#L116
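In the code linked above, stats.json stores `[min, max, mean, std]` for pitch and energy, aggregated over all utterances; when normalization is enabled, the min/max are taken after normalizing by the mean and std. A simplified numpy-only sketch of that logic (the function name and the example values are hypothetical, and the real preprocessor streams statistics instead of concatenating everything):

```python
import json
import numpy as np

def compute_stats(values_per_utt, normalize=True):
    """Return [min, max, mean, std] over all frames of all utterances,
    mirroring (in simplified form) the stats.json computation in
    preprocessor.py: mean/std over raw values, min/max over the
    normalized values when normalization is on."""
    all_values = np.concatenate(values_per_utt)
    mean, std = float(all_values.mean()), float(all_values.std())
    if normalize:
        all_values = (all_values - mean) / std
    return [float(all_values.min()), float(all_values.max()), mean, std]

# Hypothetical per-utterance pitch/energy arrays (normally loaded from
# the per-utterance .npy files produced during preprocessing).
pitch_utts = [np.array([100.0, 120.0]), np.array([110.0, 130.0])]
energy_utts = [np.array([0.5, 0.7]), np.array([0.6, 0.8])]
stats = {"pitch": compute_stats(pitch_utts), "energy": compute_stats(energy_utts)}
with open("stats.json", "w", encoding="utf-8") as f:
    json.dump(stats, f)
```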