If the dataset is structured as
data/
└── LJSpeech-1.1
    ├── metadata.csv
    ├── README
    ├── test.txt
    ├── train.txt
    ├── val.txt
    └── wavs
Then you can extract the phoneme-level alignments from a trained Matcha-TTS model using:
python matcha/utils/get_durations_from_trained_model.py -i <dataset_yaml> -c <checkpoint>
Example:
python matcha/utils/get_durations_from_trained_model.py -i ljspeech.yaml -c matcha_ljspeech.ckpt
or simply:
matcha-tts-get-durations -i ljspeech.yaml -c matcha_ljspeech.ckpt
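After extraction it can help to sanity-check the result. Below is a minimal sketch, assuming the script writes one .npy file of integer per-phoneme frame counts per utterance; the durations/ location and file format here are assumptions, so check the extraction script's output for the actual layout.

```python
# Minimal sanity check of extracted durations (output layout below is an assumption).
from pathlib import Path

import numpy as np

durations_dir = Path("data/LJSpeech-1.1/durations")  # hypothetical output location
for dur_file in sorted(durations_dir.glob("*.npy"))[:5]:
    durations = np.load(dur_file)  # one integer frame count per phoneme (assumed format)
    print(f"{dur_file.stem}: {len(durations)} phonemes, {int(durations.sum())} mel frames")
```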
To train Matcha-TTS from the extracted durations, turn on load_durations in the dataset config. Example: ljspeech.yaml
load_durations: True
or see an example in configs/experiment/ljspeech_from_durations.yaml
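As a quick pre-flight check before launching training, the sketch below reads the dataset YAML and verifies the flag. Only the load_durations key comes from the example above; the config path and flat key layout are assumptions.

```python
# Pre-flight check: confirm load_durations is enabled in the dataset config.
# The path and top-level key layout are assumptions; only load_durations is from the PR.
import yaml

cfg_path = "configs/data/ljspeech.yaml"  # hypothetical location of the dataset config
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

if cfg.get("load_durations") is True:
    print("load_durations is enabled; training will use the extracted alignments")
else:
    raise SystemExit("Set load_durations: True in the dataset config first")
```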
What does this PR do?
1. Possibility to extract alignments out of Matcha-TTS (see the instructions above).
2. Possibility to train Matcha-TTS from the extracted alignments, as mentioned in https://github.com/shivammehta25/Matcha-TTS/issues/73#issuecomment-2117429867, by turning on load_durations in the dataset config (see above).