shivammehta25 / Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
https://shivammehta25.github.io/Matcha-TTS/
MIT License

Dev #74

Closed shivammehta25 closed 6 months ago

shivammehta25 commented 6 months ago

What does this PR do?

Adds the ability to extract alignments from Matcha-TTS

If the dataset is structured as

data/
└── LJSpeech-1.1
    ├── metadata.csv
    ├── README
    ├── test.txt
    ├── train.txt
    ├── val.txt
    └── wavs

Then you can extract the phoneme-level alignments from a trained Matcha-TTS model using:

python matcha/utils/get_durations_from_trained_model.py -i <dataset_yaml> -c <checkpoint>

Example:

python matcha/utils/get_durations_from_trained_model.py -i ljspeech.yaml -c matcha_ljspeech.ckpt

or simply:

matcha-tts-get-durations -i ljspeech.yaml -c matcha_ljspeech.ckpt
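As a rough illustration of what "phoneme-level alignments" mean here, the sketch below turns a hard, monotonic frame-to-phoneme alignment into per-phoneme durations by counting frames. It is not the code of `get_durations_from_trained_model.py`; the function name `durations_from_frame_alignment` and the toy alignment are assumptions made up for this example, and the script's actual output format is not described in this PR.

```python
from collections import Counter


def durations_from_frame_alignment(frame_to_phoneme, num_phonemes):
    """Count how many mel frames are assigned to each phoneme.

    Hypothetical helper for illustration only (not part of Matcha-TTS).
    frame_to_phoneme[t] is the index of the phoneme aligned to frame t.
    """
    # A monotonic alignment never moves back to an earlier phoneme.
    for prev, cur in zip(frame_to_phoneme, frame_to_phoneme[1:]):
        if cur < prev:
            raise ValueError("alignment is not monotonic")
    counts = Counter(frame_to_phoneme)
    # Phonemes with no assigned frame get a duration of 0.
    return [counts.get(i, 0) for i in range(num_phonemes)]


# Toy example: 6 frames aligned to 3 phonemes.
alignment = [0, 0, 1, 1, 1, 2]
durations = durations_from_frame_alignment(alignment, num_phonemes=3)
print(durations)  # [2, 3, 1]
```

The durations always sum to the number of frames, which is a quick sanity check for any extracted alignment.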

Adds the ability to train Matcha-TTS from the extracted alignments, as mentioned in https://github.com/shivammehta25/Matcha-TTS/issues/73#issuecomment-2117429867

In the dataset config, turn on load_durations. Example (ljspeech.yaml):

load_durations: True

or see an example in configs/experiment/ljspeech_from_durations.yaml
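To make the idea of training from fixed durations more concrete, here is a minimal sketch of how per-phoneme durations determine a phoneme-to-frame mapping: each phoneme is simply repeated for its number of frames. This is a generic length-regulation illustration, not the Matcha-TTS training code; the name `expand_by_durations` and the toy IDs are assumptions for this example.

```python
def expand_by_durations(phoneme_ids, durations):
    """Repeat each phoneme ID according to its duration in frames.

    Hypothetical helper for illustration only (not part of Matcha-TTS).
    """
    if len(phoneme_ids) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames = []
    for phoneme, duration in zip(phoneme_ids, durations):
        # A duration of 0 contributes no frames for that phoneme.
        frames.extend([phoneme] * duration)
    return frames


# Toy example: 3 phonemes with durations of 2, 3 and 1 frames.
frame_level = expand_by_durations([7, 3, 9], [2, 3, 1])
print(frame_level)  # [7, 7, 3, 3, 3, 9]
```

With durations loaded from disk, a mapping like this is fully determined by the data, so it does not need to be searched for during training.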