What does this PR do?

Possibility to extract alignments out of Matcha-TTS

If the dataset is structured as

data/
└── LJSpeech-1.1
    ├── metadata.csv
    ├── README
    ├── test.txt
    ├── train.txt
    ├── val.txt
    └── wavs

Then you can extract the phoneme level alignments from a Trained Matcha-TTS model using:

python  matcha/utils/get_durations_from_trained_model.py -i dataset_yaml -c <checkpoint>

Example:

python  matcha/utils/get_durations_from_trained_model.py -i ljspeech.yaml -c matcha_ljspeech.ckpt

or simply!

matcha-tts-get-durations -i ljspeech.yaml -c matcha_ljspeech.ckpt

In the datasetconfig turn on load duration. Example: ljspeech.yaml

load_durations: True

or see an examples in configs/experiment/ljspeech_from_durations.yaml