Help with using this tool for creating TTS training data

k2-fsa / text_search

Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup

https://k2-fsa.github.io/text_search/


Help with using this tool for creating TTS training data #69

Open weedwind opened 1 month ago

weedwind commented 1 month ago

Hi,

Thank you very much for building this tool. I want to use it to segment and align libri-light for TTS training. I am new to the tool; could anyone help me with the following questions?

1. If I want to segment the books into chunks of about 10 seconds (rather than 30), which hyperparameters should I change?
2. The output contains two sets of text: lowercase with punctuation, and uppercase without punctuation. Which one should I use as the ground truth for training TTS?

Thank you so much for any help.

pkufool commented 1 month ago

If I want to segment the books into chunks of about 10 seconds (rather than 30), which hyperparameters should I change?

https://github.com/k2-fsa/text_search/blob/7c452edc942d24ad23ca315f430a24d3d71a30e4/examples/libriheavy/matching.py#L101-L103

The output contains two sets of text: lowercase with punctuation, and uppercase without punctuation. Which one should I use as the ground truth for training TTS?

It's up to you, but I would suggest using the text with punctuation.

weedwind commented 1 month ago

@pkufool Thank you so much. From a quick look at the documentation, it seems that the text with punctuation is the reference and the uppercase text is the ASR output. If I want to train TTS on the uppercase text, is it as accurate as the reference?

pkufool commented 1 month ago

No. If you don't want the punctuation, you can remove it from the punctuated text and convert the result to uppercase. It is not a good idea to use the ASR transcriptions to train TTS.
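
For anyone following the suggestion above, here is a minimal sketch of stripping punctuation from the reference text and uppercasing it. It is not part of text_search: the function name `normalize_reference` is hypothetical, and the choices it makes (keeping apostrophes inside words, turning hyphens into spaces, dropping trailing apostrophes) are assumptions that may not match the exact convention of the uppercase text in the tool's output.

```python
import re


def normalize_reference(text: str) -> str:
    """Uppercase `text` and drop punctuation (hypothetical helper, not a text_search API).

    Assumed conventions:
      * apostrophes between two ASCII letters are kept (DON'T, IT'S);
      * every other punctuation mark, including hyphens, becomes a space;
      * runs of whitespace collapse to a single space.
    """
    # Fold the curly apostrophe to ASCII so contractions are handled uniformly.
    text = text.replace("\u2019", "'")

    # Keep letters, digits, apostrophes and whitespace; everything else becomes a space.
    text = "".join(
        ch if (ch.isalnum() or ch == "'" or ch.isspace()) else " " for ch in text
    )

    # Drop apostrophes that are not between two ASCII letters (quote marks,
    # and also trailing possessives such as "dogs'").
    text = re.sub(r"(?<![A-Za-z])'|'(?![A-Za-z])", " ", text)

    # Uppercase and collapse whitespace.
    return " ".join(text.upper().split())


if __name__ == "__main__":
    print(normalize_reference("\"Don't go,\" she said; it's late."))
```

Because this starts from the punctuated reference rather than the ASR output, the words themselves stay exactly as in the book; only casing and punctuation change.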