xinjli / allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
GNU General Public License v3.0

Content of fine-tuning files? #74

Open MarsMV opened 1 year ago

MarsMV commented 1 year ago

I find the documentation a little unclear about what the text and audio files used for fine-tuning should contain.

First off, what are acceptable contents of an utterance file? Total silence? Only one phoneme? Only one word? A whole sentence?

Second, should the comma-separated list in the text file include timestamps, or is it just a chronological list of the phonemes in the associated wave file? If the same phoneme occurs several times in the wave file, should it also appear several times in the list? And CAN a wave file contain the same phoneme more than once?
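To make the second question concrete, here is a minimal sketch of one of the readings asked about. Each text-file line is assumed to be an utterance ID followed by that utterance's phones in spoken order, with no timestamps and with repeated phones kept. This layout, the comma-and/or-whitespace delimiter, and the `parse_text_file` helper are all hypothetical illustrations, not a confirmed allosaurus format.

```python
import re


def parse_text_file(lines):
    """Map each utterance ID to its ordered phone sequence.

    Hypothetical layout (an assumption, not confirmed by allosaurus docs):
    one utterance per line, the ID first, then the phones in the order
    they are spoken, separated by commas and/or whitespace. No timestamps;
    repeated phones are kept exactly as they occur.
    """
    transcripts = {}
    for line in lines:
        # Split on any run of commas/whitespace and drop empty tokens.
        tokens = [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]
        if not tokens:
            continue  # skip blank or delimiter-only lines
        utt_id, *phones = tokens
        transcripts[utt_id] = phones
    return transcripts


sample = [
    "utt001 h e l o",
    "utt002 b a n a n a",  # the phones "a" and "n" occur several times
]
print(parse_text_file(sample))
```

Under this reading, duplicates are simply part of the sequence, so "b a n a n a" keeps all three occurrences of "a".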

Any answers appreciated!