blueblue-bubble closed this issue 1 year ago
What I mean is: is the Fast Text2Unit Model the HMM model from the Kaldi recipe, used to decode the unpaired speech and get the aligned phonemes from the lattice?
Hi @blueblue-bubble, the Fast Text2Unit Model is used for text-to-hidden-unit transformation. It is modified from FastSpeech (a non-autoregressive TTS model). The Fast Text2Unit Model is the so-called "Hidden-unit tokenizer for text" in the paper (see the appendix).
Note that the Kaldi HMM model is not provided in this repo; you can follow the Kaldi recipe for that part.
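In case it helps to picture the "non-autoregressive" part: FastSpeech-style models expand each input token to a number of output frames with a length regulator, so the whole output length is fixed in one pass. Below is a toy sketch of that upsampling step only. It is not code from this repo; the function name `length_regulate`, the phoneme sequence, and the durations are made up for illustration. In a real model the durations would come from a learned duration predictor and a decoder would then map each frame to a hidden unit.

```python
from typing import List, Sequence


def length_regulate(tokens: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat tokens[i] exactly durations[i] times (FastSpeech-style upsampling)."""
    if len(tokens) != len(durations):
        raise ValueError("tokens and durations must have the same length")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    expanded: List[str] = []
    for token, duration in zip(tokens, durations):
        # Each token occupies `duration` consecutive output frames.
        expanded.extend([token] * duration)
    return expanded


# Hypothetical phoneme sequence with made-up per-phoneme durations (in frames).
phonemes = ["HH", "AH", "L", "OW"]
durations = [2, 3, 1, 4]
frames = length_regulate(phonemes, durations)
print(frames)
```

The output has one entry per frame, so its length equals the sum of the durations, which is what lets the text side line up with frame-level speech units.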
Hello, thanks for your great work. I have a question. I noticed that there is a model named Fast Text2Unit Model in the SpeechLM project, but I couldn't find how it is used. Is this model used to transform text (transcribed from speech) into units?