lvZic opened this issue 2 years ago (status: open)
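Hello, could you clarify exactly what kind of support you mean?
I mean that Mandarin Chinese (普通话) is not among the languages listed for phonemes, as follows: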
Initials (consonants) - 21 phonemes
(b) (c) (d) (f) (g) (h) (j) (k) (l) (m) (n) (p) (q) (r) (s) (t) (x) (z) (zh) (ch) (sh)
Finals (vowels and vowel-nasal pairs) - 35 phonemes
(a) (e) (i) (o) (u) (ü) (iu) (ui) (un) (ün) (ia) (ie) (ua) (uo) (ai) (ei) (in) (ou) (an) (ao) (en) (ang) (ong) (eng) (ing) (ian) (iao) (uan) (uai) (iou) (üan) (iang) (iong) (uang) (ueng)
@lvZic, I apologize for writing in English; I can read Chinese but not write it.
There appear to be two misunderstandings here:
- allosaurus supports Chinese (including 普通话) in the same way it supports every language: by recognizing acoustic speech signals as sequences of IPA (International Phonetic Alphabet) phones. Strictly speaking, it does not support phonemes, and it does not directly support orthographies such as Pinyin (拼音), whether or not they are phonemically adequate.
- You are confusing phonemes with syllabic constituents (initial/onset, final/rhyme, and tone). By definition, a phoneme is a minimal contrastive unit of sound, and a final such as (iong) is anything but minimal: it consists of three segments, [j], [o], and [ŋ].
If you want to recognize 普通话 speech as 拼音, you have at least three options:
- Use a pronouncing dictionary of Chinese to transliterate a speech corpus into Pinyin, then train a standard ASR model on the corpus.
- Train a model to transduce IPA to Pinyin and use it in a pipeline with Allosaurus: speech signal --allosaurus--> IPA --transducer--> Pinyin
- Use an off-the-shelf Chinese ASR model and convert the output (汉字) to Pinyin using a pronouncing dictionary: speech signal --Chinese ASR--> 汉字 --transducer--> 拼音 (Easiest).
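As a rough illustration of the last step in the third option (汉字 to 拼音 via a pronouncing dictionary), here is a minimal sketch. Everything in it is an assumption made for illustration: `TOY_LEXICON`, `hanzi_to_pinyin`, and the `<unk>` placeholder are hypothetical names, the five-entry dictionary stands in for a real pronouncing dictionary, and tones are written as trailing digits rather than diacritics.

```python
# Toy pronouncing dictionary (hypothetical, for illustration only).
# A real dictionary would cover thousands of characters and words.
TOY_LEXICON = {
    "你": "ni3",
    "好": "hao3",
    "普": "pu3",
    "通": "tong1",
    "话": "hua4",
}


def hanzi_to_pinyin(text, lexicon=TOY_LEXICON, unknown="<unk>"):
    """Look up each non-space character; unknown characters map to `unknown`."""
    return [lexicon.get(ch, unknown) for ch in text if not ch.isspace()]


print(hanzi_to_pinyin("你好"))      # ['ni3', 'hao3']
print(hanzi_to_pinyin("普通话"))    # ['pu3', 'tong1', 'hua4']
```

Character-by-character lookup is the simplest possible design and is not enough in practice: many characters have more than one reading (for example 行 can be xíng or háng), so a realistic converter would probably need word-level entries or context to choose between them.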
Thanks for your reply; I will have a look. I also wonder whether allosaurus is accurate enough, since I want to use it to generate a phoneme dataset for training lip animation. I found small differences between the phonemes produced by the eng_to_ipa method and those produced by allosaurus.
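One way to put a number on such a difference is a phone error rate: the edit distance between the two phone sequences divided by the length of the reference. The sketch below assumes both outputs have already been split into lists of phones. The function names and the two sample sequences are invented for illustration and are not actual output from either tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phones."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Invented sample sequences, space-separated for easy tokenization.
ref = "h ə l oʊ".split()
hyp = "h ɛ l o".split()
print(phone_error_rate(ref, hyp))  # 0.5
```

A rate computed this way treats every mismatch equally. For lip animation it may matter more whether two phones share a mouth shape than whether they are identical, so grouping phones into visually similar classes before comparing could be a more useful measure.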