Is there no support for Chinese?

xinjli commented 2 years ago

Hello, what kind of support do you mean, exactly?

lvZic commented 2 years ago

I mean that the phoneme languages do not include Mandarin Chinese (普通话), as follows:

Initials (consonants) - 21 phonemes

(b) (c) (d) (f) (g) (h) (j) (k) (l) (m) (n) (p) (q) (r) (s) (t) (x) (z) (zh) (ch) (sh)

Finals (vowels and vowel-nasal pairs) - 35 phonemes

(a) (e) (i) (o) (u) (ü) (iu) (ui) (un) (ün) (ia) (ie) (ua) (uo) (ai) (ei) (in) (ou) (an) (ao) (en) (ang) (ong) (eng) (ing) (ian) (iao) (uan) (uai) (iou) (üan) (iang) (iong) (uang) (ueng)

dmort27 commented 2 years ago

@lvZic, I apologize for writing in English; I can read Chinese but not write it.

There appear to be two misunderstandings here:

allosaurus supports Chinese (including 普通话) in the same way it supports every language: by recognizing acoustic speech signals as sequences of IPA (International Phonetic Alphabet) phones. It does not, strictly speaking, support phonemes and it does not directly support orthographies—such as Pinyin (拼音)—whether or not they are phonemically adequate.
You are confusing phonemes with syllabic constituents (initial/onset, final/rhyme, and tone). By definition, a phoneme is a minimal contrastive unit of sound, and a final such as (iong) is anything but minimal: it consists of three segments, [j], [o], and [ŋ].
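
To make the distinction concrete, here is a minimal Python sketch that splits a toneless Pinyin syllable into its initial and final, then expands the final into segments. The names `INITIALS`, `FINAL_SEGMENTS`, `split_syllable`, and `final_to_segments` are hypothetical, and the table is hand-written for illustration: only the (iong) entry comes from the explanation above, the other entries are assumed broad transcriptions, and none of this is part of allosaurus.

```python
# Illustrative only: a tiny hand-written table, not data shipped with allosaurus.
# Two-letter initials come first so "zh" is not mistaken for "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

# Assumed broad IPA segments for a few finals; only "iong" is taken from the thread.
FINAL_SEGMENTS = {
    "a": ["a"],
    "ai": ["a", "i"],
    "ang": ["a", "ŋ"],
    "iong": ["j", "o", "ŋ"],
}


def split_syllable(syllable):
    """Split a toneless Pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # syllable with no initial


def final_to_segments(final):
    """Look up the segments that make up one final."""
    return FINAL_SEGMENTS[final]


initial, final = split_syllable("xiong")
print(initial, final, final_to_segments(final))
```

One final maps to several segments, which is why a list of initials and finals is not a phoneme inventory.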

If you want to recognize Mandarin (普通话) speech as Pinyin (拼音), you have at least three options:

Use a pronouncing dictionary of Chinese to transliterate a speech corpus into Pinyin, then train a standard ASR model on the corpus.
Train a model to transduce IPA to Pinyin and use it in a pipeline with Allosaurus: speech signal --allosaurus--> IPA --transducer--> Pinyin
Use an off-the-shelf Chinese ASR model and convert the output (汉字, Chinese characters) to Pinyin using a pronouncing dictionary: speech signal --Chinese ASR--> 汉字 --transducer--> 拼音 (easiest).
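
As a rough illustration of the third option, the sketch below converts recognized Chinese characters to Pinyin with a dictionary lookup. Everything here is an assumption made for the example: `PRONOUNCING_DICT` is a five-entry toy dictionary, `fake_chinese_asr` is a stand-in that returns a fixed string instead of running a real ASR model, and `hanzi_to_pinyin` is a hypothetical helper name.

```python
# Hypothetical mini pronouncing dictionary (tone numbers appended to each syllable).
# A real dictionary would cover thousands of characters.
PRONOUNCING_DICT = {
    "普": "pu3",
    "通": "tong1",
    "话": "hua4",
    "汉": "han4",
    "字": "zi4",
}


def fake_chinese_asr(wav_path):
    """Stand-in for an off-the-shelf Chinese ASR model (assumption, no audio is read)."""
    return "普通话"


def hanzi_to_pinyin(text, unknown="?"):
    """Map each character to Pinyin; characters missing from the dictionary become `unknown`."""
    return [PRONOUNCING_DICT.get(char, unknown) for char in text]


characters = fake_chinese_asr("sample.wav")
print(" ".join(hanzi_to_pinyin(characters)))
```

A per-character lookup like this ignores characters with more than one reading, so a real pipeline would likely need word-level entries or some disambiguation step.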

lvZic commented 2 years ago

Thanks for your reply. I will have a look. I also wonder whether allosaurus is accurate enough, since I want to use it to generate a phoneme dataset for training lip animation. I found small differences between the phonemes produced by the eng_to_ipa method and those produced by allosaurus.
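
One way to put a number on such differences is an edit distance between the two phone sequences, normalized by the reference length. The sketch below is a generic Levenshtein computation, not something provided by allosaurus or eng_to_ipa; the function names are hypothetical and the two example sequences are invented for illustration, not real outputs of either tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phones."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_phone in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hyp, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Invented example sequences, treating the dictionary-based output as the reference.
reference = ["h", "ə", "l", "oʊ"]
hypothesis = ["h", "ɛ", "l", "o", "w"]
print(edit_distance(reference, hypothesis), phone_error_rate(reference, hypothesis))
```

Which sequence counts as the reference is a choice: a text-based converter gives a dictionary pronunciation, while a recognizer transcribes what was actually said, so some mismatch is expected even when both are working as intended.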

xinjli / allosaurus

Is there no support for Chinese? #51