Open complinger opened 4 years ago
Hi
Thanks for filing an issue with such a clear description! I will take a look at this very soon.
Thanks! Xinjian
Hi, sorry for the late reply.
The main cause of the issue is that we built the inventory from PHOIBLE's Segment column rather than the allophone column, because the allophone column is empty for many languages.
I agree that using the allophone column (when it is nonempty) should be the expected behavior, as you suggested. I will fix the inventory in the next pretrained model update to resolve this issue.
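For illustration, here is a minimal sketch of that fallback rule. It is not the actual Allosaurus build script: the function name `build_inventories`, the keys `ISO6393`, `Segment`, and `Allophones`, the space-separated allophone format, and the sample rows (with the placeholder language code `xxx`) are all assumptions made for this example.

```python
from collections import defaultdict


def build_inventories(rows):
    """Map each language code to its set of phones.

    `rows` is an iterable of dicts with the (assumed) keys
    "ISO6393", "Segment", and "Allophones".
    """
    inventories = defaultdict(set)
    for row in rows:
        # Assumption: allophones are stored as one space-separated string.
        allophones = (row.get("Allophones") or "").split()
        # Prefer the allophone column when it is nonempty;
        # otherwise fall back to the segment itself.
        phones = allophones if allophones else [row["Segment"]]
        inventories[row["ISO6393"]].update(phones)
    return dict(inventories)


# Made-up rows for a placeholder language code, not real PHOIBLE data.
rows = [
    {"ISO6393": "xxx", "Segment": "p", "Allophones": "p pʰ"},
    {"ISO6393": "xxx", "Segment": "a", "Allophones": ""},
]
print(sorted(build_inventories(rows)["xxx"]))
```

With this rule, a language whose allophone column is empty keeps the inventory it has today, while a language with allophone data gets the fuller list.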
Thanks!
I noticed a discrepancy between the Allosaurus phone inventory and the allophones listed in PHOIBLE, for example for Cantonese (yue):
Output of allosaurus.list_phone:
a a̞ e f h i j k kʰ kʷ kʷʰ l l̥ l̪ l̪̥ m m̩ n n̪ o p pʰ r s sʰ t tʰ t̠ t̪ t̪ʰ u w y æ ŋ ŋ̩ œ œ̞ ɐ ɔ ɛ ɪ ɪ̞ ɵ ʃ ʃʰ ʊ ʊ̟ β
Output of Phoible:
m i k j u a p w n t l s ŋ h f ɛ ɔ ts kʰ pʰ ɪ ʊ tʰ kʷ tsʰ y ai œ au ɐ kʷʰ ui ei ɵ iu ou ɔi ɐi ɐu ɛu ɵy
It seems that the two-character phones (e.g. "ts", "ui", "ei", "iu") are missing from Allosaurus. Is this an intentional design decision, or a problem with the way the inventory lists were built? (The Allosaurus phone inventory for Mandarin, cmn, also lacks two-character phones.)
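The comparison can be reproduced with a plain set difference. This is a minimal sketch: the two strings are copied from the outputs above, the variable names are arbitrary, and nothing here calls Allosaurus or PHOIBLE.

```python
# Phone lists copied from the two outputs quoted above.
allosaurus_yue = (
    "a a̞ e f h i j k kʰ kʷ kʷʰ l l̥ l̪ l̪̥ m m̩ n n̪ o p pʰ r s sʰ t tʰ t̠ t̪ t̪ʰ "
    "u w y æ ŋ ŋ̩ œ œ̞ ɐ ɔ ɛ ɪ ɪ̞ ɵ ʃ ʃʰ ʊ ʊ̟ β"
).split()
phoible_yue = (
    "m i k j u a p w n t l s ŋ h f ɛ ɔ ts kʰ pʰ ɪ ʊ tʰ kʷ tsʰ y ai œ au ɐ "
    "kʷʰ ui ei ɵ iu ou ɔi ɐi ɐu ɛu ɵy"
).split()

# Phones listed by PHOIBLE that the Allosaurus inventory does not contain.
missing = sorted(set(phoible_yue) - set(allosaurus_yue))
print(missing)
```

Every entry in `missing` is either an affricate (such as "ts") or a diphthong (such as "ui").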
Description:
The phone inventory for Kunwinjku (iso gup) is incomplete. The output of
python -m allosaurus.list_phone --lang gup
is:
However, Phoible lists the complete inventory as:
https://phoible.org/inventories/view/883
Expected behavior
I would expect the allosaurus model inventory for iso gup to be: