lexibank / abvdoceanic

Creative Commons Attribution 4.0 International

Check Master Profile #7

Closed LinguList closed 3 years ago

LinguList commented 3 years ago

Okay, before we convert to individual profiles, the master profile must be checked for the points I mentioned.

Here, rough errors need to be filtered out. @antipodite, I would like you to have a look here.

LinguList commented 3 years ago

Having installed the package, we check the profile by typing:

cldfbench lexibank.check_profile lexibank_abvdoceanic.py

To restrict the check to a single language, you can type:

cldfbench lexibank.check_profile lexibank_abvdoceanic.py --language=Mangaia
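
For orientation, the profile these commands check is a tab-separated table. The sketch below assumes the usual lexibank layout with a Grapheme column and an IPA column (column names assumed here, not stated in this thread) and reads it into a plain dictionary. load_profile is a hypothetical helper for illustration, not part of cldfbench.

```python
import csv
import io


def load_profile(tsv_text):
    """Map each grapheme to its IPA value in a tab-separated profile.

    Assumes (hypothetically) columns named "Grapheme" and "IPA";
    any extra columns, such as a frequency count, are ignored.
    """
    # Profiles are not quoted, so disable quote handling entirely.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["Grapheme"]: row["IPA"] for row in reader}
```

If a grapheme occurs twice, the later row wins, so duplicates are worth checking for separately.
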
LinguList commented 3 years ago

First come the generated graphemes, then the modified graphemes.

Found 116 generated graphemes

Grapheme BIPA Modified Segments Graphemes Count
cʷ/tsʷ tsʷ * u cʷ/tsʷ a ucʷa 1
r̃w/rʷ * s i r̃w/rʷ á/a sir̃wá 1
ḷw/lʷ * d a w a ḷw/lʷ á/a l̩/l dawaḷwáḷ 1
ñw/nʷ * k e ñw/nʷ e + n keñwe-n 1
chw/tsʷʰ tsʷʰ * w e a chw/tsʷʰ e a ch/tsʰ weachweach 1
ay/ai ai * m ay/ai + a s may as 1
sh/sʰ * v a h sh/sʰ i r i vahshiri 1
pʷʰ pʷʰ pʷʰ ã n e phwãne 1
ḍḍ/ɖː ɖː * ê/e β ê/e + ɪ ḍḍ/ɖː õ êβê_ɪḍḍõ 1
nːʷ nʷː * e w e nːʷ ewenwnw 1
'ṱ/ˀt ˀt * b ʌ 'ṱ/ˀt ɩ/ɪ n ɩ/ɪ bʌ'ṱɩnɩ 1
ˀɸ ˀɸ i ˀɸ ʌ ɩ/ɪ d ɩ/ɪ i'ɸʌɩdɩ 1
ˀpʷ ˀpʷ h ʌ ˀpʷ ʌ y/j o ˀm ʌ n hʌ'pwʌyo'mʌn 1
ŋmʷ ŋmʷ ŋmʷ ɛ l t ø l ŋ͡mʷɛltøl 1
ⁿdʷ ⁿdʷ ⁿdʷ u ⁿdʷu- 1
n̈/nʰ * n e + n̈/nʰ p e n l o ŋ ne-n̈penloŋ 1
ⁿw ⁿw ɣ a + ⁿw o s i ɣa-ᵐwosi 1
yˤ/jˤ * yˤ/jˤ ə n yˁən 1
ˀð ˀð ˀð e ˁðe 1
ɣ a + ʔ iˀ i n ɣa-ˀiˀin 1
ɣ a + mˀ o lˀ o l ɣa-mˀolˀol 1
βˀ βˀ r o βˀ e roβˀe 1
ʔː ʔː m ɔ ʔː e mɔ:ˀe 1
m̈m̈/mːʰ mʰː * o m̈m̈/mːʰ a om̈m̈a 1
βʰ βʰ h u βʰ u huβhu 1
fⁿ fⁿ fⁿ dʳ i ɛ ⁿdʳ fⁿdʳiɛⁿdʳ 1
i kˀ a ikˀa 1
βː βː r a βː e + k raββe-k 1
tʰː tʰː p a tʰː i l paththil 1
w a zʷ a s i -wazwasi 1
ⁿm ⁿm i o ⁿm a r a ioⁿmara 1
ⁿbʳ ⁿbʳ g i t g i t ⁿbʳ i gitgitᵐbʳi 1
xⁿ xⁿ s a xⁿ b u ɣ u t saxᵐbuɣut 1
au au au l a͡ul 1
ⁿs ⁿs ⁿs a l ɣ a ⁿsalɣa- 1
ʰʝ ʰʝ ʰʝ a k ʰʝak 1
ʔʰ ʔʰ p a i ʔʰ paiʔh 2
cw/tsʷ tsʷ * i n + kʷ a cw/tsʷ a c/ts in-kwacwac 2
'ǥ/ˀg ˀg * b a 'ǥ/ˀg o r a ba'ǥora 2
d e dʷ i ts e dedwitse 2
ðʰ ðʰ ɣ œ ðʰ ɛ r i ɣœðhɛri 2
k a mˀ á/a u a kam'áua 2
zʰ e h a zheha- 2
ˀmʷ ˀmʷ d i ˀmʷ ɛ n ɛ di'mwɛnɛ 2
ᵐb ⁿb * o ᵐb o r a oᵐbora 2
n u + w̥ a n nu-w̥an 2
ˀd ˀd g o ˀd o ū go'doū 3
ˀn ˀn ˀn a t a l e 'natale 3
ˀf ˀf a ˀf ạ/a i a'fại 3
k e a r u k ŭ kearukŭ 3
ˀw ˀw h u l u n ɛ + ˀd ʌ m d ʌ m ˀw ʌ n ɛ hulunɛ_'dʌmdʌm'wʌnɛ 3
ⁿβ ⁿβ ⁿβ o ŋ ᵐβoŋ- 3
ʰmʷ ʰmʷ ʰmʷ a ⁿd i + n ʰmʷaⁿdi_n 3
'b̠/ˀb ˀb * t ɔ 'b̠/ˀb a s i tɔ'ḇasi 4
ˀb ˀb ˀb a l i a 'balia 4
ɣʰ ɣʰ e ɣʰ e i eɣhei 4
ʃː ʃː j/dʒ æː + ʃː ɔ w jæ:_ʃ:ɔw 4
u nˀ u n unˀun 4
mb mb s a mb -samᵇ 4
n e nː a hː u nennahhu 5
üü üü k üü küü 5
ʰv ʰv ʰv p i ŋ ʰvpiŋ 5
wʰ a i wʱ a i whaiwhai 6
ˀh ˀh a ˀh ä/æ e a'häe 6
'y/ˀj ˀj * h i 'y/ˀj ʌ d ɛ hi'yʌdɛ 6
ˀŋ ˀŋ l o ˀŋ a n a lo'ŋana 7
mʷʰ mʷʰ a mʷʰ a amwha 7
ˀg ˀg o u ʔ a ˀg a ou'a'ga 8
ˀr ˀr ˀr i p i 'ripi 8
chch/tsːʰ tsʰː * p a chch/tsːʰ pachch 9
ⁿɖ ⁿɖ a ⁿɖ u i aⁿḍui- 9
ⁿḍ/ⁿɖ ⁿɖ * ⁿḍ/ⁿɖ a β s a i a ⁿḍaβsaia 9
ī ʔ a īʔa 10
ˀp ˀp h o ʔ o ˀp a ʔ a ho'o'pa'a 11
d̃ u ɪ + n d̃uɪ-n 11
n ə mʷ a n vʰ aː l nəmwanvhaal 11
cːʰ cʰː * cːʰ a chcha 11
ⁿz ⁿz ⁿz a l ⁿzal 11
ŋʰ ŋʰ d̃ u ŋʰ i d̃uŋhi 12
pʷː pʷː pʷː é/e l pʷpʷél 13
ⁿʒ ⁿʒ ⁿʒ i ⁿʒi- 13
əi əi k a r əi a n -karəjan 14
dʳ a r i + k dʳari-k 15
ⁿts ⁿts n a + ɣ a ⁿts na-ɣaⁿt͡s- 15
ˀl ˀl ˀl a l a ŋ a n a 'lalaŋana 17
ˀs ˀs a ˀs ʔ a's' 18
ˀk ˀk s u w a ˀk a l a ŋ suwa'kalaŋ 19
ⁿgʷ ⁿgʷ ⁿgʷ a t u + ⁿg u ᵑgʷatu-ᵑgu 19
k u lʰ ɪ + n kulhɪ-n 20
ˀm ˀm k a ˀm a n a ka'mana 20
wʰ eː k a u wheekau 22
k u hʷ a h kuhwah 23
ˀt ˀt p a ˀt u n a pa'tuna 23
r o s e ï roseï 23
n + ⁿb tʳ a l o + m n-ᵐbtʳalo-m 24
w e sʷ e s weswes 27
ʰn ʰn ʰn e ʰne 29
w e nʷ e n wenwen 31
fʷ a c/ts fwac 31
k i ì kiì 32
v̈/vʱ * v̈/vʱ a u -v̈au 34
s rʷ a c/ts srwac 37
k o rʰ a n i korhani 44
nʰ a nna 51
ⁿr ⁿr b ⁿr u w i + m bⁿruwi-m 53
h ɲ a i + w a lʷ a l o hñai_walwalo 56
kpʷ kpʷ t o kpʷ a + k toᵏpʷa-k 57
ʰm ʰm n i + a ⁿp ʰm ni-aᵐpʰ 59
xʰ e n u + r e xʰenu_re 73
vʷ i r i h i vwirihi 78
sʰ e she 79
ⁿʙ ⁿʙ n u + ⁿʙ u ɸ o n e nu-ᵐʙuɸone 90
ⁿbʷ ⁿbʷ ⁿbʷ a r o ‐ᵐbʷaro 92
v ū + n i + t a b a + n a vū_ni_taba-na 115
ey/ei ei * m ey/ei a n í/i m + m u m ú/u n meyaním_mumún 201
ⁿdʳ ⁿdʳ e ⁿdʳ a eⁿdʳa 230

Found 157 modified graphemes

Grapheme BIPA Segments Graphemes Count
ń/n n s i l i ŋ o ń/n siliŋoń 1
áː/a a z áː/a e n zá:en 1
ṫ/t t x a d ṫ/t xadṫ 1
ʷ/w w p i a + k + w a r u ʷ/w ú/u ŋ pia-k_waruʷúŋ 1
ǵ/g g m u l ǵ/g u mulǵu 1
ó̝/o o o m ó̝/o ŋ omó̝ŋ 1
ḳ/k k a ḳ/k á/a p aḳáp 1
ȡ/d d m e ȡ/d a ʔ a n med̂aʔan 1
m̄/m m n aː t a m̄/m o l i naatam̄oli 1
ẹ/e e s ə m p s ẹ/e səmpsẹ 1
p̃p̃/pː n o p̃p̃/pː a nop̃p̃a 1
ă/a a l ă/a t o u lătou 1
ļļ/lː ļļ/lː a p ļļap 1
ëë/əː əː cʰ ëë/əː n mʷ o n g o chëënmwongo 1
ùù/uː k i ch/tsʰ ùù/uː ch/tsʰ úú/uː kichùùchúú 1
ñ̩/ŋ ŋ a ñ̩/ŋ u aṇ̃u 1
ə́/ə ə s ə́/ə r a sə́ra 1
ⁿj/dʒ ⁿj/dʒ a ⁿj/ⁿdʒ u ⁿjaⁿju 1
ⁿj/ⁿdʒ ⁿdʒ ⁿj/dʒ a ⁿj/ⁿdʒ u ⁿjaⁿju 1
??/ə ə p ??/ə l e u p??leu 1
ǣ/æ æ l ʌ ə tʰ e m ǣ/æ lʌətʰemǣ 1
T/t t T/t a k o Tako 1
čh/tʃʰ tʃʰ čh/tʃʰ e ŋ e + ɲ a čheŋe-ña 1
ʰ/h h ʰ/h a g u a i n g a u ʰaguaingau 1
ṁ/m m pʷ e ṁ/m o pʷeṁo 1
ˤ/ʕ ʕ p l s ɨ n + ˤ/ʕ e β plsɨn_ˁeβ 1
ēē/eː c ēē/eː cēē 1
ƥ/p p b u g a n t a ƥ/p i buɡantaƥi 1
b̥/b b g a b̥/b u r ɡab̊ur 1
i̥/i i r̥ o i̥/i r̊oi̊ 1
t̥/t t b i t̥/t i bit̊i 1
s̥/s s i + s̥/s a r i-s̊ar 1
o̥/o o r o̥/o + n a ro̊-na 1
ŕ/r r w oː ŕ/r u woːŕu 1
ɔʰ/ɔ ɔ n i + v u ɣ ɔʰ/ɔ ni-vuɣɔʰ 1
Tð/tθ Tð/tθ ɪ ð a i Tðɪðai 1
ô/oː ʔ ô/oː m âː/aː ô:˧mâ:˩ 1
èː/eː è/e + g èː/eː è˧_gè:˧ 1
äː/æː æː i pʷ äː/æː i˧pwä:˧ 1
ù:/uː t ù:/uː tù:˧ 1
ɨɨ/ɨː ɨː ç ɨɨ/ɨː çɨɨ 2
ìì/iː m a k a ìì/iː makaìì 2
å/a a å/a w åw 2
ḁ/a a ḁ/a n ån 2
ụ/u u n d ɔ p ụ/u ndɔpụ 2
êê/eː tʰ êê/eː n thêên 2
yy/jː yy/jː a yya 2
ᵗs/ts ts r a m ᵗs/ts a f ramᵗsaf 2
Y/j j f ɪ n a Y/j e -fɪnaYe 2
ïï/ɪː ɪː y/j e w e r ïï/ɪː + r e yewerïï-re 2
üü/yː üü/yː üü 2
û:/uː t û:/uː tû:˧ 2
éː/eː a c/ts éː/eː a˧cé:˧ 2
ĩĩ/ĩː ĩː k ĩĩ/ĩː kĩĩ 3
ė/e e e g ė/e g ėgėg 3
jh/dʒʱ dʒʱ a θ e jh/dʒʱ a ɲ aθejhañ 3
n̩/n n n̩/n a m i a ṇamia 3
ŏ/o o e + l ŏ/o a n a e-_lŏana 3
jw/dʒʷ dʒʷ jw/dʒʷ o jwo- 3
iʰ/i i m o βʷ iʰ/i moβʷiʰ 3
î:/iː n î:/iː m î/i + r î/i nî:˩mî˩-rî˩ 3
ô:/oː c ô:/oː cô:˥ 3
ḍ/ɖ ɖ t i t l á/a k a n ḍ/ɖ í/i l̩/l titlákanḍíḷ 4
ŋ̊/ŋ ŋ i j/dʒ ȯ/o ŋ̊/ŋ ijȯŋ̊ 4
jj/dʒː dʒː p a jj/dʒː i r i pajjiri 4
ņ/n n tː e ņ/n a + k tteņa-k 4
ņ/ŋ ŋ ņ/ŋ a mʷ ņamʷ 4
n̩/ŋ ŋ n a n̩/ŋ i ɔ naṇiɔ 4
ɔ̅/ɔ ɔ ɔ̅/ɔ p l ə ɔ̅plə 4
nǰ/ⁿdʒ ⁿdʒ n ǰ/dʒ a nǰ/ⁿdʒ nǰanǰ 5
ồ/o o l ồ/o lồ 5
ɛʰ/ɛ ɛ a + i a l ɛʰ/ɛ a-ialɛʰ 5
uʰ/u u v u v uʰ/u -vuvuʰ 5
ṙ/r r ṙ/r t j/dʒ i ŋ i n ṙtjiŋin 5
êː/eː m êː/eː mê:˧ 5
ề/e e ề/e ŋ -ềŋ 7
ôô/oː m z ôô/oː -mzôô 7
ṱ/t t ṱ/t o l o m o ṱolomo 7
ö:/œː œː t ä/æ m ä/æ g ö:/œː r i tä˧mä˧gö:˩ri˩ 7
b̃/b b n a b̃/b u l u nab̃ulu 8
îî/iː c îî/iː cîî- 8
ə̥/ə ə r ā/a ɣ ə̥/ə rāɣə̥ 8
ǰ/dʒ n ǰ/dʒ o r o p o nǰoropo 8
t˺/t t h a + β a t˺/t ha-βat˺ 8
âː/aː n âː/aː nâ:˧ 8
n̄/n n r o n̄/n + b o n i ron̄_boni- 9
ɔ̄/ɔ ɔ ŋ ɔ̄/ɔ ŋ o r o ŋɔ̄ŋoro 9
ää/æː æː k ää/æː kää 9
ž/ʒ ʒ u ž/ʒ a uža 9
ᴩ/p p l a ᴩ/p a n laᴩan 9
cs/ts ts l a cs/ts a c/ts lacsac 10
ṭ/ʈ ʈ a ṭ/ʈ a l o aṭalo- 10
əə/əː əː j/dʒ əə/əː + n ɛ jəə_nɛ 11
ṛ/r r ṛ/r a ɲ u ṛañu 11
m̩/m m m̩/m a n i ṃani 11
l̩/l l t i t l á/a k a n ḍ/ɖ í/i l̩/l titlákanḍíḷ 12
ļ/l l ļ/l o ļ/l o a ļoļoa 13
ⁿǰ/ⁿdʒ ⁿdʒ ⁿǰ/ⁿdʒ u a l e ⁿǰuale 13
ò/o o s i s ò/o ŋ i t sisòŋit 14
ů/u u d ů/u g i d u g o důgidugo 14
p˺/p p h a + m a p˺/p ha-map˺ 14
d̃/d d ŋ i d̃/d e + n ŋid̃e-n 15
ɩ/ɪ ɪ t ɩ/ɪ n ɩ/ɪ ʔ ɛ n ɛ tɩnɩ'ɛnɛ 15
v̈/v v v̈/v a t v̈at 19
ij/i i r ij/i rij 21
ᶢ/ʟ ʟ ᶢ/ʟ o ŋ o ᶢʟoŋo- 22
d̠/d d ǥ/g u d̠/d u ǥuḏu 25
ï/ɪ ɪ s n u m b u r a ï/ɪ m snumburaïm 27
ɔɔ/ɔː ɔː k ɔ t ɔɔ/ɔː kɔtɔɔ 29
š/ʃ ʃ š/ʃ ɛ pʷ ĩ r ĩ šɛpʷĩrĩ 32
ū/u u ū/u ū 32
ĕ/e e n ĕ/e y/j a n nĕyan 33
ḡ/g g l a ḡ/g a laḡa 34
p̈/p p s o p̈/p -sop̈ 34
ạ/a a ʔ a t m ạ/a i 'atmại 35
ᶢʟ/ʟ ʟ t ə ᶢʟ/ʟ ɔ ɣ ə təᶢʟɔɣə 35
î/i i m aː l î/i c/ts maalîc 38
ȯ/o o t ů/u k ȯ/o tůkȯ 38
m̈/m m m̈/m i m̈/mʰ i β ə m̈/mʰ i β m̈im̈iβəm̈iβ 39
b̠/b b m ə r á/a b̠/b məráḇ 40
iy/i i e l iy/i eliy 42
óó/oː m é/e m + m é/e w óó/oː mém-méwóó- 44
ù/u u ʔ ù/u i ùi 45
û/u u k û/u x û/u kûxû 47
ì/i i ʔ ì/i m a ìma 50
p̃/p p n a p̃/p e l e nap̃ele 54
ĵ/dʒ k ə n ĵ/dʒ k a m b e kənĵkambe 54
m̈/mʰ r o r o m̈/mʰ -rorom̈ 54
m̃/m m n a m̃/m e l e a r u nam̃elearu 55
éé/eː m éé/eː y/j ú/u ŋ -mééyúŋ 63
ř/r r ɛ ř/r a n ɛřan 64
ī/i i i + m a w ī/i i_mawī 68
ǥ/g g b̠/b a ǥ/g e ḇaǥe 77
ö/œ œ r u g ö/œ y/j rugöy 78
úú/uː c úú/uː cúú 82
è/e e h è/e hèe 83
ē/e e s ē/e + n i + k a u sē_ni_kau 92
ü/y y m e r ü/y k merük 99
ō/o o c ō/o 114
ë/ə ə l ë/ə q lëq 115
r̃/r r y/j i r̃/r yir̃ 117
ä/æ æ m ä/æ rʷ eː l märweel 117
ch/tsʰ tsʰ y/j a l a ch/tsʰ yalach 127
í/i i b o z o z o a n í/i bozozoaní 147
ô/o o p ô/o l e pôle- 154
â/a a w â/a l e m wâlem 155
à/a a y/j á/a ŋ a y/j ʔ à/a ŋ yáŋayàŋ 169
ó/o o k a z ó/o i kazói 221
č/tʃ s ɛ d u ŋ + n e + ɛ č/tʃ sɛduŋ_ne_ɛč 229
ê/e e sʰ ê/e l â/a shêlâ 248
é/e e m e s a p é/e n mesapén 276
ú/u u f ú/u f u fúfu 363
á/a a n i m á/a n nimán 500
c/ts ts y/j o c/ts u yocu 578
ā/a a w ā/a w ā/a wāwā 590
j/dʒ g a j/dʒ i gaji 1084
y/j j y/j a z ó/o n yazón 2646
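
Reading the tables: as I understand the output, a segment written x/y pairs the original grapheme x with the BIPA value y, and + marks a morpheme boundary. The hypothetical helper below splits a Segments string into its source and target sides, which makes the two easier to compare when going through the rows.

```python
def split_segments(segments):
    """Split space-separated segments like "y/j a z ó/o n" into
    (source, target) strings; segments without a slash are shared."""
    source, target = [], []
    for seg in segments.split():
        src, sep, tgt = seg.partition("/")
        source.append(src)
        # No slash: the grapheme is already its own BIPA value (or a boundary "+").
        target.append(tgt if sep else src)
    return " ".join(source), " ".join(target)
```
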
LinguList commented 3 years ago

What is also important to check here is consistency. For example, I converted j to dZ (writing SAMPA here), but forgot to do so at the beginning of a string (which we mark with ^; the end is marked with $).
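
A minimal sketch of how such boundary inconsistencies could be flagged, assuming the profile has been read into a dictionary from grapheme to IPA value: for every grapheme carrying ^ or $, strip the marker and compare its mapping with the bare grapheme's mapping. boundary_mismatches is a hypothetical helper; a boundary-marked grapheme can legitimately map differently, so hits are candidates for review, not automatic errors.

```python
def boundary_mismatches(profile):
    """Return (marked, bare, marked_ipa, bare_ipa) tuples where a grapheme
    marked with ^ (string start) or $ (string end) maps differently
    from the same grapheme without the marker."""
    issues = []
    for grapheme, ipa in profile.items():
        bare = grapheme
        if bare.startswith("^"):
            bare = bare[1:]
        if bare.endswith("$"):
            bare = bare[:-1]
        # Only compare when a marker was stripped and the bare grapheme exists.
        if bare != grapheme and bare in profile and profile[bare] != ipa:
            issues.append((grapheme, bare, ipa, profile[bare]))
    return issues
```
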

LinguList commented 3 years ago

Or I have

ⁿj -> ⁿj/dʒ

which is wrong, since it should be:

ⁿj -> ⁿj/ⁿdʒ

So we need to find these cases and adjust them in the profile.
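
Cases like this can be searched for mechanically. The sketch below assumes profile values use the x/y slash notation shown above and flags entries whose source side starts with a prenasalization mark while the target side does not. prenasal_mismatches and the set of marks are illustrative assumptions; the check deliberately ignores digraphs such as nǰ/ⁿdʒ, where the source uses a plain n.

```python
# Superscript marks used for prenasalization in the graphemes above (assumed set).
PRENASAL_MARKS = ("ⁿ", "ᵐ", "ᵑ")


def prenasal_mismatches(profile):
    """Return (grapheme, ipa) pairs where a prenasalized source grapheme
    is mapped to a target that has lost its prenasalization."""
    issues = []
    for grapheme, ipa in profile.items():
        source, sep, target = ipa.partition("/")
        if not sep:
            continue  # no explicit source/target pair to compare
        if source.startswith(PRENASAL_MARKS) and not target.startswith(PRENASAL_MARKS):
            issues.append((grapheme, ipa))
    return issues
```
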

LinguList commented 3 years ago

I gave instructions online, on Mattermost, on how to check for a sound in pyclts; let me know if I should repeat them.

LinguList commented 3 years ago

For my part, I'll now wait for any updates you deem important here.

Then, once I receive a PR and have reviewed and commented on it, I'll run a script to extract language-specific profiles, so we can refine each language further in the future.

antipodite commented 3 years ago

OK. Now when I run cldfbench lexibank.check_profile lexibank_abvdoceanic.py to check the profile I'm getting this error: Config /Users/isaac_stead/Library/Application Support/cldf/catalog.ini has no entry for clts

I don't have a catalog.ini in this folder. I guess it is created when you check out the repo via this method: https://github.com/cldf/cldfcatalog, whereas I just cloned the repo from here and installed it in a pipenv. What do you suggest?

antipodite commented 3 years ago

@maryewal when do you want to look at this together?

LinguList commented 3 years ago

Yep, you can create it with:

$ cldfbench catconfig

Then you can also adjust the paths. There is a recent blog post by Annika, which also explains this (for other purposes): https://calc.hypotheses.org
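
The error means the config file does not list a local clone of the CLTS data. As a rough illustration, assuming (not verified against the cldfcatalog source) that catalog.ini keeps name = path entries under a [clones] section, the lookup that fails amounts to the check below. has_catalog_entry is hypothetical; running cldfbench catconfig is the supported way to create the file and fill in the paths.

```python
import configparser


def has_catalog_entry(ini_text, name, section="clones"):
    """Hypothetical check: does a catalog.ini-style config register a clone
    for `name`? Assumes entries are stored as `name = path` under [clones]."""
    # Disable interpolation so paths containing "%" are read verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(ini_text)
    # has_option returns False if the section itself is missing.
    return config.has_option(section, name)
```
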

maryewal commented 3 years ago

@antipodite let's go over when we meet Monday morning. If you are eager to get started over the weekend, feel free to go ahead, especially for the type of adjustments @LinguList mentions above - we can go over any questions Monday in that case.

LinguList commented 3 years ago

Note that all questions regarding vowels with accents can be handled later on a language-specific basis, so for now, what is important is to check for overall consistency and to acquaint yourself with the orthography. BTW: these are 400 languages, in fact, not just 170, if I am not mistaken...

LinguList commented 3 years ago

@antipodite, I had to modify a few things in your profile, as they are not accepted by CLTS.

LinguList commented 3 years ago

See my changes here. They are important, since CLTS does not accept aspirated vowels, so I marked these vowels as breathy, which I think is distinctive enough. The ᶢʟ grapheme is also not accepted, and as I don't know what sound it is meant to be, I kept just the ʟ, which is still distinctive.
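
One way to apply the aspirated-to-breathy change mechanically, assuming the breathy-voice diacritic used is the combining diaeresis below (U+0324) and that a bare vowel followed by ʰ always stands for an "aspirated" vowel. aspirated_to_breathy and the vowel set are illustrative assumptions; aspirated consonants such as tʰ are left untouched.

```python
BREATHY = "\u0324"  # COMBINING DIAERESIS BELOW, the IPA breathy-voice diacritic
VOWELS = set("aeiouɛɔəɪʊæøœɨyɯʌɒɑ")  # illustrative subset, not exhaustive


def aspirated_to_breathy(ipa):
    """Rewrite vowel + ʰ as vowel + breathy diacritic; other ʰ stay as they are."""
    out = []
    for ch in ipa:
        if ch == "ʰ" and out and out[-1] in VOWELS:
            out[-1] += BREATHY  # drop ʰ, mark the preceding vowel as breathy
        else:
            out.append(ch)
    return "".join(out)
```

Only a bare vowel directly before ʰ is handled; a vowel that already carries another diacritic (e.g. a nasalized vowel) would need a manual check.
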

LinguList commented 3 years ago

But otherwise all fine.