lexibank / abvdoceanic

Creative Commons Attribution 4.0 International

[Orthography] Marquesan has too many vowels and consonants #19

Closed: antipodite closed this issue 2 years ago

antipodite commented 3 years ago

The problem here seems to be that the forms in the ABVD come from different sources that used different orthographies: Dordillon's dictionary indicates long vowels with an acute accent, but later Marquesan orthography doubles the vowel. We also have some extremely garbled forms that seem to be the result of poor transcription.

maryewal commented 3 years ago

Which Marquesan are we talking about here? ID 38 or 64?

antipodite commented 3 years ago

I'm not actually sure. There seems to be a single orthography profile for Marquesan in etc/orthography/. cldf/languages.csv says:

"POLLEX 2000 (Bruce Biggs & Ross Clark), David Addison",https://abvd.shh.mpg.de/austronesian/language.php?id=38,Simon Greenhill and Russell Gray,Simon Greenhill,"This is an amalgamation of North (MRQ) and South Marquesan (QMS)
Pollex supplemented from Dordillon 1931-1932",

maryewal commented 3 years ago

That's Marquesan_38 in ABVD. The many vowels here are due to old transcription conventions from Dordillon's time, plus what looks like a couple of corrected entries with later transcription conventions. The consonant issue arises because it is a mix of forms from two varieties with different consonant inventories, mostly from N. Marquesan but also some from S. Marquesan. This didn't really pose a problem for the cognate coding, but we may want to separate these out...
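To make the "too many vowels and consonants" symptom concrete, here is a rough sketch of how mixed conventions inflate a grapheme inventory. It is a crude character-level count, not the segmentation an orthography profile performs. The function name `inventory` and the sample forms are made up for illustration and are not taken from the ABVD data.

```python
import unicodedata

VOWELS = set("aeiou")


def inventory(forms):
    """Return (vowel graphemes, consonant graphemes) found in the forms.

    A character counts as a vowel if its base letter (after stripping
    diacritics) is one of a, e, i, o, u. Non-letters such as apostrophes
    are skipped, so a glottal stop written with ' is not counted here.
    """
    vowels, consonants = set(), set()
    for form in forms:
        for ch in unicodedata.normalize("NFC", form.lower()):
            if not ch.isalpha():
                continue
            base = unicodedata.normalize("NFD", ch)[0]
            (vowels if base in VOWELS else consonants).add(ch)
    return vowels, consonants


# Hypothetical forms: one with an acute-accented vowel, one with a
# doubled vowel, one with a "d".
vowels, consonants = inventory(["má", "maa", "di"])
print(sorted(vowels), sorted(consonants))
```

With both conventions present, the accented and plain vowels are counted as separate graphemes, which is the kind of inflation described above.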

antipodite commented 3 years ago

How should I proceed? Leave for today?

maryewal commented 3 years ago

Can we generate something similar to what you did for Ulithian? I can probably work out the separation tonight.

antipodite commented 3 years ago

Like this? Change the extension to .tsv; GitHub won't let me upload a .tsv for some stupid reason.

marquesan-alignments.csv

maryewal commented 3 years ago

So, I have separated these out, but I'm not sure what we should actually do here. Unfortunately, the set isn't as straightforward as each variety having the full list of concepts. By removing the South Mqsic (or North) forms we get significantly reduced coverage. If we keep both varieties together for a "Marquesan" language, we will get a full set of consonants. There are some additional consonants/vowels in there from Dordillon's old orthography (d = r, accented vowels, etc.) that, when corrected, will reduce the overall number of consonants and will definitely correct the number of vowels. @SimonGreenhill @LinguList curious about your thoughts. My opinion is that it is better to leave the varieties together (correcting the Dordillon orthography, obviously) for maximum coverage.
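A minimal sketch of what correcting the Dordillon orthography could look like, assuming only the two correspondences mentioned in this thread: an acute-accented vowel becomes a doubled vowel, and d becomes r. The function name `normalize_dordillon` and the test strings are hypothetical; a real fix would more likely live in the orthography profile and would need to cover whatever the "etc." above includes.

```python
import unicodedata

# Assumed mapping, based only on this thread: Dordillon's "d" corresponds to "r".
CONSONANT_MAP = {"d": "r"}
ACUTE = "\u0301"  # combining acute accent


def normalize_dordillon(form: str) -> str:
    """Rewrite acute-accented vowels as doubled vowels and map d to r."""
    out = []
    # Decompose so that "á" becomes "a" followed by a combining acute.
    for ch in unicodedata.normalize("NFD", form):
        if ch == ACUTE and out and out[-1].lower() in "aeiou":
            # Replace the accent with a second copy of the vowel.
            out.append(out[-1].lower())
        else:
            out.append(CONSONANT_MAP.get(ch, ch))
    return unicodedata.normalize("NFC", "".join(out))


# Made-up strings, not real ABVD entries.
print(normalize_dordillon("máda"))
```

Forms that already follow the later convention pass through unchanged, so the same function can be run over the whole mixed list.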

LinguList commented 3 years ago

Yes, I think this is reasonable for now. As long as no hard-core historical comparison is involved, one can proceed by not splitting. If it was for the sound correspondences ultimately, one might be forced to manually reconstruct one variety from the information of the other. But for the current purposes, we won't need that level of rigor.