Add orthography profile

LinguList commented 3 years ago

I propose that I make a draft profile later and then ask @maryewal to check it. This means we can also add sound inventories to the CLLD dataset, which would of course be very nice!

LinguList commented 3 years ago

It looks quite challenging, with more than 700 symbols (in the context version).

What is this symbol: ► ? It occurs in many datasets, but it is not clear what it represents.

xrotwang commented 3 years ago

That's a placeholder - should be declared as null in the metadata, I guess. It means transcriptions are missing, but sound files are there.

maryewal commented 3 years ago

yikes, not what I would've expected. I'm surprised there are sound files without transcriptions. We should flag those and Lana and I can transcribe them.

xrotwang commented 3 years ago

That's a bit of a problem for the lexibank policy, which doesn't allow empty forms.

xrotwang commented 3 years ago

It seems to be only 125:

$ csvgrep -c Form -m"►" cldf/forms.csv | wc -l
125

xrotwang commented 3 years ago

Here are the affected languages:

$ csvgrep -c Form -m"►" cldf/forms.csv | csvcut -c Language_ID | sort | uniq
Ctlsungwadolagwaranuta
Nleovandue
Sabunlap
Safponof
Safrangusuksu
Safyangtus
Sageneral
Sahaa
Sapaangi
Saranwas
Wwalaha
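For anyone without csvkit at hand, the two checks above can be sketched in plain Python. This is only an illustration: the `Form` and `Language_ID` column names come from the commands above, but the sample rows, the `ID` column, and the helper name `placeholder_counts` are made up for the example.

```python
import csv
import io
from collections import Counter

PLACEHOLDER = "►"  # marks a form with a sound file but no transcription

# Hypothetical stand-in for cldf/forms.csv; the forms are invented.
SAMPLE = """ID,Language_ID,Form
1,Sahaa,►
2,Sahaa,tasi
3,Wwalaha,►
4,Sabunlap,vanua
"""


def placeholder_counts(csv_text):
    """Count placeholder forms per language (substring match on Form)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["Language_ID"] for row in rows if PLACEHOLDER in row["Form"]
    )


counts = placeholder_counts(SAMPLE)
print(sum(counts.values()))  # number of affected forms
print(sorted(counts))        # affected languages
```

To run it on the real data, the file contents could be passed in instead of `SAMPLE`. Note that `wc -l` on `csvgrep` output also counts the header row, so this count may come out one lower than the shell figure.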

xrotwang commented 3 years ago

I think I'd leave the orthography profile and the added transcriptions until after the initial release. The first release should represent what was taken over from sndcmp.

maryewal commented 3 years ago

I was literally just about to write the same thing.

LinguList commented 2 years ago

So with PR #24, we can now make the orthography profile. I suggest we follow the same practice we used for ABVD: I will make an initial profile and make sure it hits 100%, then I will create individual-language profiles and hand them over to @maryewal and her team.
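As a rough illustration of what a profile does, here is a minimal sketch of greedy longest-match segmentation, assuming that "hits 100%" means every form is fully covered by the profile. The toy profile, the example forms, and the function names `segment` and `coverage` are hypothetical, and this is not the code path lexibank itself uses.

```python
UNKNOWN = "\ufffd"  # marker this sketch uses for characters missing from the profile

# Hypothetical toy profile: grapheme in the source -> target segment.
PROFILE = {"a": "a", "n": "n", "ng": "ŋ", "t": "t"}


def segment(form, profile):
    """Greedy longest-match segmentation of one form."""
    max_len = max(len(grapheme) for grapheme in profile)
    segments, i = [], 0
    while i < len(form):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in profile:
                segments.append(profile[chunk])
                i += size
                break
        else:
            segments.append(UNKNOWN)
            i += 1
    return segments


def coverage(forms, profile):
    """Share of forms that segment without any unknown character."""
    ok = sum(UNKNOWN not in segment(form, profile) for form in forms)
    return ok / len(forms)


print(segment("tanga", PROFILE))
print(coverage(["tanga", "nat", "taxa"], PROFILE))
```

In this toy case "taxa" contains a character the profile lacks, so coverage stays below 1.0 until a row for it is added.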

LinguList commented 2 years ago

@maryewal and @tihomirrangelov, what sounds are m̼ and m̼b̼? I can't find them in standard IPA, and these are the only cases left where I am without a clue (the other cases are still EXTREME with respect to IPA inconsistency, but that is another story and can be checked with language-specific profiles later).

LinguList commented 2 years ago

Otherwise, I have almost finished the profile now and will submit these with placeholders.

maryewal commented 2 years ago

@LinguList these are apicolabials (otherwise known as linguolabials), but I believe the transcriber confused the notation. Typically the "seagull" diacritic should be used with n and nd here to represent the nasal and prenasalized linguolabial, respectively.

LinguList commented 2 years ago

Yes, that confused me now also. So how should we write the voiced linguolabial nasal consonant and the stop consonant?

LinguList commented 2 years ago

@Bibiko, if you are helping me finalize my #26 anyway, maybe you could also search for the cases marked with ! and replace them with the correct symbols here?

tihomirrangelov commented 2 years ago

@LinguList As Mary said, the usual notations are n̼ and d̼. Also, for the others: t̼ θ̼ ð̼

maryewal commented 2 years ago

Yes, that confused me now also. So how should we write the voiced linguolabial nasal consonant and the stop consonant?

And the [m̼b̼], which should be [n̼d̼], = prenasalised voiced linguolabial stop consonant.

LinguList commented 2 years ago

ⁿd̼

LinguList commented 2 years ago

Prenasalization is written as a superscript n in our system.
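The corrections agreed on above could be sketched as a small replacement table: m̼ becomes n̼, and m̼b̼ becomes ⁿd̼ with the superscript n. This is only an illustration; the table name, the function name `correct`, and the example strings are hypothetical, and in the dataset the mapping would live in the orthography profile rather than in code.

```python
SEAGULL = "\u033c"  # COMBINING SEAGULL BELOW, the linguolabial diacritic
SUPERSCRIPT_N = "\u207f"  # marks prenasalization

# Hypothetical correction table, written with escapes to make the
# combining characters explicit.
CORRECTIONS = {
    "m" + SEAGULL + "b" + SEAGULL: SUPERSCRIPT_N + "d" + SEAGULL,
    "m" + SEAGULL: "n" + SEAGULL,
}


def correct(form):
    """Replace the mistranscribed sequences, longest key first."""
    keys = sorted(CORRECTIONS, key=len, reverse=True)
    out, i = [], 0
    while i < len(form):
        for key in keys:
            if form.startswith(key, i):
                out.append(CORRECTIONS[key])
                i += len(key)
                break
        else:
            # No correction applies here, so keep the character as it is.
            out.append(form[i])
            i += 1
    return "".join(out)


# Invented example strings, not taken from the dataset.
print(correct("a" + "m" + SEAGULL + "b" + SEAGULL + "a"))
print(correct("m" + SEAGULL + "a"))
```

Matching the longer key first matters: otherwise the m̼ inside m̼b̼ would be rewritten on its own and the b̼ would be left behind.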

LinguList commented 2 years ago

My last PR has addressed all of this. From there, we can then go for individual languages.

maryewal commented 2 years ago

Yes, I understand it should be superscript in our/your system. I meant what a "standard" transcription should have been, i.e. not the seagull with mb.

LinguList commented 2 years ago

Ah, okay, this was confusing me now quite a bit.

LinguList commented 2 years ago

But it is dealt with anyway now. And later, on a language-individual basis, additional corrections can be made in case there are further problems. We just need to wait for @Bibiko to merge my PR.

lexibank / vanuatuvoices

Add orthography profile #16