LinguList closed this issue 2 years ago.
It looks quite challenging, with more than 700 symbols (in the context version).
What is this symbol: ►? It occurs in many datasets, but it is not clear what it represents.
That's a placeholder; it should be declared as null in the metadata, I guess. It means the transcriptions are missing, but the sound files are there.
yikes, not what I would've expected. I'm surprised there are sound files without transcriptions. We should flag those and Lana and I can transcribe them.
That's a bit of a problem for the lexibank policy, which doesn't allow empty forms.
It seems to be only 125:
```
$ csvgrep -c Form -m"►" cldf/forms.csv | wc -l
125
```
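One caveat on this count: csvgrep writes the CSV header row to its output, so `wc -l` probably counts that line as well, which would make it 124 forms rather than 125 (assuming no field contains an embedded newline). Appending `tail -n +2` before `wc -l` would skip the header. Below is a minimal header-aware sketch in plain awk, assuming a simple comma-separated file with no quoted commas; the function name `count_placeholder_rows` is hypothetical, and for real CLDF files csvkit remains the safer parser.

```shell
#!/bin/sh
# Hypothetical helper: count data rows whose Form column contains the ► placeholder.
# Assumption: plain comma-separated input with no quoted fields containing commas.
count_placeholder_rows() {
  awk -F',' '
    # Locate the Form column by its header name instead of a fixed position.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Form") col = i; next }
    # Substring match like csvgrep -m, but the header row is never counted.
    col && index($col, "►") > 0 { n++ }
    END { print n + 0 }
  ' "$1"
}
```

Usage: `count_placeholder_rows cldf/forms.csv` prints the number of affected forms only.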
Here are the affected languages:
```
$ csvgrep -c Form -m"►" cldf/forms.csv | csvcut -c Language_ID | sort | uniq
Ctlsungwadolagwaranuta
Nleovandue
Sabunlap
Safponof
Safrangusuksu
Safyangtus
Sageneral
Sahaa
Sapaangi
Saranwas
Wwalaha
```
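For flagging the forms to be transcribed, a per-language tally may be more useful than the bare list of language IDs. A sketch under the same simple-CSV assumption (no quoted commas); `placeholder_languages` is a hypothetical name, and it prints each Language_ID with its number of ► forms, tab-separated and sorted by language.

```shell
#!/bin/sh
# Hypothetical helper: print "Language_ID<TAB>count" for rows whose Form contains ►.
# Assumption: plain comma-separated input with no quoted fields containing commas.
placeholder_languages() {
  awk -F',' -v OFS='\t' '
    NR == 1 {
      # Find both columns by header name.
      for (i = 1; i <= NF; i++) {
        if ($i == "Form") form = i
        if ($i == "Language_ID") lang = i
      }
      next
    }
    form && lang && index($form, "►") > 0 { count[$lang]++ }
    END { for (l in count) print l, count[l] }
  ' "$1" | sort
}
```

Unlike `sort | uniq` on csvcut output, this never emits the header value as if it were a language.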
I think I'd leave the orthography profile and the added transcriptions until after the initial release. The first release should represent what was taken over from sndcmp.
I was literally just about to write the same thing.
So with PR #24, we can now make the orthography profile. I suggest we follow the same practice as for ABVD: I will make an initial profile and make sure it reaches 100% coverage, then create individual language profiles and hand them over to @maryewal and her team.
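A draft profile typically starts from an inventory of the characters found in the Form column. The sketch below is only one assumed way to bootstrap such an inventory with standard tools, not the lexibank workflow itself; `form_inventory` is a hypothetical name. It counts Unicode code points, so combining diacritics such as the seagull (U+033C) show up as separate entries and still have to be grouped into graphemes, and it needs a UTF-8 locale for `grep -o .` to split multibyte characters correctly.

```shell
#!/bin/sh
# Hypothetical helper: frequency list of characters in the Form column,
# as raw material for a draft orthography profile.
# Assumptions: simple CSV (no quoted commas) and a UTF-8 locale, so that
# `grep -o .` emits whole code points rather than single bytes.
form_inventory() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Form") col = i; next }
    col { print $col }
  ' "$1" |
    grep -o . |          # one character (code point) per line
    sort | uniq -c |     # count each distinct character
    sort -k1,1nr -k2     # most frequent first, ties by character
}
```

Sorting by frequency puts rare, often suspicious symbols at the bottom, which is where transcription inconsistencies tend to hide.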
@maryewal and @tihomirrangelov, what sounds are m̼ and m̼b̼? I can't find them in standard IPA, and these are the only cases left where I have no clue (the other cases are still EXTREME with respect to IPA inconsistency, but that is another story and can be checked with language-specific profiles later).
Otherwise, I have almost finished the profile and will submit it with placeholders for these.
@LinguList These are apicolabials (otherwise known as linguolabials), but I believe the transcriber confused the notation. The "seagull" diacritic should typically be used with n and nd here to represent the nasal and the prenasalized linguolabial, respectively.
Yes, that confused me as well. So how should we write the voiced linguolabial nasal consonant and the stop consonant?
@Bibiko, since you're helping me finalize my #26 anyway, could you also search for the cases marked with ! and replace them with the correct symbols here?
@LinguList As Mary said, the usual notations are n̼ and d̼. Also, for the others: t̼, θ̼, ð̼.
And [m̼b̼], which should be [n̼d̼], is a prenasalised voiced linguolabial stop consonant.
ⁿd̼
Prenasalization is written as a superscript n in our system.
My last PR addresses all of this. From there, we can move on to the individual languages.
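The corrections agreed on here (m̼b̼ → ⁿd̼, m̼ → n̼) can be illustrated as plain string replacements, which also shows why orthography profiles match the longest grapheme first: replacing m̼ before m̼b̼ would turn m̼b̼ into n̼b̼. This is only a sed illustration under that assumption; the actual fix lives in the orthography profile, and `fix_linguolabials` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: rewrite the misused m̼ notation as linguolabials.
# sed applies the -e scripts in order, so the longer sequence is handled first;
# otherwise m̼b̼ would become n̼b̼.
fix_linguolabials() {
  sed -e 's/m̼b̼/ⁿd̼/g' \
      -e 's/m̼/n̼/g'
}
```

For example, `printf 'am̼b̼a m̼a\n' | fix_linguolabials` yields `aⁿd̼a n̼a`.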
Yes, I understand it should be a superscript in our/your system; I meant what a "standard" transcription should have been, i.e., not the seagull with mb.
Ah, okay, this was confusing me now quite a bit.
But it is dealt with now anyway. Later, additional corrections can be made language by language if further problems turn up. We just need to wait for @Bibiko to merge my PR.
I propose that I make a draft profile later and then ask @maryewal to check it. This also means we can add sound inventories to the CLLD dataset, which would of course be very nice!