lexibank / baf2

Bangime and Friends 2
Creative Commons Attribution 4.0 International

Invalid sounds #4

Closed LinguList closed 2 years ago

LinguList commented 2 years ago

@IndianaTones, we introduced cases like p ⁵/ ∼/ɛː. This is not permitted; it should now be p ⁵/ɛ̃ː. There are about 70 cases of this type in EDICTOR. I do not know why they show up now, but could you have a look (use the tokens=/ ∼ filter to identify them) and correct them? It would be better to have this correct in EDICTOR rather than coding over it in CLDF.

LinguList commented 2 years ago

I have fixed this automatically now, but the data on EDICTOR should ideally be updated.
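For illustration only, the kind of automatic repair mentioned here could look like the sketch below. It is not the code actually used for this dataset. The function names are hypothetical, and it assumes that the nasal marker is U+223C and that nasalization belongs directly after the first character of the vowel.

```python
NASAL_MARK = "\u223c"       # the "∼" seen in the thread (assumed to be U+223C)
COMBINING_TILDE = "\u0303"  # combining tilde that marks a nasalized segment


def nasalize(segment: str) -> str:
    """Insert a combining tilde after the first character of the segment."""
    return segment[:1] + COMBINING_TILDE + segment[1:]


def merge_split_nasal(tokens: str) -> str:
    """Rewrite 'TONE/ ∼/VOWEL' as a single 'TONE/VOWEL' token with a tilde."""
    parts = tokens.split()
    out = []
    i = 0
    while i < len(parts):
        current = parts[i]
        following = parts[i + 1] if i + 1 < len(parts) else ""
        if current.endswith("/") and following.startswith(NASAL_MARK + "/"):
            # Drop the "∼/" prefix and attach the nasalized vowel to the tone.
            vowel = following[len(NASAL_MARK) + 1:]
            out.append(current + nasalize(vowel))
            i += 2
        else:
            out.append(current)
            i += 1
    return " ".join(out)
```

Applied to the example from the first comment, `merge_split_nasal("p ⁵/ ∼/ɛː")` joins the dangling tone token with the following nasalized vowel and leaves all other tokens untouched.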

IndianaTones commented 2 years ago

tokens=/ ∼

There are quite a few of these in Edictor in fact - is there a way to change them globally? i.e. not each one by hand?

LinguList commented 2 years ago

There are 70 cases. I added a lot of code yesterday to correct everything when converting the data back to CLDF, but I would not want to do any of that in EDICTOR now: with some 70 cases, it is faster to fix them manually in 30 minutes than to write a program that does this, check the program, and then change everything back and forth again, uploading data, etc.

IndianaTones commented 2 years ago

The trouble is that there are many more like ∼/j - should these also be rendered as j̃ ?

LinguList commented 2 years ago

Another question: have you started to correct the entries?

LinguList commented 2 years ago

Because cases like p ⁵/ ɛ̃ː are still wrong, since you have to remove -- obviously -- the space between / and following vowel. The tone is thus indicated as part of the vowel, but not processed by the computer.
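A minimal sketch of the space removal described here, assuming the token string is plain text and that whitespace directly after a slash is never intended. The function name is hypothetical and this is not part of EDICTOR or the CLDF conversion code.

```python
import re


def close_slash_gap(tokens: str) -> str:
    """Remove whitespace that directly follows a slash in a token string."""
    return re.sub(r"/\s+", "/", tokens)
```

Note that this would also glue a tone onto a following ∼/ token, so entries of the ⁵/ ∼/ɛː type need their nasalization resolved first.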

LinguList commented 2 years ago

Regarding ∼/j: we can convert this automatically later. Same for cases of ∼/r or other cases we can spot.
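The automatic conversion suggested here could be sketched as follows. This is an assumption about how such a step might look, not the code that was eventually run. The names are hypothetical, and the target set contains only the j and r cases named above.

```python
NASAL_MARK = "\u223c"       # the "∼" seen in the thread (assumed to be U+223C)
COMBINING_TILDE = "\u0303"  # combining tilde that marks a nasalized segment
NASALIZABLE = {"j", "r"}    # consonants named in the thread; extend as needed


def fold_nasal_consonants(tokens: str, targets=NASALIZABLE) -> str:
    """Turn tokens like '∼/j' into the consonant plus a combining tilde."""
    prefix = NASAL_MARK + "/"
    out = []
    for token in tokens.split():
        rest = token[len(prefix):] if token.startswith(prefix) else None
        if rest in targets:
            out.append(rest + COMBINING_TILDE)
        else:
            # Anything else, including '∼/' before a vowel, is left alone.
            out.append(token)
    return " ".join(out)
```

Restricting the rewrite to an explicit set keeps it from touching vowel cases, which need separate handling.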

LinguList commented 2 years ago

In total, there are 900 entries with . I'd modify them with the code. But the cases with spaces in between should still be handled, as this is disturbing alignments in the alignment view in edictor.

IndianaTones commented 2 years ago

Yes and another thing I ran across is that prenasalized stops like ɔ̀ngɔ́ are sometimes incorrectly rendered as nasalization on the vowel ∼¹/ɔ ɡ ⁵/ɔ whereas these should be ¹/ɔ ⁿɡ ⁵/ɔ. I'm fixing these in Edictor as I find them.
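To help spot such entries, a small helper could propose the prenasalized reading for manual review. This is only a sketch: the function name and the set of stops are assumptions, and since the thread says the misrendering happens only sometimes, each suggestion would still need checking against the original form.

```python
NASAL_MARK = "\u223c"          # the "∼" seen in the thread (assumed to be U+223C)
PRENASAL = "\u207f"            # superscript n, as in ⁿɡ or ⁿb
STOPS = {"b", "d", "\u0261"}   # hypothetical set of stops; \u0261 is IPA ɡ


def suggest_prenasalized(tokens: str) -> str:
    """Move a leading nasal mark onto a directly following stop."""
    parts = tokens.split()
    for i in range(len(parts) - 1):
        if parts[i].startswith(NASAL_MARK) and parts[i + 1] in STOPS:
            parts[i] = parts[i][len(NASAL_MARK):]   # drop the vowel nasalization
            parts[i + 1] = PRENASAL + parts[i + 1]  # mark the stop instead
    return " ".join(parts)
```

For the example above, `suggest_prenasalized("∼¹/ɔ ɡ ⁵/ɔ")` yields the form with ⁿɡ.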

LinguList commented 2 years ago

This was then a problem with the orthography profile, very good catch! I'd really encourage you to take a very close look at all phonetic renderings and check that they look okay, leaving only systematic cases like nasalized j and nasalized w for automated conversion later.

IndianaTones commented 2 years ago

Done.

LinguList commented 2 years ago

| Grapheme | Diacritics | Unicode | Segments | Graphemes | Count |
| --- | --- | --- | --- | --- | --- |
| ⁵/ | ◌⁵/ | | d w ⁵/ ɛ̃ | dwɛ́ⁿ | 1 |
| lɔ̀ | ◌lɔ̀ | U+006c U+0254 U+0300 | lɔ̀ + t ¹/u ŋ ⁵/o | tùŋó | 1 |
| ŋ̄ | ◌ŋ̄ | U+014b U+0304 | n ¹³/iː + ŋ̄ + k ¹/u ⁿb ¹/ɛ | kùmbɛ̀ | 1 |

LinguList commented 2 years ago

These are three final errors that should be addressed. lɔ̀ should of course be l ¹/ɔ, ŋ̄ should be ŋ̄/ŋ, and ⁵/ ɛ̃ should be ⁵/ɛ̃.
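When inspecting a flagged grapheme by hand, it can help to list its code points in the same style as the Unicode column of the report. The helper below is a hypothetical illustration and not the tool that produced the report.

```python
def codepoints(grapheme: str) -> str:
    """List the code points of a string as space-separated U+xxxx values."""
    return " ".join(f"U+{ord(ch):04x}" for ch in grapheme)
```

For instance, `codepoints("lɔ̀")` shows that the flagged unit consists of a consonant, a vowel, and a combining grave accent, which is why it has to be split into l and a toned vowel.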

LinguList commented 2 years ago

Once you have dealt with this, @IndianaTones, let me know, and I will run the check again.

LinguList commented 2 years ago

Updated list now here:

| Grapheme | Diacritics | Unicode | Segments | Graphemes | Count |
| --- | --- | --- | --- | --- | --- |
| /⁵i | ◌/⁵i | U+2075 U+0069 | k /⁵i r̃ ⁵/i | kírⁿí | 1 |
| ⁵j | ◌⁵j | U+2075 U+006a | d u ⁵j + j ¹/ɛ | dú-yyɛ̀ | 1 |
| ŋ̀ | ◌ŋ̀ | U+014b U+0300 | b ⁵/a n ⁵/a ŋ̀ + k ¹/uː | bànà-ŋ̀-kùù | 1 |
| lɔ̀ | ◌lɔ̀ | U+006c U+0254 U+0300 | lɔ̀ + t ¹/u ŋ ⁵/o | tùŋó | 1 |
| ŋ̄ | ◌ŋ̄ | U+014b U+0304 | n ¹³/iː + ŋ̄ + k ¹/u ⁿb ¹/ɛ | kùmbɛ̀ | 1 |
| vj | ◌vj | U+0076 U+006a | n ¹/u m ¹/ɔ + s ¹/a vj | nùmɔ̀-sàý | 1 |
| sɔː | ◌sɔː | U+0073 U+0254 U+02d0 | sɔː | sɔ́ɔ́ | 1 |
| ∼/⁵ɔ | ◌∼/⁵ɔ | U+2075 U+0254 | b ∼/⁵ɔ dʒ ¹/ɛ | bɔ́njɛ̀ | 1 |
| +l | ◌+l | U+002b U+006c | p ⁵/a ɡ ⁵/u +l ¹/e | págúlè | 1 |
| dw | ◌dw | U+0064 U+0077 | dw ã n a | dwaⁿna | 3 |

IndianaTones commented 2 years ago

Fixed. In addition, ʤ in Bangime was being rendered as j, so I went through and corrected those.

LinguList commented 2 years ago

Nice, indeed, fixed now.