Invalid sounds - Githubissues

LinguList commented 2 years ago

@IndianaTones, we introduced cases like p ⁵/ ∼/ɛː. This is not permitted; it should be p ⁵/ɛ̃ː now. There are about 70 cases of this type in EDICTOR. I do not know why they show up now, but could you have a look (use the tokens=/ ∼ filter to identify them) and correct them? It would be better to have this correct in EDICTOR rather than coding over it in CLDF.

LinguList commented 2 years ago

I have fixed this automatically now, but the data on EDICTOR should ideally be updated.

IndianaTones commented 2 years ago

tokens=/ ∼

There are quite a few of these in Edictor in fact - is there a way to change them globally? i.e. not each one by hand?

LinguList commented 2 years ago

There are 70 cases. I added a lot of code yesterday to correct everything when converting the data back to CLDF, but I would rather not do any of that in EDICTOR now: with some 70 cases, it is faster to fix them manually in 30 minutes than to write a program that does this, check the program, and then change everything back and forth again, upload the data, etc.

IndianaTones commented 2 years ago

The trouble is that there are many more like ∼/j - should these also be rendered as j̃ ?

LinguList commented 2 years ago

Another question: have you started to correct the entries?

LinguList commented 2 years ago

I ask because cases like p ⁵/ ɛ̃ː are still wrong: you obviously have to remove the space between / and the following vowel. As it stands, the tone is indicated as part of the vowel but is not processed as such by the computer.
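For illustration only, here is a minimal Python sketch of the space removal described above. It assumes the segments are stored as a whitespace-separated string and that a tone marker is any token ending in "/". The function name `join_dangling_tone` is hypothetical and is not part of EDICTOR or the CLDF conversion code.

```python
def join_dangling_tone(segments: str) -> str:
    """Attach a token that ends in "/" (a tone marker) to the token after it."""
    merged = []
    pending = ""  # tone marker still waiting for its vowel
    for token in segments.split():
        if token.endswith("/"):
            pending += token
        else:
            merged.append(pending + token)
            pending = ""
    if pending:
        # A marker at the very end has nothing to attach to; keep it visible.
        merged.append(pending)
    return " ".join(merged)
```

This only covers the simple case of a stray space after the slash. Anything else, such as a morpheme boundary directly after a tone marker, would still need a manual check.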

LinguList commented 2 years ago

Regarding ∼/j: we can convert this automatically later. Same for cases of ∼/r or other cases we can spot.
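A rough sketch of what such an automatic conversion could look like in Python follows. It assumes that the nasalization marker in the data is U+223C (the thread does not list its code point) and that a token has the plain form marker, slash, sound. The names `NASAL_MARK`, `fold_nasal` and `fold_nasal_in_segments` are made up for this example and do not come from the actual conversion code.

```python
NASAL_MARK = "\u223c"       # assumption: the "∼" in the data is U+223C
COMBINING_TILDE = "\u0303"  # combining tilde used for nasalized sounds


def fold_nasal(token: str) -> str:
    """Rewrite a token such as "∼/j" as the sound carrying a combining tilde."""
    prefix = NASAL_MARK + "/"
    if not token.startswith(prefix):
        return token
    sound = token[len(prefix):]
    if not sound:
        return token  # nothing to attach the tilde to, leave untouched
    # Place the tilde right after the base letter so a length mark stays last.
    return sound[0] + COMBINING_TILDE + sound[1:]


def fold_nasal_in_segments(segments: str) -> str:
    """Apply fold_nasal to every token of a whitespace-separated string."""
    return " ".join(fold_nasal(token) for token in segments.split())
```

Tokens that combine the marker with a tone, or that place the slash elsewhere, are deliberately left unchanged here, since those look like cases that need a closer look first.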

LinguList commented 2 years ago

In total, there are 900 entries with ∼. I'd modify them with the code. But the cases with spaces in between should still be handled, as they disturb the alignments in the alignment view in EDICTOR.

IndianaTones commented 2 years ago

Yes and another thing I ran across is that prenasalized stops like ɔ̀ngɔ́ are sometimes incorrectly rendered as nasalization on the vowel ∼¹/ɔ ɡ ⁵/ɔ whereas these should be ¹/ɔ ⁿɡ ⁵/ɔ. I'm fixing these in Edictor as I find them.

LinguList commented 2 years ago

This was then a problem with the orthography profile, very good catch! I'd really encourage you to take a very close look at all phonetic renderings and check whether they look okay, leaving only systematic cases like nasalized j and nasalized w for automated conversion later.

IndianaTones commented 2 years ago

Done.

LinguList commented 2 years ago

Grapheme	Diacritics	Unicode	Segments	Graphemes	Count
⁵/	◌⁵/		d w ⁵/ ɛ̃	dwɛ́ⁿ	1
lɔ̀	◌lɔ̀	U+006c U+0254 U+0300	lɔ̀ + t ¹/u ŋ ⁵/o	tùŋó	1
ŋ̄	◌ŋ̄	U+014b U+0304	n ¹³/iː + ŋ̄ + k ¹/u ⁿb ¹/ɛ	kùmbɛ̀	1
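When checking rows like these by hand, it can help to see which code points a flagged grapheme actually consists of. The small Python helper below prints them in the same `U+xxxx` style as the Unicode column; it is only a debugging sketch, the name `codepoints` is hypothetical, and it is not the tool that produced the report.

```python
def codepoints(grapheme: str) -> str:
    """Return the code points of a string as space-separated U+xxxx values."""
    return " ".join(f"U+{ord(ch):04x}" for ch in grapheme)


# Example: an unsegmented chunk made of l, open o, and a combining grave accent.
print(codepoints("l\u0254\u0300"))
```

More than one base letter in a single grapheme, as in this example, is a hint that the orthography profile did not split the form into separate sounds.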

LinguList commented 2 years ago

These are three final errors that should be addressed. lɔ̀ should of course be l ¹/ɔ, ŋ̄ should be ŋ̄/ŋ, and ⁵/ ɛ̃ should be ⁵/ɛ̃.

LinguList commented 2 years ago

Once you have dealt with these, @IndianaTones, let me know, and I will run the check again.

LinguList commented 2 years ago

Updated list now here:

Grapheme	Diacritics	Unicode	Segments	Graphemes	Count
/⁵i	◌/⁵i	U+2075 U+0069	k /⁵i r̃ ⁵/i	kírⁿí	1
⁵j	◌⁵j	U+2075 U+006a	d u ⁵j + j ¹/ɛ	dú-yyɛ̀	1
ŋ̀	◌ŋ̀	U+014b U+0300	b ⁵/a n ⁵/a ŋ̀ + k ¹/uː	bànà-ŋ̀-kùù	1
lɔ̀	◌lɔ̀	U+006c U+0254 U+0300	lɔ̀ + t ¹/u ŋ ⁵/o	tùŋó	1
ŋ̄	◌ŋ̄	U+014b U+0304	n ¹³/iː + ŋ̄ + k ¹/u ⁿb ¹/ɛ	kùmbɛ̀	1
vj	◌vj	U+0076 U+006a	n ¹/u m ¹/ɔ + s ¹/a vj	nùmɔ̀-sàý	1
sɔː	◌sɔː	U+0073 U+0254 U+02d0	sɔː	sɔ́ɔ́	1
∼/⁵ɔ	◌∼/⁵ɔ	U+2075 U+0254	b ∼/⁵ɔ dʒ ¹/ɛ	bɔ́njɛ̀	1
+l	◌+l	U+002b U+006c	p ⁵/a ɡ ⁵/u +l ¹/e	págúlè	1
dw	◌dw	U+0064 U+0077	dw ã n a	dwaⁿna	3

IndianaTones commented 2 years ago

Fixed, plus ʤ in Bangime was rendering as j rather than dʒ so went through and corrected those.

LinguList commented 2 years ago

Nice, indeed, fixed now.

lexibank / baf2

Invalid sounds #4