Closed by LinguList 2 years ago
I have fixed this automatically now, but the data on EDICTOR should ideally be updated.
tokens=/ ∼
There are in fact quite a few of these in Edictor. Is there a way to change them globally, i.e. not each one by hand?
There are about 70 cases. I added a lot of code yesterday to correct everything when converting the data back to CLDF, but I would not want to do any of that in EDICTOR now: with some 70 cases, fixing them manually in 30 minutes is faster than writing a program that does this, checking the program, and then changing everything back and forth again, uploading data, etc.
The trouble is that there are many more like ∼/j - should these also be rendered as j̃ ?
Another question: have you started to correct the entries?
Because cases like `p ⁵/ ɛ̃ː` are still wrong, since you obviously have to remove the space between `/` and the following vowel. The tone is thus indicated as part of the vowel, but not processed by the computer.
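The space removal described above can be sketched in a few lines. This is only an illustration on a space-separated segment string as shown in this thread, not something EDICTOR provides; the function name `join_tone_markers` is hypothetical, and the sketch assumes that a segment ending in `/` is always a tone marker that lost its vowel.

```python
def join_tone_markers(tokens):
    """Attach a dangling tone marker such as '⁵/' to the segment after it.

    Assumption: `tokens` is a space-separated segment string, and '+'
    marks a morpheme boundary that must never be glued to a tone marker.
    """
    fixed = []
    for segment in tokens.split():
        # The previous segment ends in '/': it is a tone marker without a vowel.
        if fixed and fixed[-1].endswith("/") and segment != "+":
            fixed[-1] += segment
        else:
            fixed.append(segment)
    return " ".join(fixed)


print(join_tone_markers("p ⁵/ ɛ̃ː"))  # p ⁵/ɛ̃ː
```

Already correct strings pass through unchanged, so the function could be applied to every entry before checking the remaining cases by hand.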
Regarding `∼/j`: we can convert this automatically later. The same goes for `∼/r` or other cases we can spot.
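A minimal sketch of what such an automatic conversion might look like, assuming that the `∼` prefix is the character U+223C and that the target form carries a combining tilde (U+0303) after the consonant. The function name `nasalize_sonorants`, the default set of sounds, and the example string are made up for illustration.

```python
TILDE_OPERATOR = "\u223c"   # assumed code point of the '∼' prefix
COMBINING_TILDE = "\u0303"  # combining tilde, as in 'j̃'


def nasalize_sonorants(tokens, sounds=("j", "w", "r")):
    """Rewrite segments like '∼/j' as 'j' plus a combining tilde."""
    prefix = TILDE_OPERATOR + "/"
    fixed = []
    for segment in tokens.split():
        rest = segment[len(prefix):]
        # Only convert the listed sounds; everything else is left alone.
        if segment.startswith(prefix) and rest in sounds:
            segment = rest + COMBINING_TILDE
        fixed.append(segment)
    return " ".join(fixed)


# Hypothetical input, not taken from the data.
print(nasalize_sonorants("∼/j ¹/a"))
```

Restricting the conversion to an explicit list of sounds keeps vowel cases out of the automatic pass, so they can be reviewed separately.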
In total, there are 900 entries with `∼`. I'd modify them with code. But the cases with spaces in between should still be handled, as they disturb the alignments in the alignment view in EDICTOR.
Yes, and another thing I ran across is that prenasalized stops like `ɔ̀ngɔ́` are sometimes incorrectly rendered as nasalization on the vowel, `∼¹/ɔ ɡ ⁵/ɔ`, whereas these should be `¹/ɔ ⁿɡ ⁵/ɔ`. I'm fixing these in Edictor as I find them.
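To find further candidates of this kind, a rough check could flag every nasal-marked segment that is directly followed by a plain voiced stop. This is a sketch for manual review only, since a nasal vowel before a stop can be legitimate; the function name and the set of stops are assumptions, and `∼` is assumed to be U+223C.

```python
NASAL_PREFIX = "\u223c"                # assumed code point of '∼'
STOPS = {"b", "d", "g", "\u0261"}      # assumed set; includes IPA 'ɡ' (U+0261)


def flag_prenasalized_candidates(tokens):
    """Return (index, segment, stop) for each '∼…' segment followed by a stop."""
    segments = tokens.split()
    hits = []
    for i in range(len(segments) - 1):
        if segments[i].startswith(NASAL_PREFIX) and segments[i + 1] in STOPS:
            hits.append((i, segments[i], segments[i + 1]))
    return hits


print(flag_prenasalized_candidates("∼¹/ɔ ɡ ⁵/ɔ"))  # one hit at index 0
print(flag_prenasalized_candidates("¹/ɔ ⁿɡ ⁵/ɔ"))  # []
```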
This was then a problem with the orthography profile, very good catch! This means I'd really encourage you to have a very close look at all phonetic renderings to check whether they look okay, leaving only systematic cases like nasalized j and nasalized w for automated conversion later.
Done.
Grapheme | Diacritics | Unicode | Segments | Graphemes | Count |
---|---|---|---|---|---|
⁵/ | ◌⁵/ | | d w ⁵/ ɛ̃ | dwɛ́ⁿ | 1 |
lɔ̀ | ◌lɔ̀ | U+006c U+0254 U+0300 | lɔ̀ + t ¹/u ŋ ⁵/o | tùŋó | 1 |
ŋ̄ | ◌ŋ̄ | U+014b U+0304 | n ¹³/iː + ŋ̄ + k ¹/u ⁿb ¹/ɛ | kùmbɛ̀ | 1 |
These are three final errors that should be addressed: `lɔ̀` should of course be `l ¹/ɔ`, `ŋ̄` should be `ŋ̄/ŋ`, and `⁵/ ɛ̃` should be `⁵/ɛ̃`.
Once you have dealt with this, @IndianaTones, let me know, and I will run the check again.
Updated list now here:
Grapheme | Diacritics | Unicode | Segments | Graphemes | Count |
---|---|---|---|---|---|
/⁵i | ◌/⁵i | U+2075 U+0069 | k /⁵i r̃ ⁵/i | kírⁿí | 1 |
⁵j | ◌⁵j | U+2075 U+006a | d u ⁵j + j ¹/ɛ | dú-yyɛ̀ | 1 |
ŋ̀ | ◌ŋ̀ | U+014b U+0300 | b ⁵/a n ⁵/a ŋ̀ + k ¹/uː | bànà-ŋ̀-kùù | 1 |
lɔ̀ | ◌lɔ̀ | U+006c U+0254 U+0300 | lɔ̀ + t ¹/u ŋ ⁵/o | tùŋó | 1 |
ŋ̄ | ◌ŋ̄ | U+014b U+0304 | n ¹³/iː + ŋ̄ + k ¹/u ⁿb ¹/ɛ | kùmbɛ̀ | 1 |
vj | ◌vj | U+0076 U+006a | n ¹/u m ¹/ɔ + s ¹/a vj | nùmɔ̀-sàý | 1 |
sɔː | ◌sɔː | U+0073 U+0254 U+02d0 | sɔː | sɔ́ɔ́ | 1 |
∼/⁵ɔ | ◌∼/⁵ɔ | U+2075 U+0254 | b ∼/⁵ɔ dʒ ¹/ɛ | bɔ́njɛ̀ | 1 |
+l | ◌+l | U+002b U+006c | p ⁵/a ɡ ⁵/u +l ¹/e | págúlè | 1 |
dw | ◌dw | U+0064 U+0077 | dw ã n a | dwaⁿna | 3 |
Fixed. In addition, ʤ in Bangime was rendering as j rather than dʒ, so I went through and corrected those.
Nice, indeed, fixed now.
@IndianaTones, we introduced cases like `p ⁵/ ∼/ɛː`. This is not permitted; it should be `p ⁵/ɛ̃ː` now. There are about 70 cases of this type in EDICTOR. I do not know why they show up now, but could you have a look (use the `tokens=/ ∼` filter to identify them) and correct them? It would be better to have this correct in EDICTOR rather than coding over it in CLDF.
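For reference, the intended correction can be written down as a small sketch. It assumes that `∼` is U+223C, that the nasalization should end up as a combining tilde (U+0303) directly after the first character of the vowel, and that the input is a space-separated segment string; the name `merge_nasalized_vowel` is hypothetical, and nothing here is an EDICTOR function.

```python
NASAL_PREFIX = "\u223c/"     # assumed: '∼' (U+223C) followed by '/'
COMBINING_TILDE = "\u0303"   # combining tilde, as in 'ɛ̃'


def merge_nasalized_vowel(tokens):
    """Turn a pair like '⁵/ ∼/ɛː' into the single segment '⁵/ɛ̃ː'."""
    segments = tokens.split()
    fixed = []
    i = 0
    while i < len(segments):
        current = segments[i]
        following = segments[i + 1] if i + 1 < len(segments) else ""
        vowel = following[len(NASAL_PREFIX):]
        if current.endswith("/") and following.startswith(NASAL_PREFIX) and vowel:
            # Put the tilde right after the vowel's first character,
            # so that a length mark such as 'ː' stays at the end.
            fixed.append(current + vowel[0] + COMBINING_TILDE + vowel[1:])
            i += 2
        else:
            fixed.append(current)
            i += 1
    return " ".join(fixed)


print(merge_nasalized_vowel("p ⁵/ ∼/ɛː"))  # p ⁵/ɛ̃ː
```

Comparing the input and output of this function over all entries would also give the list of affected cases, which may help when correcting them by hand.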