lexibank / lsi

CLDF dataset derived from Grierson's "Linguistic Survey of India" from 1928
https://lsi.clld.org
Creative Commons Attribution 4.0 International

tones are displayed wrongly in original transcriptions #6

Open LinguList opened 4 years ago

LinguList commented 4 years ago

It is of course nice to have the tones represented in this form, but they should be added AFTER the syllable, not before. I wonder how this could be done: can we do it automatically, or could one just go through the files and modify this directly? Note that the tones should not be separated by a space; they should be added directly after the syllable.

PhyloStar commented 4 years ago

A question about the way tone is being rendered. The original transcription had a falling tone for Sga literary language.

This is what is being shown in the original transcription if the line is copy pasted into a docx file or in this dialogue box.

34a. Sgå, literary = tä ˦˨ yü ˧ ta ˥˦ p’ā

In contrast, opening the file with gedit shows:

34a. Sgå, literary = tä ˦˨ yü ˧ ta ˥˦ p’ā

The falling tone gets written as ⁴² in the orthography profile. Is this normal behaviour?

PhyloStar commented 4 years ago

Grierson's volume places tones before the syllable. One way to deal with this is to keep the tones in their original places but ignore them for cognate detection and phylogenetic purposes.
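A minimal sketch of the "ignore tones" option, assuming the tones appear either as IPA tone letters or as superscript Chao numbers; `strip_tones` is a hypothetical helper name, not something from the repository:

```python
import re

# Assumption: tones are written with IPA tone letters or superscript digits 1-5.
TONE_CHARS = re.compile(r"[˩˨˧˦˥¹²³⁴⁵]+")


def strip_tones(form):
    """Remove all tone marks and collapse the leftover whitespace."""
    without_tones = TONE_CHARS.sub("", form)
    return " ".join(without_tones.split())


print(strip_tones("tä ˦˨ yü ˧ ta ˥˦ p’ā"))
```

The original transcription stays untouched; only the copy passed on to the analysis would lose its tones.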

LinguList commented 4 years ago

Okay. What you need to know about the tone marks: some software combines them, as you have shown with docx, etc., but text editors usually lack the fonts to do that. This is similar to some ligatures in Devanagari writing, which a text editor would not render but which are shown in docx or similar.

We consistently use the Chao numbers instead of the tone symbols in CLTS, as we find them more commonly used in SEA, where they are quasi-standard, even if IPA does not support them. That's why I converted. ˦˨ is essentially ˦ = 4 and ˨ = 2, so it is the same idea.
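To illustrate the mapping, here is a small sketch, assuming the five IPA tone letters map one-to-one to the superscript digits 1 to 5; `tone_letters_to_numbers` is a hypothetical name and not the actual conversion used for the orthography profile:

```python
# Each IPA tone letter corresponds to one Chao digit (1 = lowest, 5 = highest).
TONE_LETTERS = {"˩": "¹", "˨": "²", "˧": "³", "˦": "⁴", "˥": "⁵"}
TRANSLATION = str.maketrans(TONE_LETTERS)


def tone_letters_to_numbers(text):
    """Replace every tone letter by its superscript Chao number."""
    return text.translate(TRANSLATION)


print(tone_letters_to_numbers("tä ˦˨ yü ˧ ta ˥˦ p’ā"))
# tä ⁴² yü ³ ta ⁵⁴ p’ā
```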

If Grierson places the tone before the syllable, then the text renders the original faithfully, and this is important.

We then have to keep this, but we have to think of a way either to ignore tones in our analysis (maybe the most useful thing) or to place them after the syllable.

We leave this open until solved.

LinguList commented 4 years ago

Okay, since tones are a bit problematic, what we need to do, as far as I can see, is to make a list of all the syllables with tones and turn them around, so that each tone follows its syllable. This can be done automatically, I think.
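A rough sketch of how the turning around could work, assuming tones are separate whitespace-delimited tokens and each tone belongs to the syllable that immediately follows it (as in Grierson's notation); `move_tones_after` is a hypothetical helper:

```python
import re

# Assumption: a tone token consists only of tone letters or superscript digits.
TONE_TOKEN = re.compile(r"[˩˨˧˦˥¹²³⁴⁵]+")


def move_tones_after(form):
    """Attach each tone to the end of the following syllable, without a space."""
    out = []
    pending = ""  # tone waiting for its syllable
    for token in form.split():
        if TONE_TOKEN.fullmatch(token):
            pending += token
        else:
            out.append(token + pending)
            pending = ""
    if pending:
        # A trailing tone has no syllable to attach to, so keep it as it is.
        out.append(pending)
    return " ".join(out)


print(move_tones_after("tä ⁴² yü ³ ta ⁵⁴ p’ā"))
# tä yü⁴² ta³ p’ā⁵⁴
```

Forms where the tone is not cleanly separated by spaces would need manual checking.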

PhyloStar commented 4 years ago

Okay. This seems to be doable. We need to switch the tone position (hopefully this is trivial).

LinguList commented 4 years ago

You can check the new forms now; I managed to do this. There may be errors, as it's a list of 2400 items in etc/lexemes.tsv, but most of them are correct, I guess.