sillsdev / cog

Cog is a tool for comparing languages using lexicostatistics and comparative linguistics techniques.
http://sillsdev.github.io/cog/
MIT License

Tone marks behave differently than diacritics #58

Open Steve-Miller opened 8 years ago

Steve-Miller commented 8 years ago

I'm sorry I don't have the specifics on this one. I meant to write it up months ago, but other things got in the way. I thought I had it in my email drafts, but I don't. I'm mentioning it anyway in case someone else bumps into it. It could be the same issue.

The problem is that Cog deals with tone marks (˩ ˥) differently than if I used diacritics for tone (è é). It could be the same problem I was talking about in issue #49, but I don't think so. I had a specific example at one time. I do know for sure that I ended up removing all the tone marks and putting in diacritics. That was not fun.

ddaspit commented 8 years ago

Could it have something to do with syllabification? Tone letters do affect syllabification. Cog uses them as syllable breaks. The tone diacritics do not have this effect.
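To make the difference concrete, here is a minimal Python sketch of the behavior described above. It is not Cog's actual code: the helper name `split_on_tone_letters` and the rule "close the syllable after each run of tone letters" are assumptions made only for illustration.

```python
# Chao tone letters U+02E5..U+02E9 (˥ ˦ ˧ ˨ ˩)
TONE_LETTERS = {chr(cp) for cp in range(0x02E5, 0x02EA)}


def split_on_tone_letters(word):
    """Hypothetical helper: end a syllable after each run of tone letters.

    Tone diacritics (e.g. the acute in 'á') are not tone letters,
    so they never trigger a break here.
    """
    syllables = []
    current = ""
    prev_was_tone = False
    for ch in word:
        is_tone = ch in TONE_LETTERS
        # A non-tone character right after a tone letter starts a new syllable.
        if prev_was_tone and not is_tone:
            syllables.append(current)
            current = ""
        current += ch
        prev_was_tone = is_tone
    if current:
        syllables.append(current)
    return syllables


print(split_on_tone_letters("ma˥pa˩"))  # -> ['ma˥', 'pa˩']
print(split_on_tone_letters("mápà"))    # -> ['mápà']
```

Under these assumptions, the same word gets two syllables when tone is written with tone letters and one unbroken unit when it is written with diacritics, which may be enough to change how segments are aligned downstream.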

Steve-Miller commented 8 years ago

It might have had something to do with syllabification, but it might not have. It seems to me the tone marks affected the likely cognate/non-cognate analysis in the Compare / Variety Pairs tab. If I'm not mistaken, that in turn affects the similarity matrices.

While tone letters are often (usually?) written at the end of syllables, I'm not sure it's a good idea for Cog to use them as syllable breaks. I'm still thinking that through. But given that both diacritics and tone marks are used to mark tone, even phonetically, I think they should produce the same outcome regardless of which is used.
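One way to get the same outcome from either notation would be to separate tone from the segments before comparison. The sketch below is only an illustration, not something Cog does: `separate_tone` is a hypothetical helper, and the level mapping follows the IPA chart's pairing of level tone letters with diacritics (extra high, high, mid, low, extra low). It assumes the diacritics mark tone rather than stress, and it does not handle contour tones.

```python
import unicodedata

# IPA level tones, numbered 1 (extra low) to 5 (extra high).
LETTER_LEVEL = {"\u02E5": 5, "\u02E6": 4, "\u02E7": 3, "\u02E8": 2, "\u02E9": 1}
# Combining double acute, acute, macron, grave, double grave.
DIACRITIC_LEVEL = {"\u030B": 5, "\u0301": 4, "\u0304": 3, "\u0300": 2, "\u030F": 1}


def separate_tone(word):
    """Hypothetical helper: return (segments without tone marks, tone levels)."""
    segments = []
    tones = []
    # NFD splits precomposed letters such as 'á' into 'a' + combining acute.
    for ch in unicodedata.normalize("NFD", word):
        level = LETTER_LEVEL.get(ch) or DIACRITIC_LEVEL.get(ch)
        if level is not None:
            tones.append(level)
        else:
            segments.append(ch)
    # Recompose any remaining non-tone diacritics.
    return unicodedata.normalize("NFC", "".join(segments)), tones


print(separate_tone("ma\u02E6pa\u02E8"))  # tone letters ˦ and ˨ -> ('mapa', [4, 2])
print(separate_tone("m\u00E1p\u00E0"))    # diacritics á and à   -> ('mapa', [4, 2])
```

With tone pulled out like this, both spellings reduce to the same segments and the same tone sequence, so a comparison would not depend on which notation the data used.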

I did have a specific example at one time. I don't know what happened to it. I have since stepped away from the position and the two languages I was working on. Again, apologies.