lexibank / lsi

CLDF dataset derived from Grierson's "Linguistic Survey of India" from 1928
https://lsi.clld.org
Creative Commons Attribution 4.0 International

Glottolog mapping for "isolates" #25

Open xrotwang opened 3 years ago

xrotwang commented 3 years ago

There are quite a few languages not linked to a Glottolog languoid although the source gives a classification; see https://lsi.clld.org/languages?sSearch_2=isolate . In particular, the *_SPOKEN varieties stick out. Can we map these?

PhyloStar commented 3 years ago

OLD_MEITHEI has an ISO code but no Glottocode: https://iso639-3.sil.org/code/omp

PhyloStar commented 3 years ago

GYPSY_EUROPEAN can be mapped to https://glottolog.org/resource/languoid/id/roma1329

GYPSY_SYRIAN can be mapped to the Domari language: https://glottolog.org/resource/languoid/id/doma1258

PhyloStar commented 3 years ago

All the _SPOKEN varieties can be given the same code as the _WRITTEN variety.
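A minimal sketch of how that could be applied to the language table, assuming each row has an `ID` and a `Glottocode` column and that the paired varieties share an ID stem (both are assumptions about `etc/languages.tsv`; the function name and the sample rows are hypothetical placeholders):

```python
def propagate_written_codes(rows):
    """Copy the Glottocode of each X_WRITTEN row to an unmapped X_SPOKEN row."""
    # Map the shared ID stem to the Glottocode of the _WRITTEN variety.
    written = {
        row["ID"][: -len("_WRITTEN")]: row["Glottocode"]
        for row in rows
        if row["ID"].endswith("_WRITTEN") and row["Glottocode"]
    }
    result = []
    for row in rows:
        row = dict(row)  # work on a copy so the input stays untouched
        if row["ID"].endswith("_SPOKEN") and not row["Glottocode"]:
            stem = row["ID"][: -len("_SPOKEN")]
            # Leave the code empty when no _WRITTEN counterpart exists.
            row["Glottocode"] = written.get(stem, "")
        result.append(row)
    return result


# Hypothetical rows; the IDs and the code are placeholders, not real data.
rows = [
    {"ID": "FOO_WRITTEN", "Glottocode": "abcd1234"},
    {"ID": "FOO_SPOKEN", "Glottocode": ""},
    {"ID": "BAR_SPOKEN", "Glottocode": ""},
]
updated = propagate_written_codes(rows)
```

A _SPOKEN variety without a _WRITTEN counterpart stays unmapped, so such cases would still need a manual decision.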

PhyloStar commented 3 years ago

MANOE can be mapped to https://glottolog.org/resource/languoid/id/manu1255

PhyloStar commented 3 years ago

The CLLD app shows 362 languages instead of 364. I checked why this is the case.

One of the Bengali dialects is named "Eastern" (number 546 in the source). A Balochi dialect (number 366 in the source) is also named "Eastern", and both are being treated as Eastern Balochi. They should be treated separately.
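A quick way to spot collisions like this one might be to group the language rows by name and list every name that occurs more than once. This is only a sketch: the `Number` and `Name` column names and the function name are assumptions, and the third sample row is a placeholder.

```python
from collections import defaultdict


def find_name_collisions(rows):
    """Return names shared by more than one row, with their source numbers."""
    by_name = defaultdict(list)
    for row in rows:
        # Normalise case and whitespace so "Eastern" and "eastern " collide.
        by_name[row["Name"].strip().lower()].append(row["Number"])
    return {name: nums for name, nums in by_name.items() if len(nums) > 1}


rows = [
    {"Number": "546", "Name": "Eastern"},      # Bengali dialect
    {"Number": "366", "Name": "Eastern"},      # Balochi dialect
    {"Number": "0", "Name": "Placeholder"},    # hypothetical row
]
collisions = find_name_collisions(rows)
```

Any name reported here would need a disambiguating ID (for example one that includes the parent language) before the rows are loaded into the app.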

The LAO language is missing in the app, although it is present here: https://github.com/lexibank/lsi/blob/master/etc/languages.tsv#L48

PhyloStar commented 3 years ago

Aka language has been mapped here: https://github.com/lexibank/lsi/blob/master/etc/languages.tsv#L124

It appears right here too: https://lsi.clld.org/languages/AKA

For some reason it comes up as an isolate.

PhyloStar commented 3 years ago

Regarding Eastern Bengali and Eastern Balochi: the orthography for Eastern Balochi is missing. How should we go about adding this, @LinguList?

xrotwang commented 3 years ago

@PhyloStar Aka is mapped to hrus1242, which in fact is classified as isolate in Glottolog. Is this incorrect?

xrotwang commented 3 years ago

@PhyloStar While Lao is present in the mapping file and the wordlist, it seems to not have any non-empty forms in the wordlist.
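A check along these lines could be sketched as follows, assuming the forms are available as rows with `Language_ID` and `Form` columns (the usual CLDF FormTable names, but an assumption for this dataset); the function name and sample values are hypothetical:

```python
def languages_without_forms(language_ids, forms):
    """Return the IDs of languages that have no non-empty form."""
    # Collect every language that has at least one form with real content.
    with_forms = {f["Language_ID"] for f in forms if f["Form"].strip()}
    return sorted(set(language_ids) - with_forms)


# Placeholder data: the form values are invented for illustration only.
language_ids = ["LAO", "AKA"]
forms = [
    {"Language_ID": "AKA", "Form": "placeholder"},
    {"Language_ID": "LAO", "Form": ""},
]
missing = languages_without_forms(language_ids, forms)
```

Languages returned by such a check are in the mapping file but would not show up in the app.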

LinguList commented 3 years ago

@PhyloStar, can you please check with Lao?

PhyloStar commented 3 years ago

> @PhyloStar Aka is mapped to hrus1242, which in fact is classified as isolate in Glottolog. Is this incorrect?

LSI calls it a Himalayan language of the Tibeto-Burman branch. I think this affected my comment. It is an isolate according to Glottolog.

xrotwang commented 3 years ago

@PhyloStar If a phylogenetic analysis based on the data showed Aka as an isolate rather than Tibeto-Burman, this would be a nice little finding/corroboration for a paper, right?

PhyloStar commented 3 years ago

Yes, that would be great. I think we can check the posterior branch support and Aka's position in the tree. Maybe the cognate detection methods would treat borrowings as cognates, which might support the original hypothesis.