lexibank / lsi

CLDF dataset derived from Grierson's "Linguistic Survey of India" from 1928
https://lsi.clld.org
Creative Commons Attribution 4.0 International

Order and number of languages in the source #23

Closed xrotwang closed 3 years ago

xrotwang commented 3 years ago

I think it would be nice if we had the number and order (and classification, e.g. "AGGLUTINATIVE NON-INDIAN LANGUAGES") of languages as they appear in the book in https://github.com/lexibank/lsi/blob/master/etc/languages.tsv. This might require making up numbers for some languages, but it would make it easier to compare our data with the scans (as for TPPSR).

LinguList commented 3 years ago

Yes, @PhyloStar, if you could add a column "Order" to languages.tsv, that would be very cool.

xrotwang commented 3 years ago

Three columns, actually:

LinguList commented 3 years ago

Sorry, so we would add (correct me, @xrotwang, if you disagree):

LinguList commented 3 years ago

Sorry, I was pasting at the same time. We need family/classification and subgroup, since there are two levels, right? E.g., Tibeto-Burman -> Tibetan and Tibeto-Burman -> Himalayan.

PhyloStar commented 3 years ago

Okay. If we have a classification string, then we don't need the "agglutinative non-indian languages" information, right?

The raw files do have the original number in the first column. https://github.com/lexibank/lsi/blob/master/raw/LSI_txt/1-21/10-11%20Five.txt

Order would be based on the sequence of appearance in the PDF. Okay, I think I will add this.
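A minimal sketch of how such an order could be derived, assuming the language names have already been read from the source in the sequence in which they appear. The function name `assign_order` and the sample names are made up for illustration and are not part of the repository:

```python
def assign_order(names_in_source):
    """Give each language a 1-based Order by its first appearance in the list."""
    order = {}
    for name in names_in_source:
        # A name that shows up again keeps the position of its first appearance.
        if name not in order:
            order[name] = len(order) + 1
    return order


# Placeholder names, not taken from the LSI data.
sample = ["Lang A", "Lang B", "Lang A", "Lang C"]
print(assign_order(sample))
```

Languages that carry no number in the source would still get a position this way, since the order depends only on where a name first occurs.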

LinguList commented 3 years ago

@PhyloStar, expand this line here:

https://github.com/lexibank/lsi/blob/534b6ebcd46c673349ce9f85a5e4d62d7c499d3e/lexibank_lsi.py#L18-L21

In this way:

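import attr
from pylexibank import Language

@attr.s
class CustomLanguage(Language):
    NameInSource = attr.ib(default=None)
    Order = attr.ib(default=None)  # sequence of appearance in the book
    Classification = attr.ib(default=None)  # top level, e.g. Tibeto-Burman
    SubGroup = attr.ib(default=None)  # second level, e.g. Himalayan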

These attributes will then be available, and you can fill them in once you also add the corresponding columns (Order, etc.) to languages.tsv.
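Adding the matching columns to a tab-separated file can be sketched with the standard `csv` module. This is only an illustration: the helper `add_columns`, the `ID` key column, and the sample row are assumptions, and the real layout of languages.tsv may differ.

```python
import csv
import io

NEW_FIELDS = ["Order", "Classification", "SubGroup"]


def add_columns(tsv_text, extra):
    """Append NEW_FIELDS to a TSV string; `extra` maps an ID to the new values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames)
    fieldnames += [f for f in NEW_FIELDS if f not in fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Rows without an entry in `extra` get empty cells for the new columns.
        row.update(extra.get(row["ID"], {}))
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input: one language row with an assumed ID column.
sample = "ID\tNameInSource\nlang1\tExample A\n"
values = {
    "lang1": {"Order": "1", "Classification": "Tibeto-Burman", "SubGroup": "Tibetan"}
}
result = add_columns(sample, values)
print(result)
```

The column names in the file have to match the attribute names on `CustomLanguage` exactly, which is why the sketch reuses `Order`, `Classification`, and `SubGroup` as headers.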

PhyloStar commented 3 years ago

I am uploading a version of the languages.txt file (a tab-separated file). Does it look like what was discussed in the thread? languages.txt

xrotwang commented 3 years ago

Yes, perfect! I'll integrate it into the repo.