lexibank / lsi

CLDF dataset derived from Grierson's "Linguistic Survey of India" from 1928
https://lsi.clld.org
Creative Commons Attribution 4.0 International

add concepts.tsv #1

Closed by LinguList 4 years ago

LinguList commented 4 years ago

Use the concept as it appears in the source, and also provide a number (that reflects the order in the source).

LinguList commented 4 years ago

also provide page numbers, as this is useful.

PhyloStar commented 4 years ago

Updated the file here: https://github.com/lexibank/lsi/blob/master/etc/concepts.tsv

LinguList commented 4 years ago

the current code finds the following cases, which occur in the data, but not in the concept list:

226-227 Why_int
298-299 You beat (Past Tense)
322-323 You go
224-225 What_int
222-223 Who_int
302-303 I shall beat

As the data is supposed to be a digitization of the original manuscript, I suppose this is best handled by modifying the digitization: the concepts listed here should be replaced by the ones given in etc/concepts.tsv.
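A check of this kind can be sketched roughly as follows. This is not the code used in the repository, only a minimal illustration; the column name `ENGLISH`, the helper names, and the toy rows are assumptions made for the example.

```python
import csv
import io


def read_concepts(tsv_text, column="ENGLISH"):
    """Return the set of concept labels found in one column of a TSV table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column].strip() for row in reader if row[column].strip()}


def missing_concepts(data_labels, tsv_text, column="ENGLISH"):
    """Return the labels used in the data that are absent from the concept list."""
    listed = read_concepts(tsv_text, column)
    return sorted(set(data_labels) - listed)


# Toy input: the column names and rows are illustrative, not the real file.
CONCEPTS_TSV = "NUMBER\tENGLISH\n1\tWho?\n2\tWhat?\n3\tWhy?\n"
DATA_LABELS = ["Who_int", "What?", "Why_int", "You go"]

print(missing_concepts(DATA_LABELS, CONCEPTS_TSV))
```

Every label the function returns has to be fixed either in the digitized data or in the concept list, so that both use the same spelling.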

PhyloStar commented 4 years ago

In the case of Why_int, What_int, and Who_int, there should be a ? instead of int; I have replaced it.

I updated the concepts.tsv list for beat verbs.

LinguList commented 4 years ago

Okay, we run this again and then submit the concept list to concepticon. We are advancing here.

LinguList commented 4 years ago

concepts-mapped.tsv.txt

LinguList commented 4 years ago

These are the automatically mapped concepts, which need to be verified. What needs to be done is:

I suppose, @PhyloStar, you can give it a try. Please also compare the concepticon mappings on the website at https://concepticon.clld.org, and also at https://digling.org/calc/concepticon/

PhyloStar commented 4 years ago

concepts-mapped-concepticon.tsv.txt

Except for 1554 and 2106, corresponding to MAN, where I was not sure, I could verify the concepts.

LinguList commented 4 years ago

so there are two ways to proceed now:

  1. we give this to our students and collaborators of concepticon to add the list there officially

  2. you do the official submission to concepticon via a PR

In case of 2, I'd recommend reading this article by our doctoral student @AnnikaTjuka https://calc.hypotheses.org/2225

In case of 1, this will be added to our stack of things we have to do. It may be useful for you to go through the review procedure (i.e., 2) once, but if you don't have time, we can take over ;)

PhyloStar commented 4 years ago

Thanks. :) I got as far as testing. The test command gives the output below. What is the reason?

ERROR:conceptlists.tsv:314: link without label: (:bib:Grierson1928)
ERROR inconsistent data in repository /home/trk0076/concepticon_for_lsi/concepticon-data

The last line in conceptlists.tsv is:

Grierson-1928-168 Grierson, George Abraham 1928 168 basic, questionnaire English South Asian languages https://dsal.uchicago.edu/books/lsi/lsi.php?volume=1-2&pages=381 Grierson1928 This list of 168 items for more than 380 language varieties edited by Grierson [Grierson 1928] (:bib:Grierson1928) from the comparative vocabulary volume. 2-337

PhyloStar commented 4 years ago

There was a space between [Grierson 1928] and (:bib:Grierson1928). I removed it. Everything now passes the integrity tests.
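The effect of that stray space can be reproduced with a small check. This is only a sketch of the rule described above, not the check that the concepticon test command actually runs; the regular expression and the function name are assumptions made for illustration.

```python
import re

# A reference target such as "(:bib:Grierson1928)" that is NOT directly
# preceded by the closing bracket of a label, e.g. because of a space.
UNLABELED = re.compile(r"(?<!\])\(:bib:[^)\s]+\)")


def unlabeled_links(text):
    """Return all (:bib:...) targets that have no [label] attached to them."""
    return UNLABELED.findall(text)


broken = "edited by Grierson [Grierson 1928] (:bib:Grierson1928) from"
fixed = "edited by Grierson [Grierson 1928](:bib:Grierson1928) from"

print(unlabeled_links(broken))
print(unlabeled_links(fixed))
```

With the space in place, the label and the target are no longer read as one link, so the target counts as a link without a label; once the space is removed the two form a single labeled link.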

PhyloStar commented 4 years ago

I am looking here: https://github.com/concepticon/concepticon-data/blob/master/CONTRIBUTING.md How do I get the pull request working?

LinguList commented 4 years ago

Do you have your own fork of concepticon on your GitHub account? If so, it should be easy: you push the new version to your fork, and then make the PR on GitHub. If not, it is possible that you will need write access to create a new branch of concepticon. Let's start with a fork.

PhyloStar commented 4 years ago

I made a pull request. Thank you.