lexibank / zhangrgyalrong

Old Chinese Gyalrong cognates

Concepticon mapping #2

Closed: laiyunfan closed this issue 1 year ago

laiyunfan commented 4 years ago

At first sight, the automatic mapping is not very good: coverage is low, at 17/122 (14%). Do we need manual mapping? @LinguList

LinguList commented 4 years ago

I propose a workflow, @laiyunfan.

  1. We now have two glosses: what we call ENGLISH and what we call Gloss_in_Source.
  2. The ENGLISH gloss should be adjusted manually. I recommend going for the Chinese concept, e.g. writing "the breast" instead of "nipple, milk", which I think is a bit too broad. Likewise, for "leader, queen", write "queen".
  3. Then repeat the mapping and see whether coverage increases.
  4. Then refine manually.

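
To illustrate steps 2 and 3, here is a minimal sketch of the "adjust the ENGLISH gloss, then re-map and compare coverage" loop. It is not Concepticon's automatic mapper: an exact, case-insensitive match against a tiny hypothetical set of conceptset glosses stands in for it, and the gloss lists, `KNOWN_GLOSSES`, and all function names are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, tiny stand-in for Concepticon conceptset glosses (not real data).
KNOWN_GLOSSES: Set[str] = {"BREAST", "QUEEN", "WATER"}


def lookup(gloss: str) -> Optional[str]:
    """Exact, case-insensitive match after dropping a leading "the ".

    A crude stand-in for Concepticon's automatic mapping, used only here."""
    key = gloss.strip().upper()
    if key.startswith("THE "):
        key = key[4:]
    return key if key in KNOWN_GLOSSES else None


def coverage(glosses: List[str]) -> float:
    """Share of glosses that the lookup can map."""
    mapped = sum(1 for g in glosses if lookup(g) is not None)
    return mapped / len(glosses)


def apply_overrides(glosses: List[str], overrides: Dict[str, str]) -> List[str]:
    """Step 2: replace broad ENGLISH glosses with a single, narrower concept."""
    return [overrides.get(g, g) for g in glosses]


# Hypothetical ENGLISH glosses before manual adjustment.
english = ["nipple, milk", "leader, queen", "water"]
overrides = {"nipple, milk": "the breast", "leader, queen": "queen"}

before = coverage(english)
after = coverage(apply_overrides(english, overrides))
print(f"before: {before:.0%}, after: {after:.0%}")  # before: 33%, after: 100%
```

Narrowing a multi-word gloss to a single concept is what lets even a strict matcher succeed; whatever still fails after this pass is what step 4 would refine by hand.
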
Wu-Urbanek commented 4 years ago

The source_concepts in zhang2019-oc-rgyal.tsv are the glosses for Middle and Old Chinese. In the same file, an annotation column (Cogtse_gloss) gives the actual meaning of the words in the Gyalrong languages/dialects. Should we split the concepts into two files, or just add another column?

laiyunfan commented 4 years ago

There is actually more than one such column (Cogtse_gloss, Zbu_gloss, Japhug_gloss, etc.), so more concepts need to be mapped.
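
One way to handle several gloss columns without creating separate files is to collect every non-empty gloss into a long table that records which column it came from. The sketch below assumes a tab-separated layout with an `ID` column, a `source_concept` column, and the three gloss columns named above; the column names `ID` and `source_concept` and the two example rows are hypothetical, not copied from zhang2019-oc-rgyal.tsv.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical excerpt; the real file has more rows and possibly more gloss columns.
RAW = (
    "ID\tsource_concept\tCogtse_gloss\tZbu_gloss\tJaphug_gloss\n"
    "1\tqueen\tleader\t\tqueen\n"
    "2\tbreast\tmilk\tnipple\tbreast\n"
)

GLOSS_COLUMNS = ["source_concept", "Cogtse_gloss", "Zbu_gloss", "Japhug_gloss"]


def concepts_to_map(text: str) -> List[Tuple[str, str, str]]:
    """Return (ID, column, gloss) triples for every non-empty gloss cell.

    Keeping the column name records which variety (or Chinese) a gloss
    belongs to, so all glosses can be mapped from a single long table."""
    triples = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        for column in GLOSS_COLUMNS:
            gloss = row[column].strip()
            if gloss:  # skip empty cells, e.g. no Zbu gloss for ID 1
                triples.append((row["ID"], column, gloss))
    return triples


for triple in concepts_to_map(RAW):
    print(triple)
```

In this layout, adding another variety only means adding its column name to `GLOSS_COLUMNS`; the output still has one row per gloss to be mapped.
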

Wu-Urbanek commented 4 years ago

Please check "raw/Zhang2019_Concepts_updates_mapping.tsv"; I added new concepts from the annotations. @laiyunfan, please map the newly added concepts to Concepticon. I will modify the script once the new concepts are linked.

Wu-Urbanek commented 4 years ago

The Concepticon mapping needs to be postponed until the new Concepticon conceptsets are merged into the Concepticon_data master branch.

laiyunfan commented 4 years ago

So I'll stop reviewing the mapping for a while.

SimonGreenhill commented 4 years ago

Where are we on this?

Wu-Urbanek commented 4 years ago

> Where are we on this?

We are now waiting for the next minor Concepticon release; after that, we will refine the mapping.

SimonGreenhill commented 4 years ago

OK, Concepticon has been released now, right?

LinguList commented 4 years ago

Nope. In my opinion, this dataset is too specific to link to Concepticon. Or do we link everything in general? If so, we'd need somebody to add this list to Concepticon.

LinguList commented 4 years ago

Okay, @MacyL, @laiyunfan, I just checked the dataset. It looks like you can simply add the concept list to Concepticon, since you know it. Can I ask you to make a mapping proposal soon? Maybe @laiyunfan could do the mapping and submit it to Concepticon, and @MacyL could manage the PR and the reviewers? Just leave doubtful concepts unmapped; that will be fine.

Wu-Urbanek commented 4 years ago

I can help with that.

laiyunfan commented 4 years ago

Ok

FredericBlum commented 1 year ago

@LinguList This is one of the last lists still missing from Concepticon. However, there are some issues you might be able to help with:

There is also a lot of confusion in the raw/ folder, which contains four different concept files. How should we best proceed?

LinguList commented 1 year ago

In fact, the data is fine now, and we can close this issue. The confusion arose partly because I had forgotten that the problems had already been fixed, or rather that we had never closed the issue. I recently checked the data, and my update from last week should make everything clear. The raw folder is a mess, but the CLDF data is not.

LinguList commented 1 year ago

As far as I am aware, Zhang-2019-122 is not in Concepticon; the list exists only in etc/ in this repo and should be removed from there.

LinguList commented 1 year ago

I just did that.

LinguList commented 1 year ago

So this dataset can be considered fixed and brought up to date with Concepticon 3.0.