laiyunfan closed this issue 1 year ago
I propose a workflow, @laiyunfan.
The source_concepts in zhang2019-oc-rgyal.tsv are the glosses for Middle and Old Chinese. In the same file, an annotation column (Cogtse_gloss) indicates the real meaning of the words in the Gyalrong languages/dialects. Should we separate the concepts into two files? Or just add another column?
Actually, there is more than one such column: Cogtse_gloss, Zbu_gloss, Japhug_gloss, etc. So more concepts must be mapped.
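To make the point about the several gloss columns concrete, here is a minimal sketch of how the distinct glosses could be collected from every column ending in `_gloss`. The sample rows are made up for illustration, and the assumption that all annotation columns share the `_gloss` suffix comes only from the three names mentioned above; the real layout of zhang2019-oc-rgyal.tsv may differ.

```python
import csv
import io

# Hypothetical sample rows; the real file is zhang2019-oc-rgyal.tsv
# and its columns and values may differ.
SAMPLE_TSV = (
    "source_concepts\tCogtse_gloss\tZbu_gloss\tJaphug_gloss\n"
    "head\thead\t\thead\n"
    "sun\tsun\tday\t\n"
)


def collect_glosses(handle, suffix="_gloss"):
    """Return the sorted set of non-empty values from all *_gloss columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: every annotation column name ends with the given suffix.
    gloss_columns = [name for name in reader.fieldnames if name.endswith(suffix)]
    glosses = set()
    for row in reader:
        for column in gloss_columns:
            value = (row[column] or "").strip()
            if value:
                glosses.add(value)
    return sorted(glosses)


new_concepts = collect_glosses(io.StringIO(SAMPLE_TSV))
print(new_concepts)
```

For the real data, the `io.StringIO` wrapper would be replaced by an opened file handle. Glosses that differ from source_concepts (here "day") are the ones that would need additional mapping.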
Please check "raw/Zhang2019_Concepts_updates_mapping.tsv"; I added new concepts from the annotations. @laiyunfan, please map the newly added concepts to Concepticon. I will modify the script after the new concepts are linked.
The Concepticon mapping needs to be postponed until the new Concepticon concept sets are merged into the Concepticon_data master branch.
So I will stop reviewing the mapping for now.
Where are we on this?
We are currently waiting for the Concepticon minor release; after that, we will refine the mapping.
Ok, concepticon's been released now, right?
Nope. In my opinion, this dataset is too specific to link to Concepticon. Or do we link in general? If so, we'd need somebody to add this to Concepticon.
Okay, @MacyL, @laiyunfan, I just checked the dataset. It looks like you can just add the concept list to Concepticon in the way you already know. Can I ask you to make a mapping proposal soon? Maybe @laiyunfan, you could do the mapping and submit it to Concepticon, and @MacyL, you could manage the PR and the reviewers? Just leave doubtful concepts unmapped; that will be fine.
I can help with that.
Ok
@LinguList This is one of the last lists that is missing in concepticon. However, there are some issues on which you might be able to help:
There is also a lot of confusion in the raw/ folder, including four different concept-files. How do we best proceed with this?
In fact, the data is fine now, and we can close this issue. The confusion arose because I had forgotten that the problems had already been fixed and that we had simply not closed the issue. I recently checked the data, and my update from last week should make everything clear. The raw folder is a mess, but the CLDF data is not.
As far as I am aware, Zhang-2019-122 is not in Concepticon; the list is only in etc in this repo and should be removed from there.
I just did that.
So this dataset can be considered fixed and brought up to date with Concepticon 3.0.
At first sight, the automatic mapping is not very good; the coverage is low: 17/122 (14%). Do we need manual mapping? @LinguList
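For reference, the coverage figure can be reproduced with a small sketch like the one below. It assumes the mapping is available as a list of rows with a `CONCEPTICON_ID` field that is empty for unmapped concepts; the rows here are placeholders built only to match the 17/122 count, not real mapping data or real Concepticon IDs.

```python
def mapping_coverage(rows, id_column="CONCEPTICON_ID"):
    """Return (mapped, total, share) for rows with a non-empty ID column."""
    total = len(rows)
    mapped = sum(1 for row in rows if (row.get(id_column) or "").strip())
    share = mapped / total if total else 0.0
    return mapped, total, share


# Placeholder rows: 17 mapped and 105 unmapped concepts, matching the
# counts reported above. The ID value "1" is a dummy, not a real mapping.
rows = [{"CONCEPTICON_ID": "1"}] * 17 + [{"CONCEPTICON_ID": ""}] * 105

mapped, total, share = mapping_coverage(rows)
print(f"{mapped}/{total} ({share:.0%})")  # 17/122 (14%)
```

Rerunning the same count after each round of manual mapping would show how far the coverage has moved.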