Next tasks - GitHub issues

LinguList commented 5 years ago

First, thanks, this is an excellent job!

I'll mention a few things we should do next, but we can do them in steps; no need to rush:

process the data by adding the forms from the values in cldf/
make an orthography profile
make concepticon mappings and paste them in etc/concepts.tsv (only where possible, leave blanks where unclear)
apply the orthoprofile
publish the dataset
write a short post for our CALC blog, introducing how the data was turned into CLDF (I think this last point would be very nice; we can time it once this is ready)
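
To make the "apply the orthoprofile" step concrete, here is a minimal sketch of what applying a profile amounts to: a greedy longest-match segmentation of each form against the graphemes listed in the profile. This is an illustration in plain Python, not the code that lexibank or the segments package actually runs; the function name `segment`, the toy profile, and the `?` placeholder for unknown material are all assumptions made for the example.

```python
def segment(form, profile, unknown="?"):
    """Greedily split `form` into the longest graphemes found in `profile`.

    `profile` maps a grapheme (as written in the source) to its target
    segment. Characters not covered by the profile are replaced by
    `unknown`, so gaps in the profile become visible in the output.
    """
    longest = max(map(len, profile), default=1)
    segments = []
    i = 0
    while i < len(form):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(longest, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in profile:
                segments.append(profile[chunk])
                i += size
                break
        else:
            # No grapheme in the profile matches at this position.
            segments.append(unknown)
            i += 1
    return segments


# Hypothetical toy profile: "th" must win over "t" where both match.
toy_profile = {"t": "t", "th": "tʰ", "a": "a"}
print(" ".join(segment("tha", toy_profile)))
print(" ".join(segment("tax", toy_profile)))
```

The placeholder is the useful part in practice: every `?` in the segmented output points to a grapheme that still has to be added to the profile.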

So we will start by making a first lexibank script. Maybe a good time to ask @MacyL to learn the new lexibank? There are simple templates for this. Let us discuss this next week (this week I barely have time).

For the Concepticon mapping, we will ask @schweikhard to review.

laiyunfan commented 5 years ago

OK, I will work through the tasks point by point next week and ask @MacyL about anything that I don't understand yet.

Wu-Urbanek commented 5 years ago

@laiyunfan I added a Concepticon mapping result to the etc/ folder (https://github.com/lexibank/zhanggyalrong/blob/master/etc/concepts.tsv). Could you map the concepts to Concepticon?

laiyunfan commented 5 years ago

Ok

Wu-Urbanek commented 5 years ago

@LinguList, questions about "task 1":

The Old Chinese forms contain various brackets ([], () and <>). Do we keep them or remove them?
The raw data is mostly in IPA strings, and the orthography profiles are already uploaded. So should I work on forms and tokens at the same time when I code the pylexibank script?

LinguList commented 5 years ago

Old Chinese has brackets, but @laiyunfan knows what they mean. They cannot simply be removed from the value, but they can be reduced in the form. This will require a good orthography profile, since the resulting uncertainties need to be handled by the orthoprofile.
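
As a rough illustration of the value/form distinction: the value stays exactly as in the source, and only the derived form drops the bracket characters. The sketch below assumes that "reducing" means deleting the bracket symbols while keeping their content; whether that is right for each bracket type is a decision for whoever knows the notation. The function name `value_to_form` and the sample strings are made up for the example.

```python
# Bracket characters seen in the Old Chinese values: [], () and <>.
BRACKETS = "[]()<>"
_DROP_BRACKETS = str.maketrans("", "", BRACKETS)


def value_to_form(value):
    """Return a form with bracket symbols removed and their content kept.

    The input `value` is not modified, so the original notation
    (including the uncertainty it encodes) is preserved in the value.
    """
    return value.translate(_DROP_BRACKETS).strip()


# Made-up sample strings, not taken from the dataset.
for value in ["[t]a(k)", "p<r>an", "kat"]:
    print(value, "->", value_to_form(value))
```

If some bracket type should instead drop its content entirely (for example, an optional segment), that rule would need its own handling and is not covered here.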

LinguList commented 5 years ago

@MacyL, I prefer @laiyunfan to work on orthoprofiles first. Please use the extended annotation from lingpy, in which we have context in the orthography profile. Do you, @MacyL, still remember how to create them with lingpy profile? To do so, we first need the lexibank_zhangrgyalrong.py script.

laiyunfan commented 5 years ago

I made those orthography profiles but did not think about the Old Chinese brackets. I don't know what to do with them at the moment; I am thinking about it now.

LinguList commented 5 years ago

So, @laiyunfan, the normal way to deal with this is to first make a complete forms.csv in the cldf folder, which we create with a script called lexibank_zhanggyalrong.py that follows certain standards. Then we automatically create the profile with Python/lingpy, and then we correct it. So I'd say: first we need the Python script, and I suggest that @MacyL starts with a first draft (check out chenhmongmien in the new lexibank), I'll then correct it, and @laiyunfan will review the code to learn.
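
To give an idea of what "automatically create the profile" produces, here is a small stand-alone sketch that drafts a profile table from a list of forms. It is not lingpy's implementation; it only mimics the general idea of a context-aware profile by marking word-initial graphemes with `^` and word-final ones with `$`. The function names, the marker convention as used here, and the column names `Grapheme`, `IPA` and `Frequency` are assumptions for the illustration.

```python
import unicodedata
from collections import Counter


def graphemes(form):
    """Split a form into base characters with combining marks attached."""
    units = []
    for char in form:
        if units and unicodedata.combining(char):
            units[-1] += char
        else:
            units.append(char)
    return units


def draft_profile(forms):
    """Count graphemes in context and return rows of a draft profile."""
    counts = Counter()
    for form in forms:
        units = graphemes(form)
        if not units:
            continue
        units[0] = "^" + units[0]    # word-initial context
        units[-1] = units[-1] + "$"  # word-final context
        counts.update(units)
    rows = [("Grapheme", "IPA", "Frequency")]
    for grapheme, frequency in counts.most_common():
        # The IPA column starts as a copy; this is what gets corrected.
        rows.append((grapheme, grapheme.strip("^$"), frequency))
    return rows


# Made-up forms standing in for the Form column of forms.csv.
for row in draft_profile(["ta", "tak", "a\u0303"]):
    print("\t".join(str(cell) for cell in row))
```

Modifier letters such as ʰ are not combining characters, so this sketch leaves them as separate graphemes; merging them with the preceding consonant is exactly the kind of correction that happens in the review step.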

lexibank / zhangrgyalrong

Next tasks #1