Wu-Urbanek opened 5 years ago
updated an orthography profile which summarizes phonemes from Chen's book (data provided by Mattis)
can you point me to the file please?
Nice. What we need to do now is: group the identical cells on the left, and separate the languages where we find them by a comma. So if you have a "k" in two languages, put the "k" first, and then the two languages in the next cell, separated by a comma. In this way, we can reduce the items drastically.
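As a rough sketch of that grouping step (toy data only; the real input is Doug's list, so the row layout here is an assumption):

```python
# Toy sketch, not the actual script: group a (symbol, language)
# listing so each symbol appears once, with its languages joined
# by a comma in the next cell.
from collections import defaultdict

rows = [("k", "LangA"), ("ei", "LangB"), ("k", "LangB")]  # assumed input shape

by_symbol = defaultdict(list)
for symbol, language in rows:
    by_symbol[symbol].append(language)

for symbol, languages in sorted(by_symbol.items()):
    print(symbol + "\t" + ",".join(languages))
```

This turns the repeated "k" rows into a single line `k` followed by `LangA,LangB`.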
BTW: we will also need to add these new language names from Chinese to the languages.tsv, which has our English names, so we can identify them. Making the IPA will be simple later on. Both Nathanael and I can help, and @Schweikhard can also learn to type IPA (he probably knows already).
Ah, wait: did you already do that, separating the entries by a space? If so, we can start adding IPA right away. I'd just like to know what the * on some entries means. Did you check this with the book?
In the profile I made, columns are separated by tabs. The first column is the phonetic symbol; the second column lists the languages in which the symbol appears (separated by white space). I also wonder what the * means. The book gives a summary of the symbols (but I didn't see the * there), and then tables of phoneme inventories. I didn't check the tables. But I know that the "th", "dh" are actually tʰ, dʰ.
Okay, then we'll only need to add the specific Chinese language names to the other names we use. We should move this into lexibank, where we can more properly handle the segmentation.
Hi again, @MacyL, in fact, we need to do this in another way, as there was a misunderstanding. I want the following profile:
Graphemes | IPA | Structure |
---|---|---|
k | k | i |
ei | ei | n |
ek | e k | n c |
You see? Now in your listing, you do not provide this information. So my idea was in fact not to write code that converts the list by Doug into a two-column thing, but to semi-manually turn it into the kind of profile we need. This would also require that you compare each language with the source while doing this. It is easier than typing everything off, but we need more than parsing it all into one file here. I would even say: keep all languages distinct for now, so start with one file per language. It's in fact the same as what I did for the data in Liu2008, which I showed you.
Check this file as an example of the output I wanted. @Schweikhard may also help here, since the rules for the preparation are similar. We can meet on Thursday and I can explain why this is so important.
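To illustrate why the Structure column matters, here is a toy greedy longest-match segmenter over such a profile. This is only an illustration under my own assumptions, not the lexibank machinery; the sample graphemes come from the table above:

```python
# Toy profile: Grapheme -> (IPA, Structure), as in the table above
# (i = initial, n = nucleus, c = coda).
profile = {
    "k":  ("k", "i"),
    "ei": ("ei", "n"),
    "ek": ("e k", "n c"),
}

def segment(word, profile):
    """Greedily match the longest known grapheme at each position."""
    graphemes = sorted(profile, key=len, reverse=True)
    out, i = [], 0
    while i < len(word):
        for g in graphemes:
            if word.startswith(g, i):
                out.append(profile[g])
                i += len(g)
                break
        else:
            out.append((word[i], "?"))  # unknown symbol
            i += 1
    return out

print(segment("kei", profile))  # → [('k', 'i'), ('ei', 'n')]
```

Without the Structure column, "ek" would just be one grapheme; with it, we know it segments into a nucleus plus a coda.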
The python script is updated. New orthography profiles are here: https://github.com/lingpy/calc-workflow/tree/master/code/Phoneme_Inventories Each file is one language's phoneme inventory, with 3 columns: Grapheme, IPA, Template. The columns we need to fill in are IPA and Template.
Sorry, when I saw this, I realized it is still better to go with one big file. I still think it is good to have one phoneme inventory per doculect, but filling them out will be easier when you have one big file.
This means I suggest you proceed as follows: we still work per doculect, but in order to annotate, we do it for all languages at once, and THEN you split, and THEN you check with the book.
Does that sound okay to you, @MacyL?
Updated the code. I kept the format from yesterday but added IPA and Template columns. So it looks like this: Grapheme, IPA, Template, Note. The columns are all tab-separated, the languages in the Note column are white-space-separated. https://github.com/lingpy/calc-workflow/blob/master/code/summarised_orthography.tsv
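A possible sketch of the later "split back into one file per doculect" step over this four-column layout (column names taken from the comment above; the function name and toy rows are assumptions):

```python
# Hypothetical sketch: split rows of the big profile
# (Grapheme, IPA, Template, Note) into one table per language,
# using the white-space-separated language names in the Note column.
from collections import defaultdict

def split_by_language(rows):
    per_language = defaultdict(list)
    for grapheme, ipa, template, note in rows:
        for language in note.split():
            per_language[language].append((grapheme, ipa, template))
    return dict(per_language)

rows = [  # toy rows in the layout described above
    ("k", "k", "i", "LangA LangB"),
    ("ei", "ei", "n", "LangB"),
]
for language, table in split_by_language(rows).items():
    print(language, table)
```

Each resulting per-language table can then be written out as its own three-column TSV and checked against the book.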
Okay, then you can start to add the CLTS/IPA items. Remember to look at my example, as it contains many hints on how this should be done. If @Schweikhard has time, you could also look at this quickly, and maybe do a check for consistency, once @MacyL is finished? Then I'd check it last.
One thing I found in Chen's book: it also shows up in loanwords, for example on page 261 (高坡).
A summary of the phoneme inventories of the Hmong-Mien languages is provided on pages 50 to 68. I have typed pages 50 to 54 so far. See the progress in code/Chen_book_orthography.csv