yinyunxue / schuesslerhanchinese

CLDF dataset derived from Schuessler's "Chinese Etymological Dictionary" from 2007
Creative Commons Attribution 4.0 International

representation of type a / b syllables in Old Chinese #5

Open LinguList opened 2 years ago

LinguList commented 2 years ago

These should be represented in an extra field

LinguList commented 2 years ago

This seems to work well.

from pycldf import Dataset
from tabulate import tabulate

schuessler = Dataset.from_metadata("cldf/cldf-metadata.json")
table = []
for form in schuessler.objects("FormTable"):
    # One row per form: language, character, raw value, segments, syllable type, cognate sets.
    table.append([
        form.language.name,
        form.data["Chinese_Characters"],
        form.cldf.value,
        " ".join(form.cldf.segments),
        form.data["Syllable_Types"],
        form.data["Cognacy"],
    ])
print(tabulate(table[:10], headers=["Language", "Character", "Value", "Segments", "Type", "Cognates"]))

LinguList commented 2 years ago

This yields:

Language          Character    Value    Segments       Type    Cognates
----------------  -----------  -------  -------------  ------  ----------
Middle Chinese    阿           ʔâ       ʔ â/a                  1
Middle Chinese    阿           ʔjwo     ʔ ɥ o                  1
Late Han Chinese  阿           ʔɨɑ      ʔ ɨ ɑ                  1
Old Chinese       阿           *ʔa      ʔ a            B       1
Middle Chinese    阿奴         ʔâ-nuo   ʔ â/a + n u o          1 2
Middle Chinese    哀           ʔậi      ʔ ậi/ɑi                3
Late Han Chinese  哀           ʔəi      ʔ əi                   3
Old Chinese       哀           *ʔə̂i     ʔ ə̂i/əi        A       3
Middle Chinese    艾           ŋâiᶜ     ŋ âi/ai ᶜ/³            4
Late Han Chinese  艾           ŋɑs      ŋ ɑ s                  4
LinguList commented 2 years ago

@nh36, does Schuessler mark A/B also on Middle Chinese, or do these serve to disambiguate vowels here?

nh36 commented 2 years ago

Definitely no A/B in Middle Chinese. Schuessler uses Li Fang-Kuei's Middle Chinese without any change.

哀 ʔậi ʔ ậi/ʌi
艾 ŋâiᶜ ŋ âi/ɑi ᶜ/³
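
Since the A/B type should appear on Old Chinese forms only, this can be checked mechanically. The following is a minimal sketch, not part of the repository: it assumes the forms have been exported as plain dictionaries with the keys "Language" and "Syllable_Types" (the column name used in the script above), and the function name misplaced_syllable_types is hypothetical.

```python
def misplaced_syllable_types(rows, allowed=("Old Chinese",)):
    """Return the rows that carry a syllable type outside the allowed languages.

    `rows` is assumed to be a list of dicts with the keys "Language" and
    "Syllable_Types"; this layout is an assumption made for illustration.
    """
    return [
        row for row in rows
        if row.get("Syllable_Types") and row["Language"] not in allowed
    ]


# Sample rows modelled on the table printed earlier in this thread.
rows = [
    {"Language": "Middle Chinese", "Value": "ʔâ", "Syllable_Types": ""},
    {"Language": "Old Chinese", "Value": "*ʔa", "Syllable_Types": "B"},
    {"Language": "Old Chinese", "Value": "*ʔə̂i", "Syllable_Types": "A"},
    {"Language": "Late Han Chinese", "Value": "ʔəi", "Syllable_Types": None},
]

# An empty list means every A/B mark sits on an Old Chinese form.
print(misplaced_syllable_types(rows))
```

Any row returned by this check would be a Middle Chinese or Late Han form that wrongly received an A/B mark.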

LinguList commented 2 years ago

Thanks, so we need a language-specific orthography profile then. No big deal, but I need to extract it now.

nh36 commented 2 years ago

@LinguList By the way, in Old Chinese a/ɑ but in LHan a/æ, and (I think) in Middle Chinese â/ɑ and a/æ
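
To illustrate why one shared profile cannot work, here is a small sketch of a per-language lookup. It only encodes the three correspondences mentioned in this comment (the Middle Chinese ones are tentative, as said above); the names VOWEL_READINGS and read_vowel are hypothetical and do not come from the repository.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so precomposed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)


# Grapheme -> reading, per language, as stated in the comment above.
# The Middle Chinese entries are tentative ("I think").
VOWEL_READINGS = {
    "Old Chinese": {nfc("a"): "ɑ"},
    "Late Han Chinese": {nfc("a"): "æ"},
    "Middle Chinese": {nfc("â"): "ɑ", nfc("a"): "æ"},
}


def read_vowel(language, grapheme):
    """Return the reading of `grapheme` in `language`, or the grapheme itself if unknown."""
    return VOWEL_READINGS.get(language, {}).get(nfc(grapheme), grapheme)


# The same letter "a" resolves differently depending on the language.
for language in VOWEL_READINGS:
    print(language, read_vowel(language, "a"))
```

The same grapheme gets a different value in each language, which is exactly what a single profile cannot express.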

LinguList commented 2 years ago

Okay, all the more justification for using language-specific profiles. Question: what is an a with a dot under it? There are quite a few cases in Schuessler...

nh36 commented 2 years ago

@LinguList The dot under things should only come up in Middle Chinese. It means retroflex when it is under consonants (t, d, s, z, ts, dz) and a different vowel quality when it is under a vowel, as in the example of ậ/ʌ we discussed above. I have checked, and ạ (without the circumflex) seems not to occur in the file (except in Vietnamese, which is not relevant).
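
The rule can be sketched in code as follows. This is only an illustration under stated assumptions: it decomposes a form with Unicode NFD, looks for the combining dot below (U+0323), and labels the base letter. The vowel inventory passed as `vowels` is a placeholder that would have to be adjusted to the actual transcription, and both function names are hypothetical.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW


def dotted_bases(form):
    """Return the base letters in `form` that carry a dot below."""
    bases = []
    base = None
    for char in unicodedata.normalize("NFD", form):
        if unicodedata.combining(char) == 0:
            base = char  # a new base character starts here
        elif char == DOT_BELOW and base is not None:
            bases.append(base)
    return bases


def classify_dots(form, vowels="aeiou"):
    """Label each dotted base letter as a vowel-quality mark or a retroflex mark.

    `vowels` is an assumed, simplified vowel inventory.
    """
    return [
        (base, "vowel quality" if base in vowels else "retroflex")
        for base in dotted_bases(form)
    ]


print(classify_dots("ʔậi"))  # dot under the vowel a
print(classify_dots("ṭ"))    # dot under the consonant t
print(classify_dots("ŋâi"))  # no dot below at all
```

Working on the decomposed string means ậ is found whether it was typed as one precomposed character or as a plus combining marks.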

LinguList commented 2 years ago

Ah, good, it occurs only in Middle Chinese:

https://github.com/yinyunxue/schuesslerhanchinese/blob/main/etc/orthography/MiddleChinese.tsv#L34

The other dots, those under consonants, were already covered before; you can check the conversions in the profile for MC, etc.

nh36 commented 2 years ago

Shall I correct that file by hand? There are still many problems with the vowels.

ậi | ậi/ɑi | 95 SHOULD BE ậi | ậi/ʌi | 95
ậ | ậ/ɑ | 55 SHOULD BE ậ | ậ/ʌ | 55
âi | âi/ai | 28 SHOULD BE âi | âi/ɑi | 28
âᶜ$ | â/a ᶜ/³ | 18 SHOULD BE âᶜ$ | â/ɑ ᶜ/³ | 18
â$ | â/a | 35 SHOULD BE â$ | â/ɑ | 35
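
Instead of editing by hand, corrections like these could also be applied with a short script. The sketch below is only an assumption about how that might look: it treats the profile as tab-separated text with the hypothetical column names "Grapheme", "IPA" and "Frequency" (the real headers in MiddleChinese.tsv may differ), and the names CORRECTIONS and apply_corrections are made up for this example.

```python
import csv
import io
import unicodedata


def nfc(text):
    """Normalize to NFC so differently composed spellings still match."""
    return unicodedata.normalize("NFC", text)


# Grapheme -> corrected value, taken from the list above.
CORRECTIONS = {
    "ậi": "ậi/ʌi",
    "ậ": "ậ/ʌ",
    "âi": "âi/ɑi",
    "âᶜ$": "â/ɑ ᶜ/³",
    "â$": "â/ɑ",
}


def apply_corrections(tsv_text, corrections, key="Grapheme", target="IPA"):
    """Return `tsv_text` with the `target` column replaced for corrected graphemes.

    The column names are assumptions; pass the real ones via `key` and `target`.
    """
    lookup = {nfc(grapheme): value for grapheme, value in corrections.items()}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    for row in rows:
        if nfc(row[key]) in lookup:
            row[target] = lookup[nfc(row[key])]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# A two-row sample profile; only the first row is affected.
sample = "Grapheme\tIPA\tFrequency\nâ$\tâ/a\t35\nk\tk\t10\n"
print(apply_corrections(sample, CORRECTIONS))
```

Rows whose grapheme is not in the correction list pass through unchanged, so the script can be re-run safely.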

I don't know what å should be, will need to look at some concrete examples.

LinguList commented 2 years ago

If it is convenient, please do so. I'd then re-run the conversion and we'd be all done. I have not yet implemented what you told me about MC etc., so these errors are there because I have not looked at them yet.

nh36 commented 2 years ago

I just made those changes. But now it doesn't let me make any more changes because the file is not on a branch. Sorry for being a dumb ass.

"This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

I have looked into å, and it corresponds to what Baxter writes as æw.

LinguList commented 2 years ago

Ah, you probably did not commit to the main branch.

LinguList commented 2 years ago

Yes, what you did when editing is: you copied the repository to your own user account, edited it there on a patch-1 branch, which is here: https://github.com/nh36/schuesslerhanchinese/tree/patch-1, and this we can now commit.

LinguList commented 2 years ago

I just did so by creating a "pull-request", that suggests your updates to the main repository: https://github.com/yinyunxue/schuesslerhanchinese/pull/6

LinguList commented 2 years ago

You can see your changes here: https://github.com/yinyunxue/schuesslerhanchinese/pull/6/files

LinguList commented 2 years ago

If you are happy with those changes, we can "merge" and thus overwrite the old version of the profile.

LinguList commented 1 year ago

Editing would just mean that you edit in the GitHub interface, which is easiest (tab stops cannot easily be typed, but they are already there). Is it clear how to edit files online?

Otherwise, I'd do another run tomorrow and could then output all data in this table form, so you can easily check.