Open LinguList opened 2 years ago
This seems to work well.
from pycldf import Dataset
from tabulate import tabulate
schuessler = Dataset.from_metadata("cldf/cldf-metadata.json")
table = []
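for form in schuessler.objects("FormTable"):
    # One row per form: language, character, raw value, segmented value, syllable type, cognate set(s)
    table.append([
        form.language.name, form.data["Chinese_Characters"], form.cldf.value,
        " ".join(form.cldf.segments), form.data["Syllable_Types"], form.data["Cognacy"],
    ])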
print(tabulate(table[:10], headers=["Language", "Character", "Value", "Segments", "Type", "Cognates"]))
This yields:
Language          Character    Value    Segments       Type    Cognates
----------------  -----------  -------  -------------  ------  ----------
Middle Chinese    阿           ʔâ       ʔ â/a                  1
Middle Chinese    阿           ʔjwo     ʔ ɥ o                  1
Late Han Chinese  阿           ʔɨɑ      ʔ ɨ ɑ                  1
Old Chinese       阿           *ʔa      ʔ a            B       1
Middle Chinese    阿奴         ʔâ-nuo   ʔ â/a + n u o          1 2
Middle Chinese    哀           ʔậi      ʔ ậi/ɑi                3
Late Han Chinese  哀           ʔəi      ʔ əi                   3
Old Chinese       哀           *ʔə̂i     ʔ ə̂i/əi        A       3
Middle Chinese    艾           ŋâiᶜ     ŋ âi/ai ᶜ/³            4
Late Han Chinese  艾           ŋɑs      ŋ ɑ s                  4
@nh36, does Schuessler mark A/B also on Middle Chinese, or do these serve to disambiguate vowels here?
Definitely no A/B in Middle Chinese. Schuessler uses Li Fang-Kuei's Middle Chinese without any change.
哀 ʔậi ʔ ậi/ʌi
艾 ŋâiᶜ ŋ âi/ɑi ᶜ/³
Thanks, so we need a language-specific orthography profile then. No big deal, but I need to extract it now.
@LinguList By the way, in Old Chinese a/ɑ but in LHan a/æ, and (I think) in Middle Chinese â/ɑ and a/æ
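To make the idea of language-specific profiles concrete, here is a minimal sketch. The `PROFILES` dictionary and the `convert` helper are hypothetical names, not part of the repository, and the mappings are only the tentative ones from the comment above (the Middle Chinese ones were given with an "I think"). Each entry uses the profile's `source/target` notation.

```python
# Hypothetical per-language mappings, taken from the (tentative) values in this thread.
PROFILES = {
    "Old Chinese": {"a": "a/ɑ"},
    "Late Han Chinese": {"a": "a/æ"},
    "Middle Chinese": {"â": "â/ɑ", "a": "a/æ"},
}


def convert(language, segments):
    """Look up each segment in the profile of the given language.

    Segments that the profile does not mention are passed through unchanged.
    """
    profile = PROFILES[language]
    return [profile.get(segment, segment) for segment in segments]


print(convert("Old Chinese", ["ʔ", "a"]))
print(convert("Late Han Chinese", ["ʔ", "a"]))
```

The point is only that the same grapheme resolves differently depending on the language, which a single shared profile cannot express.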
Okay, all the more reason to use language-specific profiles. Question: what is a with a dot under it? There are quite a few cases in Schuessler...
@LinguList The dot under things should only come up in Middle Chinese. It means retroflex when it is under consonants (t, d, s, z, ts, dz) and it means a different quality of vowel when under a vowel. The example of ậ/ʌ we discussed above. I have checked and ạ (without the circumflex) seems not to occur in the file (except in Vietnamese, which is not relevant.)
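The rule described above could be checked mechanically along these lines. This is a sketch, not code from the repository: `dotted_letters` is a hypothetical helper, and treating only `a e i o u` as vowel bases is a simplifying assumption. It decomposes each form with NFD so that the dot (U+0323, COMBINING DOT BELOW) is visible as a separate character regardless of how the letter was typed.

```python
import unicodedata

DOT_BELOW = "\u0323"    # COMBINING DOT BELOW
VOWELS = set("aeiou")   # assumption: plain vowel letters used as bases


def dotted_letters(form):
    """Return (base letter, interpretation) for every letter carrying a dot below."""
    results = []
    base = None
    for char in unicodedata.normalize("NFD", form):
        if unicodedata.combining(char) == 0:
            # A non-combining character starts a new letter.
            base = char
        elif char == DOT_BELOW and base is not None:
            kind = "vowel quality" if base.lower() in VOWELS else "retroflex"
            results.append((base, kind))
    return results


print(dotted_letters("ʔậi"))
print(dotted_letters("ṭ"))
```

Running this over all Middle Chinese values would list every dotted letter, which makes it easy to confirm that no unexpected base letters occur.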
Ah, good, it occurs only in Middle Chinese:
https://github.com/yinyunxue/schuesslerhanchinese/blob/main/etc/orthography/MiddleChinese.tsv#L34
And the other dots under consonants have already been covered; you can check the conversions in the profile for MC, etc.
Shall I correct that file by hand? There are still many problems with the vowels.
ậi | ậi/ɑi | 95 SHOULD BE ậi | ậi/ʌi | 95
ậ | ậ/ɑ | 55 SHOULD BE ậ | ậ/ʌ | 55
âi | âi/ai | 28 SHOULD BE âi | âi/ɑi | 28
âᶜ$ | â/a ᶜ/³ | 18 SHOULD BE âᶜ$ | â/ɑ ᶜ/³ | 18
â$ | â/a | 35 SHOULD BE â$ | â/ɑ | 35
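If editing by hand gets tedious, corrections like these could also be applied with a small script. The following is only a sketch: `correct_profile` is a hypothetical helper, and the column names `Grapheme`, `IPA` and `Frequency` are assumptions about the layout of the tab-separated profile, so they should be checked against the real header of MiddleChinese.tsv. Graphemes are compared after NFC normalization so that precomposed and decomposed spellings of the same letter match.

```python
import csv
import io
import unicodedata


def nfc(text):
    """Normalize to NFC so differently encoded diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


# Grapheme -> corrected conversion, exactly the five cases listed above.
CORRECTIONS = {nfc(grapheme): value for grapheme, value in {
    "ậi": "ậi/ʌi",
    "ậ": "ậ/ʌ",
    "âi": "âi/ɑi",
    "âᶜ$": "â/ɑ ᶜ/³",
    "â$": "â/ɑ",
}.items()}


def correct_profile(tsv_text, grapheme_col="Grapheme", ipa_col="IPA"):
    """Return the profile text with the listed conversions replaced."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        fix = CORRECTIONS.get(nfc(row[grapheme_col]))
        if fix is not None:
            row[ipa_col] = fix
        writer.writerow(row)
    return out.getvalue()


sample = "Grapheme\tIPA\tFrequency\nâ$\tâ/a\t35\nk\tk\t10\n"
print(correct_profile(sample))
```

Rows whose grapheme is not in `CORRECTIONS` are written back unchanged, so the script is safe to re-run after further corrections are added.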
I don't know what å should be, will need to look at some concrete examples.
If it is convenient, you can do so. I'd then re-run the conversion and we'd be all done. I have not yet implemented what you told me about MC etc., so these errors are there because I have not looked at them yet.
I just made those changes. But now it doesn't let me make any more changes because the file is not on a branch. Sorry for being a dumb ass.
"This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."
I have looked into å, and it corresponds to what Baxter writes as æw.
Ah, you probably did not commit to the main branch.
Yes, what you did when editing is: you copied the repository to your own user account, edited it there on a patch-1 branch, which is here: https://github.com/nh36/schuesslerhanchinese/tree/patch-1, and this we can now commit.
I just did so by creating a "pull-request", that suggests your updates to the main repository: https://github.com/yinyunxue/schuesslerhanchinese/pull/6
You can see your changes here: https://github.com/yinyunxue/schuesslerhanchinese/pull/6/files
If you are happy with those changes, we can "merge" and thus overwrite the old version of the profile.
Editing would just mean that you edit in the GitHub interface, which is easiest (tab stops cannot easily be typed there, but they are already in the file). Is it clear how to edit files online?
Otherwise, I'd do another run tomorrow and could then output all data in this table form, so you can easily check.
These should be represented in an extra field