intercontinental-dictionary-series / keypano

IDS data on Panoan languages coded by Key
Creative Commons Attribution 4.0 International

make seabor analysis #10

Closed LinguList closed 2 years ago

LinguList commented 3 years ago

Basically, this requires running the code with lingrex. It's not very complicated. I'll provide an example later.

LinguList commented 3 years ago

Essentially, you can see the first simple method here:

https://github.com/lexibank/seabor/blob/f6e16cfda3393970a50aed38cac1e9d48ef810d4/seaborcommands/fullcomparison.py#L22-L53

This method searches for cognates and then separates those which occur in different language families.

To do this, you need a wordlist. You can easily load the wordlist into a LexStat object:

from lingpy import LexStat

# Load the CLDF dataset into a LexStat object, selecting the columns needed.
lex = LexStat.from_cldf(
    "cldf/cldf-metadata.json",
    columns=[
        "language_name", "concept_name", "value", "form", "segments",
        "language_subgroup", "language_family",
    ],
)

If this does not work, please check the "columns" and "namespace" parameters of the from_cldf command in the documentation, which explains how the column names are mapped. I'd have to look these up as well.
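If the column names are unclear, one way to check is to list the columns that each table declares in the CLDF metadata file. This is a minimal sketch using only the standard library. The helper name list_cldf_columns is hypothetical, the inline metadata is toy data, and how these names map onto the lowercased, prefixed names that from_cldf expects (such as language_id or concept_name) should be verified against the lingpy documentation.

```python
import json


def list_cldf_columns(metadata):
    """Map each table's file name to the column names it declares.

    `metadata` is the parsed content of a CLDF metadata file,
    i.e. a CSVW JSON description with a top-level "tables" list.
    """
    columns = {}
    for table in metadata.get("tables", []):
        schema = table.get("tableSchema", {})
        columns[table.get("url", "")] = [
            col["name"] for col in schema.get("columns", []) if "name" in col
        ]
    return columns


# Toy metadata for illustration only. With a real dataset, load the file:
#     with open("cldf/cldf-metadata.json", encoding="utf-8") as f:
#         metadata = json.load(f)
example = json.loads("""
{
  "tables": [
    {"url": "forms.csv",
     "tableSchema": {"columns": [{"name": "ID"}, {"name": "Form"}, {"name": "Segments"}]}},
    {"url": "languages.csv",
     "tableSchema": {"columns": [{"name": "ID"}, {"name": "Name"}, {"name": "Family"}]}}
  ]
}
""")

for url, names in list_cldf_columns(example).items():
    print(url, names)
```

Comparing this listing with the names passed in "columns" should show quickly which requested column does not exist in the dataset.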

From there, you could use the code I have shown here, @fractaldragonflies

LinguList commented 3 years ago

The result (please adjust the for-loop in my example) will yield two cognate identifiers, one inside language families and one across them, with all cognate sets that do not cross language families set to zero. So you can conveniently inspect the data in EDICTOR and already search for interesting borrowings.
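The idea of zeroing out cognate sets that stay within one family can be sketched in plain Python. This is only an illustration of the filtering step, not the seabor or lingrex code: the function name cross_family_ids, the row layout, and the family labels are all hypothetical toy choices.

```python
from collections import defaultdict


def cross_family_ids(rows):
    """Keep a cognate id only if its set spans more than one family.

    `rows` maps a row id to a (cognate id, family) pair. Rows whose
    cognate set stays inside a single family get the id 0.
    """
    families = defaultdict(set)
    for cogid, family in rows.values():
        families[cogid].add(family)
    return {
        row_id: cogid if len(families[cogid]) > 1 else 0
        for row_id, (cogid, family) in rows.items()
    }


# Toy data: cognate set 2 is the only one found in two families.
rows = {
    1: (1, "FamilyA"),
    2: (1, "FamilyA"),
    3: (2, "FamilyA"),
    4: (2, "FamilyB"),
    5: (3, "FamilyB"),
}
borids = cross_family_ids(rows)
print(borids)
```

Rows that keep a nonzero identifier are the candidates worth checking for borrowing.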

fractaldragonflies commented 3 years ago

Thanks Mattis

Created the lex style wordlist without problem after changing language_name => language_id.

Lots of activity in the analysis. Changed language_family to family.

But not sure what to do about KeyError: 'ucogid' when processing bcubes.

Not sure about fixing up the Table command either … i.e. I didn’t put in 'args' yet.

See you tomorrow!

John Miller


LinguList commented 3 years ago

The B-cubed scores don't work now, as you don't have a gold standard here, right? So just ignore this part.
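For background, B-cubed scores compare an automatic clustering against a reference clustering item by item, which is presumably why a missing gold-standard column such as ucogid raises a KeyError. The following is a minimal sketch of the general B-cubed measure on toy data. It is not lingpy's implementation, and the function name bcubed and the item labels are hypothetical.

```python
from collections import defaultdict


def bcubed(gold, test):
    """Return B-cubed (precision, recall, F-score) for two clusterings.

    `gold` and `test` map the same items to cluster labels.
    """
    def clusters(labels):
        # For every item, collect the set of items sharing its label.
        members = defaultdict(set)
        for item, label in labels.items():
            members[label].add(item)
        return {item: members[label] for item, label in labels.items()}

    gold_sets, test_sets = clusters(gold), clusters(test)
    precision = recall = 0.0
    for item in gold:
        shared = len(gold_sets[item] & test_sets[item])
        precision += shared / len(test_sets[item])
        recall += shared / len(gold_sets[item])
    precision /= len(gold)
    recall /= len(gold)
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


gold = {"a": 1, "b": 1, "c": 2, "d": 2}  # reference cognate sets (toy data)
test = {"a": 1, "b": 1, "c": 1, "d": 2}  # automatic cognate sets (toy data)
p, r, f = bcubed(gold, test)
print(round(p, 3), round(r, 3), round(f, 3))
```

Without the gold mapping there is nothing to compare against, so the scores cannot be computed for this dataset until some cognate sets are annotated by hand.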

Can you submit your code to a folder "scripts" in this repository, so we can check on this?

fractaldragonflies commented 3 years ago

Done!

fractaldragonflies commented 3 years ago

Reviewed the analysis script and output some more.

Would like to discuss - online is fine.

Need to think about annotating borrowings as well for some development and test subsets, in order to refine and test the method!