Closed LinguList closed 2 years ago
Essentially, you can see the first simple method here:
This method searches for cognates and then separates those which occur in different language families.
To do this, you need a wordlist. You can load the wordlist into a lexstat object easily:
from lingpy import *
lex = LexStat.from_cldf("cldf/cldf-metadata.json", columns=["language_name", "concept_name", "value", "form", "segments", "language_subgroup", "language_family"])
If this does not work, please check the "columns" and the "namespace" parameters of the from_cldf command in the documentation, as this clarifies the namespaces, which I'd have to look up as well.
From there, you could use the code I have shown here, @fractaldragonflies
The result (please adjust the for-loop in my example) will yield two cognate identifiers, one inside language families and one across them, with all cognate sets that do not cross language families set to zero. So you can inspect the data conveniently in EDICTOR and already search for interesting borrowings.
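To make the idea concrete, here is a minimal sketch in plain Python (no lingpy needed) of how the two identifiers could be derived once every word has a cognate ID from clustering across all languages. This is only an illustration under assumptions, not the code linked above: the row layout, the function name `split_by_family`, and the output keys `family_cogid` and `cross_family_id` are all hypothetical.

```python
from collections import defaultdict


def split_by_family(rows):
    """Derive a family-internal and a cross-family cognate ID per row.

    `rows` is a list of dicts with the (hypothetical) keys "cogid" and
    "family". The input rows are not modified; new dicts are returned.
    """
    # Collect the set of families in which each cognate set is attested.
    families = defaultdict(set)
    for row in rows:
        families[row["cogid"]].add(row["family"])

    # Give each (cogid, family) pair its own family-internal identifier.
    internal_ids = {}
    result = []
    for row in rows:
        key = (row["cogid"], row["family"])
        if key not in internal_ids:
            internal_ids[key] = len(internal_ids) + 1
        new_row = dict(row)
        new_row["family_cogid"] = internal_ids[key]
        # Keep the ID only if the set spans more than one family, else zero.
        crosses = len(families[row["cogid"]]) > 1
        new_row["cross_family_id"] = row["cogid"] if crosses else 0
        result.append(new_row)
    return result


# Invented toy data: cognate set 1 spans families A and B, set 2 does not.
toy_rows = [
    {"doculect": "A1", "family": "A", "cogid": 1},
    {"doculect": "A2", "family": "A", "cogid": 1},
    {"doculect": "B1", "family": "B", "cogid": 1},
    {"doculect": "B2", "family": "B", "cogid": 2},
    {"doculect": "B3", "family": "B", "cogid": 2},
]
toy_result = split_by_family(toy_rows)
```

Rows with a non-zero `cross_family_id` are then the candidates worth checking for borrowing.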
Thanks Mattis
Created the lex style wordlist without problem after changing language_name => language_id.
Lots of activity in the analysis. Changed language_family for family.
But not sure what to do about KeyError: ‘ucogid’ when processing bcubes.
Not sure about fixing up the Table command either … i.e. I didn’t put in ‘args’ yet.
Hasta mañana!
John Miller
The B-cubed evaluation doesn't work now, as you don't have a gold standard here, right? So just ignore this part.
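For context, here is a minimal sketch of what a B-cubed comparison computes, which shows why it cannot run without a reference column (presumably what 'ucogid' was meant to supply). It follows the general definition of B-cubed precision and recall and is not lingpy's implementation; the function name `bcubed` and the toy assignments are hypothetical.

```python
from collections import defaultdict


def bcubed(gold, test):
    """Return B-cubed (precision, recall, f-score) for two clusterings.

    `gold` and `test` map each item to a cluster ID and must cover the
    same items. Without `gold` there is nothing to compare against.
    """
    def members(assignment):
        # Map each item to the full set of items sharing its cluster.
        groups = defaultdict(set)
        for item, cluster_id in assignment.items():
            groups[cluster_id].add(item)
        return {item: groups[cid] for item, cid in assignment.items()}

    gold_sets, test_sets = members(gold), members(test)
    items = list(gold)
    # Per-item precision: share of the test cluster that is also in gold.
    precision = sum(
        len(gold_sets[i] & test_sets[i]) / len(test_sets[i]) for i in items
    ) / len(items)
    # Per-item recall: share of the gold cluster recovered by the test.
    recall = sum(
        len(gold_sets[i] & test_sets[i]) / len(gold_sets[i]) for i in items
    ) / len(items)
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


# Invented toy example: the test clustering lumps all three words together.
toy_gold = {"word_a": 1, "word_b": 1, "word_c": 2}
toy_test = {"word_a": 1, "word_b": 1, "word_c": 1}
toy_precision, toy_recall, toy_fscore = bcubed(toy_gold, toy_test)
```

Lumping lowers precision while recall stays perfect, which is the kind of signal that only becomes available once expert cognate judgments exist.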
Can you submit your code to a folder "scripts" in this repository, so we can check on this?
Done!
Created branch 'analyze'. Has folder scripts with the analyze.py script.
The initial wordlist (analyze.pano.tsv) and the resulting wordlist (analyzer.pano.result.tsv) from analyze are at the top level of the keypano directory.
Commented out the bcube stuff and anything after depending on it.
Produced analysis. Used thresholds of 0.5 and 0.7.
Not sure how the 2 thresholds are treated in the output.
I see several SCA ID variables, though nothing with a Lex prefix, so I suppose these are the groupings.
Nor am I sure which are family-internal versus cross-family comparisons.
Reviewed the analysis script and output some more.
Would like to discuss - online is fine.
Need to think about annotating borrowing as well for some development and test subsets in order to refine and test!
Basically, this requires running the code with lingrex. Not very complicated. I'll provide an example later.