Closed fractaldragonflies closed 3 years ago
So, @fractaldragonflies, can you update me please: the criterion for having a "loan" is simply that it is inside the brackets, right? Can we just add this information to the lexibank_keypano.py script? There's the add_form(..., Loan=[True, False]) option, which could be easily triggered. Once that is done, one can include this field via raw/preprocessing.py (if you checked this, with info in NOTES.md) or via the command line of pyedictor.
I had previously added this to the lexibank_keypano.py script (Sep 9, with the analyze2 commit and merge). See the test_borrowed function of cmd_download. I verified that it works by running cldfbench with, I believe, the 'download' command, which ran cmd_download in lexibank_keypano.py. The subsequent make_cldf then also constructs the CLDF DB correctly.
The test_borrowed returns 0, 1 values as part of the download. It is with the make_cldf command that the 0, 1 becomes False, True.
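To illustrate the two steps described above (a 0/1 flag at download time, a boolean at make_cldf time), here is a minimal sketch. It is not the actual test_borrowed from lexibank_keypano.py: the helper name is_bracketed and the "whole form wrapped in square brackets" rule are assumptions made for illustration only.

```python
import re

# Hypothetical helper, NOT the real test_borrowed from lexibank_keypano.py.
# Assumption: a form counts as a loan when the whole form is wrapped in [ ].
BRACKETED = re.compile(r"^\[[^\[\]]+\]$")


def is_bracketed(form: str) -> int:
    """Return 1 if the whole form is wrapped in square brackets, else 0."""
    return 1 if BRACKETED.match(form.strip()) else 0


# Step 1 (download): store 0/1 flags alongside the forms.
flags = [is_bracketed(form) for form in ["[ku'či]", "yu-tai"]]

# Step 2 (make_cldf): turn the 0/1 flags into False/True.
loans = [bool(flag) for flag in flags]

print(flags, loans)  # [1, 0] [True, False]
```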
Okay, so it can be included in edictor via pyedictor either via the namespace:
$ pyedictor wordlist --addon=loan:loan
Or one includes it in preprocessing.py
Do you want to have a look at this, or should I do it?
I’ll give it a go. Prepping for meeting with Cesar now. But should be able to update my repository, and try my hand at creating the sqlite DB this evening.
See you later!
John Miller @.***
I wasn’t able to get this to work. I tried:
% edictor wordlist --preprocessing=raw/preprocessing.py --addon=language_family:family,loan:loan --sqlite --name=keypano
and
% edictor wordlist --preprocessing=raw/preprocessing.py --addon=language_family:family --addon=loan:loan --sqlite --name=keypano
and changing preprocessing.py as shown here:
wordlist.add_entries("loan", "loan", lambda x: x)
alms = Alignments(wordlist, ref="cogids", transcription="form")
alms.align()
D = {0: [
"doculect",
...
"loan",
None of these succeeded.
The first runs, but there doesn't seem to be any loan data. [Or maybe I just don't know how to look for it.]
And the last one raises an error, maybe because the lambda adds to the same entry?
    D = preprocessing(wordlist)
  File "raw/preprocessing.py", line 15, in run
    wordlist.add_entries("loan", "loan", lambda x: x)
  File "/Users/johnmiller/opt/miniforge3/envs/lingsaphon/lib/python3.9/site-packages/lingpy/basic/wordlist.py", line 145, in add_entries
    self._add_entries(entry, source, function, override, keywords)
  File "/Users/johnmiller/opt/miniforge3/envs/lingsaphon/lib/python3.9/site-packages/lingpy/basic/parser.py", line 391, in _add_entries
    _apply(key, self[key][idx], keywords)
IndexError: list index out of range
John Miller @.***
I think you need to modify BOTH.
The command first reads in all data, including the --addon data; then preprocessing.py modifies the output, and it would just ignore a loan column, even if there was one ;)
To check what happens, omit --sqlite; this produces a file named keypano.tsv, which has a header that you can check (you can even load this into edictor directly or open it with lingpy).
I loaded the keypano.sqlite3 db to main.
As you suggested, it was the combination of a change to preprocessing.py and the command line.
In preprocessing.py, I only needed to add 'loan' to the list of fields. Adding a lambda function was not necessary and elicits a prompt to overwrite the variable. It still works, but it is not needed.
For the command I used:
edictor wordlist --preprocessing=raw/preprocessing.py --addon=language_family:family,loan:loan --sqlite --name=keypano
I looked up the command-line processing for the addon field to verify the use of a comma-delimited list of addons. Good.
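For reference, the comma-delimited source:target format used in the command above can be split as in the sketch below. This is an illustration of the format only, not pyedictor's own parsing code; parse_addons is a hypothetical name.

```python
def parse_addons(value: str) -> dict:
    """Split 'source:target,source:target' into a {source: target} dict."""
    pairs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        source, sep, target = item.partition(":")
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {item!r}")
        pairs[source] = target
    return pairs


print(parse_addons("language_family:family,loan:loan"))
# {'language_family': 'family', 'loan': 'loan'}
```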
I'm closing this issue as resolved.
While hardly perfect, as we've observed, it does at least provide some external evidence. We've corrected the processing of the IDS borrowed-word annotation, so that the LOAN field correctly reflects the IDS annotation (e.g., fields like [ku'či]; c̷upika yu-tai are now handled correctly).
Only 778 of the non-Spanish and non-Portuguese words were classified as borrowed by IDS, which we know is an undercount.