intercontinental-dictionary-series / keypano

IDS data on Panoan languages coded by Key
Creative Commons Attribution 4.0 International

Add LOAN field from keypano/cldf/forms.csv to Edictor SQL DB #16

Closed fractaldragonflies closed 3 years ago

fractaldragonflies commented 3 years ago

While hardly perfect, as we’ve observed, the IDS borrowing annotation does at least provide some external evidence. We've corrected the processing of the IDS borrowed-word annotation so that the LOAN field correctly reflects it. (For example, fields like [ku'či]; c̷upika yu-tai are now handled correctly.)

Only 778 of the non-Spanish and non-Portuguese words were classified as borrowed by IDS, which we know is an undercount.

LinguList commented 3 years ago

So, @fractaldragonflies, can you update me please: the criterion for having a "loan" is simply that it is inside the brackets, right? Can we just add this information to the lexibank_keypano.py script? There's the add_form(..., Loan=[True, False]) option which could be easily triggered? When done so, one can include this field via raw/preprocessing.py (if you checked this, with info in NOTES.md) or via the commandline of pyedictor.

fractaldragonflies commented 3 years ago

I had previously added this to the lexibank_keypano.py script (Sep 9, with the analyze2 commit and merge). See the test_borrowed function of cmd_download. I verified that it works by running what I believe was the cldfbench 'download' command, which runs cmd_download in lexibank_keypano.py. The subsequent make_cldf then also constructs the CLDF DB correctly.

fractaldragonflies commented 3 years ago

The test_borrowed returns 0, 1 values as part of the download. It is with the make_cldf command that the 0, 1 becomes False, True.

LinguList commented 3 years ago

Okay, so it can be included in edictor via pyedictor either via the namespace:

$ pyedictor wordlist --addon=loan:loan 

Or one includes it in preprocessing.py

Do you want to have a look at this, or should I do it?

fractaldragonflies commented 3 years ago

I’ll give it a go. Prepping for meeting with Cesar now. But should be able to update my repository, and try my hand at creating the sqlite DB this evening.

See you later!

John Miller @.***

fractaldragonflies commented 3 years ago

I wasn’t able to get this to work. I tried:

% edictor wordlist --preprocessing=raw/preprocessing.py --addon=language_family:family,loan:loan --sqlite --name=keypano

and

% edictor wordlist --preprocessing=raw/preprocessing.py --addon=language_family:family --addon=loan:loan --sqlite --name=keypano

and changing preprocessing.py as shown here:

wordlist.add_entries("loan", "loan", lambda x: x)

alms = Alignments(wordlist, ref="cogids", transcription="form")
alms.align()

D = {0: [
    "doculect",
   ...
    "loan",

None of these succeeded. The first executes, but there doesn’t seem to be any loan data (or maybe I just don’t know how to look for it).

The last one raises an error, maybe because the lambda writes to the same entry it reads from?

    D = preprocessing(wordlist)
  File "raw/preprocessing.py", line 15, in run
    wordlist.add_entries("loan", "loan", lambda x: x)
  File "/Users/johnmiller/opt/miniforge3/envs/lingsaphon/lib/python3.9/site-packages/lingpy/basic/wordlist.py", line 145, in add_entries
    self._add_entries(entry, source, function, override, keywords)
  File "/Users/johnmiller/opt/miniforge3/envs/lingsaphon/lib/python3.9/site-packages/lingpy/basic/parser.py", line 391, in _add_entries
    _apply(key, self[key][idx], keywords)
IndexError: list index out of range

LinguList commented 3 years ago

I think you need to modify BOTH.

The command first reads in all data, including --addon data, then preprocessing.py modifies the output, and it would just ignore a loan column, even if there was one ;)
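To make the ordering concrete, here is a minimal sketch in plain Python (no lingpy or pyedictor involved). The rows, the column names, and the `build_output` helper are all made up for illustration; the point is only that a column read in by --addon disappears again if the preprocessing step does not list it in its output header.

```python
# Hypothetical rows, as they might look after --addon=loan:loan was applied.
rows = {
    1: {"doculect": "LangA", "concept": "concept_a", "form": "form_a", "loan": True},
    2: {"doculect": "LangB", "concept": "concept_a", "form": "form_b", "loan": False},
}


def build_output(rows, columns):
    """Mimic a preprocessing step that keeps only the listed columns."""
    out = {0: list(columns)}  # row 0 holds the header
    for idx, row in rows.items():
        out[idx] = [row[col] for col in columns]
    return out


# "loan" was read in, but the output header does not list it: it is dropped.
without_loan = build_output(rows, ["doculect", "concept", "form"])

# Listing "loan" in the header is what carries the column through.
with_loan = build_output(rows, ["doculect", "concept", "form", "loan"])

print(without_loan[0])
print(with_loan[0])
```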

LinguList commented 3 years ago

To check what happens, omit --sqlite. This produces a file named keypano.tsv with a header that you can check (you can even load it into edictor directly or open it with lingpy).
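A minimal sketch of such a header check, using only the standard library. It assumes the file is tab-separated and that any leading comment lines start with "#". The helpers `read_header` and `has_column` are hypothetical, and the demo file is throwaway content rather than real keypano data.

```python
import csv
import os
import tempfile


def read_header(path):
    """Return the first non-empty, non-comment row of a tab-separated file."""
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row and not row[0].startswith("#"):
                return row
    return []


def has_column(path, name):
    """Case-insensitive check for a column name in the header."""
    return name.lower() in [cell.lower() for cell in read_header(path)]


# Demo on a throwaway file with made-up content.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tsv")
with open(demo_path, "w", encoding="utf-8") as handle:
    handle.write("ID\tDOCULECT\tCONCEPT\tFORM\tLOAN\n")
    handle.write("1\tLangA\tconcept_a\tform_a\tTrue\n")

print(has_column(demo_path, "loan"))
```

Pointing `has_column` at the generated keypano.tsv instead of the demo file would show whether the loan column made it through.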

fractaldragonflies commented 3 years ago

I loaded the keypano.sqlite3 DB to main. As you suggested, it took the combination of a change to preprocessing.py and the command line. In preprocessing.py, I only needed to add 'loan' to the list of fields. Adding a lambda function was not necessary and triggers a prompt to overwrite the variable; it still works, but it isn't needed.

For the command I used: edictor wordlist --preprocessing=raw/preprocessing.py --addon=language_family:family,loan:loan --sqlite --name=keypano

I looked up the command-line processing for the addon field to verify the use of a comma-delimited list of addons. Good.
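For reference, a comma-delimited addon value of this shape can be split as in the sketch below. `parse_addon` is a hypothetical helper written for illustration, not pyedictor's actual code, which may handle the argument differently.

```python
def parse_addon(value):
    """Split 'source:target,source:target' into a {source: target} dict."""
    mapping = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        source, sep, target = item.partition(":")
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {item!r}")
        mapping[source] = target
    return mapping


addons = parse_addon("language_family:family,loan:loan")
print(addons)
```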

I'm closing this issue as resolved.