LinguList closed 1 year ago
I have the cartopy example that we used to display maps of Saphon. It used the min and max longitude and latitude to define the borders, but these could be set explicitly instead. It also reads the LanguageTable as a CLDF object for lat and long, so I suppose that if we add the number of borrowings it would be accessible as well. Might that be sufficient for the task?
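As a rough illustration of deriving the borders from the min and max coordinates, here is a minimal sketch using only the standard library. The sample rows, the `map_extent` helper and the 5-degree margin are assumptions for illustration, not the code from the Saphon example; only the `ID`, `Name`, `Latitude` and `Longitude` column names follow the usual CLDF LanguageTable layout.

```python
import csv
import io

# Hypothetical excerpt of a CLDF languages.csv; the coordinates are made up.
SAMPLE = """ID,Name,Latitude,Longitude
a,Lang A,10.0,-90.0
b,Lang B,-40.0,-70.0
c,Lang C,,
"""


def map_extent(rows, margin=5.0):
    """Return (min_lon, max_lon, min_lat, max_lat), padded by `margin` degrees."""
    lats, lons = [], []
    for row in rows:
        # Skip languages that have no coordinates.
        if row["Latitude"] and row["Longitude"]:
            lats.append(float(row["Latitude"]))
            lons.append(float(row["Longitude"]))
    if not lats:
        raise ValueError("no rows with coordinates")
    # Clamp the padded box to valid longitude/latitude ranges.
    return (
        max(min(lons) - margin, -180.0),
        min(max(lons) + margin, 180.0),
        max(min(lats) - margin, -90.0),
        min(max(lats) + margin, 90.0),
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
extent = map_extent(rows)
print(extent)
```

The tuple is ordered as (x0, x1, y0, y1), which is the order cartopy's `set_extent` expects, so a fixed region could be supplied in the same shape instead of the computed one.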
Yes, but cldfviz is like this:
cldfbench cldfviz.map cldf/cldf-metadata.json --language-labels --language-properties=Spanish_Borrowings
The problem now is that I added Spanish_Borrowings as a float, but CLDF defines it as a string (I did not change this). So it is best to turn Spanish_Borrowings into categories, like < 10%, < 15%, < 20%, < 25%, < 30%. Then we have a decent color map as well.
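A binning along these lines could be sketched as follows. The function name `borrowing_class`, the label strings and the catch-all class for 30% and above are assumptions; the thread does not show how the categories were actually coded.

```python
# Upper bounds in percent, taken from the classes suggested above.
BOUNDS = [10, 15, 20, 25, 30]


def borrowing_class(proportion):
    """Map a borrowing proportion (0..1) to a string label such as '< 15%'."""
    percent = proportion * 100
    for bound in BOUNDS:
        if percent < bound:
            return "< {}%".format(bound)
    # Assumed catch-all class for values at or above the last bound.
    return ">= {}%".format(BOUNDS[-1])


print(borrowing_class(0.12))
```

Because the result is a string, it fits a column that CLDF treats as a string, and each label can get its own color on the map.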
So my assumption is that this will be faster, and you learn another nice tool ;)
I modified the code to add the information on the percentage of borrowings from Spanish as a category. This means then, I hope, we can have some nicer classification.
But if you want a more customized map, we can go with our old example as well! The info on borrowings is now in the languages.csv file anyway, so it can be readily accessed. If you want to go ahead (but if it takes too long, leave it please) I'd later double check.
cldfbench cldfviz.map cldf/cldf-metadata.json --language-labels --language-properties=Borrowing_Class --markersize=40 --base-layer Esri_WorldPhysical
Great!! Very pretty too.
To get the labels to show, should I edit the .png and overlay the labels on the South Pacific? Or is there a way to place them directly on the map, similar to what MATLAB allows for figure placement?
The borrowing count computed in lexibank_sabor.py doesn't take borrowed_score into account, so the numbers are higher than what we get with get_our_data. I think the borrowing class stays within the indicated range, so there is no real impact on the graphic. I'll add a condition on borrowed_score too; since form is accessed in the same expression, it should be straightforward.
borrowed = sum(
    [1 for form in language.forms_with_sounds
     if borrowings.get(form.id[5:], [""])[0] == "Spanish"])
Can you modify the lexibank script code in this regard? I know I did not follow your borrowing definition; I did not have time to look it up.
Will do. Working from Cusco for a few days, so a bit less effectively for multiple reasons!
Changed accounting for borrowing in make_cldf command.
This was more 'interesting' than I thought it would be, since forms overall are censored for whether they have a concepticon_gloss. So I counted only forms with a Concepticon gloss and used this as the base for calculating the borrowed proportion. Now the calculation of the proportion borrowed is consistent with the results returned for the same calculation over the wordlist, by language and overall.
I can of course reverse this if this consistency is less important than the gross measure of borrowing. I did want to at least examine discrepancy in form counts, which I report here. There is more difference due to censoring for not having concepticon gloss than for our borrowing score definition.
I printed out the different counts of all forms versus forms with concepticon glosses during makecldf:
cldfbench lexibank.makecldf lexibank_sabor.py
INFO running _cmd_makecldf on sabor ...
INFO loaded borrowings
INFO:lingpy:loading wold
loading forms for wold: 100%|█████████████████████████████████████████████████████████████████| 64289/64289 [00:04<00:00, 12990.95it/s]
INFO:lingpy:loading ids
loading forms for ids: 100%|████████████████████████████████████████████████████████████████| 454145/454145 [00:08<00:00, 51723.55it/s]
INFO:lingpy:loaded wordlist with 1489 concepts and 370 languages
INFO added ['ids-Spanish']
======
Added: name Yaqui, language wold-Yaqui
INFO Yaqui all forms 1615, forms with concepts 1433, borrowed 311, prop 0.21702721563154223; forms no concepts 7
Added: name Zinacantán Tzotzil, language wold-ZinacantanTzotzil
INFO Zinacantán Tzotzil all forms 1413, forms with concepts 1266, borrowed 165, prop 0.13033175355450238; forms no concepts 1
Added: name Q'eqchi', language wold-Qeqchi
INFO Q'eqchi' all forms 1995, forms with concepts 1773, borrowed 161, prop 0.09080654258319233; forms no concepts 2
Added: name Otomi, language wold-Otomi
INFO Otomi all forms 2558, forms with concepts 2241, borrowed 198, prop 0.08835341365461848; forms no concepts 2
Added: name Imbabura Quechua, language wold-ImbaburaQuechua
INFO Imbabura Quechua all forms 1319, forms with concepts 1156, borrowed 300, prop 0.25951557093425603; forms no concepts 23
Added: name Wichí, language wold-Wichi
INFO Wichí all forms 1361, forms with concepts 1219, borrowed 152, prop 0.12469237079573421; forms no concepts 1
Added: name Mapudungun, language wold-Mapudungun
INFO Mapudungun all forms 1412, forms with concepts 1242, borrowed 190, prop 0.1529790660225443; forms no concepts 48
=====
INFO file written: /Users/johnmiller/ling/sabor-installs/sabor/cldf/.transcription-report.json
INFO Summary for dataset /Users/johnmiller/ling/sabor-installs/sabor/cldf/cldf-metadata.json
- **Varieties:** 8
- **Concepts:** 1,308
- **Lexemes:** 12,100
- **Sources:** 0
- **Synonymy:** 1.30
- **Invalid lexemes:** 0
- **Tokens:** 72,550
- **Segments:** 112 (0 BIPA errors, 0 CTLS sound class errors, 112 CLTS modified)
- **Inventory size (avg):** 39.38
Here is the snippet of code from lexibank_sabor.py that counts the number of forms:
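borrowed = sum(
    [1 for form in language.forms_with_sounds
     # the borrowing recorded for this form names Spanish as the donor
     if borrowings.get(form.id[5:], [""])[0] == "Spanish" and
     # the borrowed score is above the critical value
     float(form.data["Borrowed_score"]) > BOR_CRITICAL_VALUE and
     # the form has a concept whose Concepticon gloss is in our concept list
     form.concept and form.concept.concepticon_gloss in concepts])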
Based on the original code creating forms.
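To make the three conditions and the choice of denominator easier to try out, here is a self-contained sketch with stand-in objects. The `Form` and `Concept` classes, the sample data, the helper name `borrowed_proportion` and the threshold value are all hypothetical; only the filtering logic mirrors the snippet above.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder threshold; the real BOR_CRITICAL_VALUE is not shown in this thread.
BOR_CRITICAL_VALUE = 0.5


@dataclass
class Concept:
    concepticon_gloss: Optional[str]


@dataclass
class Form:
    id: str
    data: dict
    concept: Optional[Concept]


def borrowed_proportion(forms, borrowings, concepts):
    """Return (borrowed, base, proportion) over forms with a known gloss."""
    # Base: only forms whose concept has a Concepticon gloss in `concepts`.
    base = [f for f in forms
            if f.concept and f.concept.concepticon_gloss in concepts]
    borrowed = sum(
        1 for f in base
        # id[5:] drops a five-character prefix such as "wold-".
        if borrowings.get(f.id[5:], [""])[0] == "Spanish"
        and float(f.data["Borrowed_score"]) > BOR_CRITICAL_VALUE
    )
    return borrowed, len(base), (borrowed / len(base) if base else 0.0)


forms = [
    Form("wold-1", {"Borrowed_score": "1.0"}, Concept("HAND")),   # counted
    Form("wold-2", {"Borrowed_score": "0.25"}, Concept("FOOT")),  # score too low
    Form("wold-3", {"Borrowed_score": "1.0"}, Concept(None)),     # no gloss
    Form("wold-4", {"Borrowed_score": "0.0"}, Concept("WATER")),  # not borrowed
]
borrowings = {"1": ["Spanish"], "2": ["Spanish"], "3": ["Spanish"]}
concepts = {"HAND", "FOOT", "WATER"}

print(borrowed_proportion(forms, borrowings, concepts))
```

In the log above the same ratio appears per language: for Yaqui, 311 borrowed over 1433 forms with concepts gives the reported proportion of about 0.217.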
Nice. I can redo the map in HTML and we can use that in some form in the paper, adding larger labels manually, maybe.
In order to do that, we need to do the following:
etc/languages.tsv
or on the fly to cldf/languages.csv
The only question is what to do with Spanish in our sample. We want a map showing only a specific region. If one does an HTML map, one can make a screenshot, which may be good enough. If not, one needs to define the borders.
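One way to handle both points before plotting is to drop the donor language and keep only rows inside a fixed box. This is a sketch under assumptions: the helper `select_for_map`, the sample rows, the match on the name "Spanish" and the box coordinates are illustrative and not taken from the dataset.

```python
def select_for_map(rows, exclude_names=("Spanish",), box=None):
    """Keep rows that are not excluded and, if `box` is given, lie inside it.

    `box` is (min_lon, max_lon, min_lat, max_lat) in degrees.
    """
    kept = []
    for row in rows:
        if row["Name"] in exclude_names:
            continue
        if box is not None:
            lon, lat = float(row["Longitude"]), float(row["Latitude"])
            if not (box[0] <= lon <= box[1] and box[2] <= lat <= box[3]):
                continue
        kept.append(row)
    return kept


# Made-up rows in the shape of a CLDF languages.csv.
rows = [
    {"Name": "Spanish", "Latitude": "40.0", "Longitude": "-4.0"},
    {"Name": "Lang A", "Latitude": "10.0", "Longitude": "-90.0"},
    {"Name": "Lang B", "Latitude": "-40.0", "Longitude": "-70.0"},
]
americas = (-120.0, -30.0, -60.0, 35.0)  # assumed box, for illustration only
selected = select_for_map(rows, box=americas)
print([row["Name"] for row in selected])
```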