lexibank / baf2

Bangime and Friends 2
Creative Commons Attribution 4.0 International

First statistics on borrowing patterns #1

LinguList closed this issue 2 years ago

LinguList commented 4 years ago

@IndianaTones please check this out: I added a sheet with the major borrowing statistics derived from your current annotation. You will find it in the folder you shared, under the name "borrowing patterns". It gives a short classification of borrowings by family, with additional statistics, and it also points to problems in the annotation (not many).

The code is very simple: download the file from EDICTOR (click "save" first, then "download", and save the file as bangime.tsv), then run this code:

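from lingpy import Wordlist

# Load the wordlist exported from EDICTOR.
wl = Wordlist('bangime.tsv')

# Map each borrowing ID to its word indices, grouped per language.
etd = wl.get_etymdict(ref='borid')

# All language families in the data, in a fixed column order.
families = sorted(set(wl[idx, 'family'] for idx in wl))

table = []
for borid, refs in etd.items():
    # Skip ID 0 (presumably words without a borrowing annotation).
    if borid in ('0', 0):
        continue
    # Flatten the per-language lists, skipping languages without a reflex.
    idxs = [idx for ref in refs if ref for idx in ref]
    concept = wl[idxs[0], 'concept']
    famis = [wl[idx, 'family'] for idx in idxs]

    # Presence/absence pattern across families, one digit per family.
    ptn = ' '.join('1' if f in famis else '0' for f in families)
    table.append(
        [concept]
        + [str(famis.count(f)) for f in families]
        + [str(borid), ptn, str(len(set(famis))), str(len(idxs))]
    )

with open('patterns.tsv', 'w', encoding='utf-8') as f:
    header = ['Concept'] + families + ['BORROWING', 'PATTERN', 'FAMILIES', 'REFLEXES']
    f.write('\t'.join(header) + '\n')
    # Sort by number of families (numerically), then by pattern.
    for row in sorted(table, key=lambda x: (int(x[-2]), x[-3])):
        f.write('\t'.join(row) + '\n')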

This will produce a TSV file, patterns.tsv. Now, please take your time to look at the patterns we find there and decide whether we can write the paper with just those plus manual analysis. If more analyses are needed, I will look into some fancier plots later.
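
As a starting point for looking at the patterns, here is a minimal sketch that counts how many borrowing IDs share each family pattern. It assumes the column layout written by the script above (in particular a PATTERN column). The function name summarize_patterns is hypothetical, and the family names and rows in the toy data are invented for illustration only, they are not taken from the Bangime annotation.

```python
import csv
import io
from collections import Counter


def summarize_patterns(handle):
    """Count how many borrowing IDs share each family pattern.

    `handle` is any open text file (or file-like object) in the
    tab-separated layout assumed for patterns.tsv.
    """
    reader = csv.DictReader(handle, delimiter='\t')
    counts = Counter()
    for row in reader:
        counts[row['PATTERN']] += 1
    return counts


# Toy stand-in for patterns.tsv; family names and rows are invented.
toy = io.StringIO(
    'Concept\tFamA\tFamB\tBORROWING\tPATTERN\tFAMILIES\tREFLEXES\n'
    'water\t1\t2\t1\t1 1\t2\t3\n'
    'fire\t2\t0\t2\t1 0\t1\t2\n'
    'stone\t1\t1\t3\t1 1\t2\t2\n'
)
summary = summarize_patterns(toy)

# Print the patterns from most to least frequent.
for pattern, n in summary.most_common():
    print(pattern, n)
```

To run this on the real output, open patterns.tsv with open('patterns.tsv', newline='', encoding='utf-8') and pass the file handle to summarize_patterns instead of the toy data.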