lexibank / baf2

Bangime and Friends 2
Creative Commons Attribution 4.0 International

CLDF dataset underlying the 2022 study "First steps towards the detection of contact layers in Bangime: a multi-disciplinary, computer-assisted approach".


How to cite

If you use these data please cite

Description

This dataset is licensed under CC-BY-4.0.

Available online at http://digling.org/links/bangime.html

Conceptlists in Concepticon:

Analysis of Bangime and Friends (2)

The data in EDICTOR can be accessed from https://digling.org/links/bangime.html.

To run the analysis, make sure to install all requirements:

pip install -e ".[full]"

Also clone the reference catalogs Glottolog, Concepticon, and CLTS into a folder named repos:

mkdir repos
cd repos
git clone https://github.com/glottolog/glottolog.git
git clone https://github.com/concepticon/concepticon-data.git
git clone https://github.com/cldf-clts/clts
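Before converting the data, it can help to check that the three catalogs ended up where the commands above put them. The following is a minimal sketch, assuming the folder layout those `git clone` calls produce (`repos/glottolog`, `repos/concepticon-data`, `repos/clts`); the helper `missing_catalogs` is hypothetical and not part of this package or of cldfbench.

```python
from pathlib import Path

# Folder names created by the three `git clone` commands (assumed layout).
EXPECTED_CATALOGS = ("glottolog", "concepticon-data", "clts")


def missing_catalogs(base="repos"):
    """Hypothetical helper: list catalogs under `base` that are not git checkouts.

    A catalog counts as present only if its folder contains a `.git` directory.
    """
    root = Path(base)
    return [name for name in EXPECTED_CATALOGS if not (root / name / ".git").is_dir()]


if __name__ == "__main__":
    missing = missing_catalogs()
    if missing:
        print("Missing catalogs:", ", ".join(missing))
    else:
        print("All catalogs found.")
```

Once all three clones exist, `missing_catalogs()` returns an empty list.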

The data is annotated with the help of the EDICTOR tool, where you can also inspect it at http://digling.org/edictor/?remote_dbase=bangime&file=bangime.

To download the most recent version of the data programmatically, type:

cldfbench download lexibank_baf2.py

To convert the updated data to CLDF, run:

cldfbench lexibank.makecldf lexibank_baf2.py --concepticon-version=v3.2.0 --glottolog-version=v5.0 --clts-version=v2.3.0

To run the automatic cognate and borrowing detection, run:

cldfbench baf2.borrowing

This creates the file wordlist.tsv in the folder analysis. Note that the automatic analysis was run only once, at the beginning of our investigation, and its results were later corrected manually. The output of this command will therefore differ from the manually updated version of the data.
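The resulting file is a tab-separated wordlist. The sketch below shows one way to load such a file with Python's standard library. It assumes that the first non-comment line is the header, that lines starting with `#` (which LingPy-style wordlists may contain) should be skipped, and that quote characters in forms are literal; the function names and the `DOCULECT` column used for counting are assumptions, so check the header of the actual file.

```python
import csv
from collections import Counter


def read_wordlist(path):
    """Read a tab-separated wordlist into a list of dicts.

    Blank lines and lines starting with '#' are skipped; the first
    remaining line is taken as the header row.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        lines = [line for line in handle if line.strip() and not line.startswith("#")]
    # QUOTE_NONE keeps quote characters inside forms as literal text.
    return list(csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE))


def entries_per_language(rows, language_column="DOCULECT"):
    """Count how many rows each language (doculect) contributes."""
    return Counter(row[language_column] for row in rows)
```

For example, `entries_per_language(read_wordlist("analysis/wordlist.tsv"))` gives the number of entries per doculect, provided the file really has a `DOCULECT` column.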

To analyze the data, you can first compute average statistics of borrowed items:

cldfbench baf2.average

This will create a file relations.md in the folder analysis.
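The exact statistics written to relations.md are defined by the `baf2.average` command. As a hedged illustration of what an average over borrowed items can look like, the sketch below computes, per language, the share of entries marked as borrowing candidates and then the mean of those shares. It assumes the wordlist has been read into a list of dicts with a language column (here `DOCULECT`) and a hypothetical borrowing-set column (here `BORID`) in which an empty value or `"0"` means "not a candidate".

```python
from statistics import mean


def borrowing_shares(rows, language_column="DOCULECT", flag_column="BORID"):
    """Per-language share of rows marked as borrowing candidates.

    Hypothetical convention: a row is a candidate if `flag_column`
    is non-empty and not "0".
    """
    totals, flagged = {}, {}
    for row in rows:
        lang = row[language_column]
        totals[lang] = totals.get(lang, 0) + 1
        value = (row.get(flag_column) or "").strip()
        if value and value != "0":
            flagged[lang] = flagged.get(lang, 0) + 1
    return {lang: flagged.get(lang, 0) / totals[lang] for lang in totals}


def average_share(shares):
    """Unweighted mean of the per-language shares."""
    return mean(shares.values())
```

Here every language counts equally in the average, regardless of how many entries it has; weighting by entry count would be the main alternative.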

To count shared borrowing candidates, type:

cldfbench baf2.count

This will create a file analysis/patterns.tsv.
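A pattern of shared borrowing candidates can be thought of as the set of languages that jointly attest one borrowing set. The sketch below counts how often each such combination occurs. It is an illustration only, not the implementation of `baf2.count`, and it assumes rows as dicts with a language column (here `DOCULECT`) and a hypothetical borrowing-set column (here `BORID`, empty or `"0"` for non-candidates).

```python
from collections import Counter, defaultdict


def count_patterns(rows, language_column="DOCULECT", set_column="BORID"):
    """Count how often each combination of languages shares a borrowing set."""
    members = defaultdict(set)
    for row in rows:
        set_id = (row.get(set_column) or "").strip()
        if set_id and set_id != "0":
            members[set_id].add(row[language_column])
    # A pattern is the sorted tuple of languages attesting one set;
    # sets found in a single language are not shared and are skipped.
    return Counter(tuple(sorted(langs)) for langs in members.values() if len(langs) > 1)
```

Because languages are collected in a set, several forms of one language in the same borrowing set count only once.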

To compute the same counts for all language subgroups in the sample, type:

cldfbench baf2.count-subgroup

This will write the patterns to the file analysis/patterns-subgroups.tsv.
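Counting at the level of subgroups amounts to replacing each language by its subgroup before building the patterns. The sketch below does this with a plain dictionary `subgroup_of` that maps every language to its subgroup; the mapping, the column names (`DOCULECT`, hypothetical `BORID`), and the function name are assumptions for illustration, not the classification or code used by `baf2.count-subgroup`.

```python
from collections import Counter, defaultdict


def count_subgroup_patterns(rows, subgroup_of, language_column="DOCULECT", set_column="BORID"):
    """Count how often each combination of subgroups shares a borrowing set.

    Every language in `rows` must appear as a key in `subgroup_of`.
    """
    members = defaultdict(set)
    for row in rows:
        set_id = (row.get(set_column) or "").strip()
        if set_id and set_id != "0":
            members[set_id].add(subgroup_of[row[language_column]])
    # Keep only sets shared across at least two subgroups.
    return Counter(tuple(sorted(groups)) for groups in members.values() if len(groups) > 1)
```

Sets whose languages all belong to one subgroup collapse to a single subgroup and are dropped, so only candidates shared across subgroups are counted.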

To compute the same counts for all individual languages in the sample, type:

cldfbench baf2.count-language

This will write the patterns to the file analysis/patterns-subgroups.tsv.
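One simple way to break shared candidates down by individual language is a pairwise count: for every pair of languages, the number of borrowing sets both attest. The sketch below is a hedged illustration of that idea and not necessarily what `baf2.count-language` computes; it assumes rows as dicts with a language column (here `DOCULECT`) and a hypothetical borrowing-set column (here `BORID`).

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_sets_per_pair(rows, language_column="DOCULECT", set_column="BORID"):
    """Count, for every pair of languages, how many borrowing sets they share."""
    members = defaultdict(set)
    for row in rows:
        set_id = (row.get(set_column) or "").strip()
        if set_id and set_id != "0":
            members[set_id].add(row[language_column])
    pairs = Counter()
    for langs in members.values():
        # Each set contributes once to every pair of languages it contains.
        pairs.update(combinations(sorted(langs), 2))
    return pairs
```

A set attested in three languages contributes to all three pairs, so pair counts can add up to more than the number of sets.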

Statistics

Glottolog: 100%
Concepticon: 97%
Source: 100%
BIPA: 100%
CLTS SoundClass: 100%

Contributors

| Name | GitHub user | Description | Role |
| --- | --- | --- | --- |
| Abbie Hantgan | IndianaTones | Data collection, orthography | Author |
| Hiba Babiker | | Data collection, orthography | Author |
| Johann-Mattis List | @LinguList | Maintainer | Author, Editor |

CLDF Datasets

The following CLDF datasets are available in cldf: