lingpy / calc-workflow

Workflows for computer-assisted language comparison: State of the Art
GNU General Public License v3.0

Computer-Assisted Language Comparison: State of the Art

This repository contains the Python scripts of the CALC workflow and the tutorial. The CALC workflow was presented at the workshop Recent Advances in Comparative Linguistic Reconstruction (London, UK); the presentation slides can be found in the workshop > slides folder.

Data

You can find data in the workflow folder.

| File | Stage |
| --- | --- |
| D_Chen_subset.tsv | Raw data to machine-readable data |
| D_Chen_partial.tsv | Machine-readable data to partial cognates |
| D_Chen_structure.tsv | Partial cognates to template-based alignment |
| D_Chen_crossids.tsv | Template-based alignment to strict cognates |
| D_Chen_patterns.tsv | Strict cognates to sound correspondence patterns |

| File | Supplementary |
| --- | --- |
| D_Doculects.tsv | The metadata of the selected languages |
| profile2.tsv | An example orthography profile for the template-based alignment |
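As a quick way to inspect any of the files listed above, the following is a minimal sketch that reads a tab-separated file using only the Python standard library. The helper name `read_tsv` is hypothetical and not part of this repository. It also assumes that the first non-comment line is a header row and that lines starting with `#` are comments; neither is stated here, so check the files before relying on it.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, encoding="utf-8", newline="") as handle:
        # Assumption: blank lines and lines starting with "#" carry no data.
        lines = (
            line for line in handle
            if line.strip() and not line.startswith("#")
        )
        # QUOTE_NONE keeps quote characters in the data as literal text.
        reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [dict(row) for row in reader]
```

Called from the repository root, `read_tsv("workflow/D_Chen_subset.tsv")` would return one dictionary per row, so `list(rows[0])` shows the column names actually present in the file.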

Python scripts

You can find all the Python scripts in the workflow folder.

| Python script | Function | Manuscript section |
| --- | --- | --- |
| 1_lifting.py | From raw data to tokenized data | 3.2.1 |
| 2_partial.py | From tokenized data to cognate sets | 3.2.2 |
| 3_alignment.py | From cognate sets to alignments | 3.2.3 |
| 4_crosssemantic.py | From alignments to cross-semantic cognates | 3.2.4 |
| 5_correspondence.py | From cross-semantic cognates to sound correspondence patterns | 3.2.5 |
| P_validate.py | Generates a distance matrix | Discussion |
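The numbered scripts appear to form a pipeline, so one way to run them in sequence is sketched below. This is an illustration under assumptions, not the documented way to run the workflow: it assumes each script takes no command-line arguments and is meant to be run from inside the workflow folder. The `run_pipeline` helper and its `dry_run` flag are hypothetical names. By default the sketch only lists the commands without executing anything.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in the table above, in workflow order.
STEPS = [
    "1_lifting.py",
    "2_partial.py",
    "3_alignment.py",
    "4_crosssemantic.py",
    "5_correspondence.py",
]


def run_pipeline(folder="workflow", dry_run=True):
    """Return the command for each step; execute them only if dry_run is False."""
    commands = [[sys.executable, name] for name in STEPS]
    if not dry_run:
        for command in commands:
            # Assumption: each script runs without arguments inside `folder`.
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, cwd=Path(folder), check=True)
    return commands


# Dry run: print the commands that would be executed.
for command in run_pipeline():
    print(" ".join(command))
```

Calling `run_pipeline(dry_run=False)` from the repository root would execute the five steps in order. P_validate.py is left out because the table lists it separately from the numbered steps.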

Handout

You can find handout.pdf in the workshop > handout folder.

Presentation slides

The presentation slides are in HTML format; you can find slides.html in the workshop > slides folder.

Interactive web-based application