nickduran / align-linguistic-alignment

Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.
MIT License

ALIGN, a computational tool for multi-level language analysis (optimized for Python 3.10)

align is a Python library for extracting quantitative, reproducible metrics of multi-level alignment between two speakers in naturalistic language corpora. The method was introduced in "ALIGN: Analyzing Linguistic Interactions with Generalizable techNiques" (Duran, Paxton, & Fusaroli, 2019; Psychological Methods).
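As a rough illustration of what a quantitative alignment metric between two turns can look like, the sketch below scores a pair of utterances by the cosine similarity of their n-gram counts. This is a toy under stated assumptions, not the align API: the function names (ngrams, cosine_similarity, turn_alignment), the whitespace tokenization, and the default of bigrams are all hypothetical choices made for this example. The library's own workflow is covered in the tutorials.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a sequence of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[key] * counts_b[key] for key in shared)
    norm_a = sqrt(sum(value * value for value in counts_a.values()))
    norm_b = sqrt(sum(value * value for value in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # At least one turn has no n-grams of this size, so there is nothing to compare.
        return 0.0
    return dot / (norm_a * norm_b)


def turn_alignment(turn_a, turn_b, n=2):
    """Hypothetical helper: n-gram overlap between two turns, between 0 and 1."""
    # Naive lowercase + whitespace tokenization, an assumption for illustration only.
    counts_a = Counter(ngrams(turn_a.lower().split(), n))
    counts_b = Counter(ngrams(turn_b.lower().split(), n))
    return cosine_similarity(counts_a, counts_b)


if __name__ == "__main__":
    print(turn_alignment("i like green tea", "i like black tea"))
```

In the usage line, the two turns share one of their three bigrams, so the score is about one third. Identical turns score 1 and turns with no shared bigrams score 0.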

Examples of papers relying on the ALIGN library:

Installation

align can be installed directly using pip.

To install the stable version released on PyPI:

pip install align

Or, to upgrade an existing installation:

pip install align --upgrade

Because align has several dependencies (see requirements.txt), it is good practice to install it in a virtual environment.
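For readers who have not set up a virtual environment before, here is a minimal sketch using Python's standard venv module. The helper names (create_align_env, pip_install_command) and the environment folder name align-env are hypothetical. Only the package name align and the --upgrade flag come from the instructions above.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_align_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its Python interpreter."""
    env_dir = Path(env_dir)
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    # Interpreter location differs between Windows and POSIX systems.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(python_path, package="align", upgrade=False):
    """Build the pip command that installs (or upgrades) a package in that environment."""
    command = [str(python_path), "-m", "pip", "install", package]
    if upgrade:
        command.append("--upgrade")
    return command


if __name__ == "__main__":
    # "align-env" is a hypothetical folder name; pick any location you like.
    python_path = create_align_env("align-env")
    subprocess.run(pip_install_command(python_path), check=True)
```

Running the environment's own interpreter with -m pip keeps the installation inside the environment, so it does not touch the system-wide Python packages.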

Anaconda users: The above should work in the vast majority of cases. However, if you would like to install align within a virtual environment in one step, or you are having trouble updating align, a YAML file has been provided to streamline the process. Follow these steps:

  1. Download the environment.yml file and navigate to the folder where it has been downloaded
  2. Run the following command in Terminal: conda env create -f environment.yml
  3. Be sure to activate the new environment (i.e., conda activate align0.1.1) before running any align analyses (such as the tutorials; see below)

If you experience any problems, please report them in the "Issues" section of this repository.

Quick documentation

ALIGN consists of two primary modules for conducting analyses: prepare_transcripts and calculate_alignment. For a quick overview of the functions contained in each module, please check out the following:

Additional tools required for some align options

The Google News pre-trained word2vec vectors (GoogleNews-vectors-negative300.bin) and the Stanford part-of-speech tagger (stanford-postagger-full-2020-11-17) are required for some optional align parameters but must be downloaded separately. Please see the tutorials for more information.
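Since these resources are downloaded separately, it can help to confirm they are in place before running analyses that depend on them. The sketch below only checks for the file and folder named above inside a directory of your choosing. The helper names (find_optional_resources, missing_resources) and the idea of keeping both downloads under one base directory are assumptions for illustration and are not part of align.

```python
from pathlib import Path

# Names taken from the paragraph above.
WORD2VEC_FILE = "GoogleNews-vectors-negative300.bin"
STANFORD_TAGGER_DIR = "stanford-postagger-full-2020-11-17"


def find_optional_resources(base_dir):
    """Report which optional resources are present under base_dir."""
    base = Path(base_dir)
    return {
        WORD2VEC_FILE: (base / WORD2VEC_FILE).is_file(),
        STANFORD_TAGGER_DIR: (base / STANFORD_TAGGER_DIR).is_dir(),
    }


def missing_resources(base_dir):
    """Return the names of the optional resources that were not found."""
    found = find_optional_resources(base_dir)
    return [name for name, present in found.items() if not present]


if __name__ == "__main__":
    # "." is a placeholder; point this at wherever you saved the downloads.
    for name in missing_resources("."):
        print(f"Not found: {name}")
```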

Tutorials

We created Jupyter Notebook tutorials that provide a step-by-step walkthrough of how to use align. Below are descriptions of the current tutorials, which can be found in the examples directory of this repository. If you are unfamiliar with Jupyter Notebooks, instructions for installing and running them can be found here: http://jupyter.org/install. We recommend installing Jupyter using Anaconda, a widely used Python data science platform.

We are in the process of adding more tutorials and would welcome additional tutorials from interested contributors.

Attribution

If you find the package useful, please cite our manuscript:

Duran, N., Paxton, A., & Fusaroli, R. (2019). ALIGN: Analyzing Linguistic Interactions with Generalizable techNiques. Psychological Methods. http://dynamicog.org/papers/

Licensing of example data