PyCollocation

Python module to do simple collocation analysis of a corpus.

Requirements

Since we have implemented some tokenizer from nltk.tokenize, one needs to install the corresponding packages first.

import nltk
nltk.download('punkt')

You can use PyCollocation as via CLI. Example:

python3 analysis.py FOLDER SEARCH_TERM L_WINDOW R_WINDOW STATISTIC DOC_TYPE OUTPUT_FORMAT

python3 analysis.py /corpora "test" 3 3 freq folder print

The relative path to the folder including the files to be analyzed.

The search term. Can be a string or a regular expression.

Window size left of the search term.

Window size right of the search term.

The statistical measure for the analysis. Currently implemented:

The document type. Can be:

Can be: