thomjur / PyCollocation

Python module to do simple collocation analysis of a corpus.
GNU General Public License v3.0
0 stars 1 forks source link

PyCollocation

Python module to do simple collocation analysis of a corpus.

Requirements

Since we have implemented some tokenizer from nltk.tokenize, one needs to install the corresponding packages first.

import nltk
nltk.download('punkt')

Tutorial

CLI

You can use PyCollocation as via CLI. Example:

python3 analysis.py FOLDER SEARCH_TERM L_WINDOW R_WINDOW STATISTIC DOC_TYPE OUTPUT_FORMAT
python3 analysis.py /corpora "test" 3 3 freq folder print

FOLDER

The relative path to the folder including the files to be analyzed.

SEARCH_TERM

The search term. Can be a string or a regular expression.

L_WINDOW

Window size left of the search term.

R_WINDOW

Window size right of the search term.

STATISTIC

The statistical measure for the analysis. Currently implemented:

DOC_TYPE

The document type. Can be:

OUTPUT_FORMAT

Can be: