relatio-nlp / relatio

code base for constructing narrative statements from text
MIT License

relatio

A Python package to extract underlying narrative statements from text.

What can this package do?

  1. Identify Agent-Verb-Patient (AVP) / Subject-Verb-Object (SVO) triplets in the text

    • AVPs are obtained via Semantic Role Labeling.
    • SVOs are obtained via Dependency Parsing.
    • A concrete example of AVP/SVO extraction:

    Original sentence: "Taxes kill jobs and hinder innovation."

    Triplets: [('taxes', 'kill', 'jobs'), ('taxes', 'hinder', 'innovation')]

  2. Group agents and patients into interpretable entities in two ways:

    • Supervised classification of entities: provide a list of entities (e.g., ['Barack Obama', 'government', ...]) and we will filter the triplets for you.
    • Unsupervised classification via clustering of entities: we represent agents and patients as text embeddings and cluster them with KMeans or HDBSCAN. The optimal number of clusters is selected in a data-driven way.
    • A concrete example of a cluster:

    Interpretable entity: "tax"
    Related phrases: ['income tax', 'the tax rates', 'taxation in this country', etc.]

  3. Visualize clusters and the resulting narratives.
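To make the dependency-parsing route (step 1) and the supervised entity filter (step 2) more concrete, here is a minimal, self-contained sketch on the example sentence above. It is not relatio's implementation or API: the parse is written by hand using spaCy-style labels (nsubj, dobj, conj), `extract_svo` and `filter_by_entities` are hypothetical helpers, and the matching rule (keep a triplet if its agent or patient contains a listed entity, case-insensitively) is an assumption. A real extractor needs many more rules than these two.

```python
from typing import List, Tuple

# One token of a dependency parse: (text, coarse POS tag, dependency label, head index).
# Labels follow spaCy's English scheme; this parse is hand-written for illustration.
Token = Tuple[str, str, str, int]
Triplet = Tuple[str, str, str]

# Hypothetical parse of "Taxes kill jobs and hinder innovation."
SENTENCE: List[Token] = [
    ("Taxes", "NOUN", "nsubj", 1),
    ("kill", "VERB", "ROOT", 1),
    ("jobs", "NOUN", "dobj", 1),
    ("and", "CCONJ", "cc", 1),
    ("hinder", "VERB", "conj", 1),
    ("innovation", "NOUN", "dobj", 4),
    (".", "PUNCT", "punct", 1),
]


def children(tokens: List[Token], head: int, label: str) -> List[int]:
    """Indices of tokens attached to `head` with dependency `label` (excluding head itself)."""
    return [
        i for i, (_, _, dep, h) in enumerate(tokens)
        if h == head and dep == label and i != head
    ]


def extract_svo(tokens: List[Token]) -> List[Triplet]:
    """Collect lower-cased (subject, verb, object) triplets from verbs with nsubj/dobj children."""
    triplets: List[Triplet] = []
    for i, (text, pos, dep, head) in enumerate(tokens):
        if pos != "VERB":
            continue
        subjects = children(tokens, i, "nsubj")
        # A conjoined verb ("hinder") reuses the subject of the verb it attaches to ("kill").
        if not subjects and dep == "conj":
            subjects = children(tokens, head, "nsubj")
        for s in subjects:
            for o in children(tokens, i, "dobj"):
                triplets.append((tokens[s][0].lower(), text.lower(), tokens[o][0].lower()))
    return triplets


def filter_by_entities(triplets: List[Triplet], entities: List[str]) -> List[Triplet]:
    """Keep triplets whose agent or patient contains a listed entity (case-insensitive substring)."""
    wanted = [e.lower() for e in entities]
    return [t for t in triplets if any(e in t[0] or e in t[2] for e in wanted)]


if __name__ == "__main__":
    triplets = extract_svo(SENTENCE)
    print(triplets)  # [('taxes', 'kill', 'jobs'), ('taxes', 'hinder', 'innovation')]
    print(filter_by_entities(triplets, ["jobs"]))  # [('taxes', 'kill', 'jobs')]
```

The conj rule is what produces the second triplet: "hinder" has no subject of its own in the parse, so it borrows "Taxes" from "kill".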

We currently support French and English out of the box. You can also provide a custom SVO-extraction function for any language supported by spaCy.

Installation

relatio runs on Linux and macOS (x86 platform) and requires Python 3.7 (or 3.8) and pip.
It is highly recommended to use a virtual environment (or conda environment) for the installation.
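Before installing, a quick pre-flight check can confirm that the active interpreter and machine match the requirements above. This is a sketch, not part of relatio: `environment_problems` is a hypothetical helper, and treating `x86_64`, `amd64`, `i386` and `i686` as "x86" is an assumption about how `platform.machine()` reports such machines.

```python
import platform
import sys
from typing import List, Tuple

# Requirements as stated in the Installation section.
SUPPORTED_PYTHON = {(3, 7), (3, 8)}
SUPPORTED_OS = ("linux", "darwin")  # sys.platform prefixes for Linux and macOS
X86_MACHINES = {"x86_64", "amd64", "i386", "i686"}  # assumed platform.machine() spellings


def environment_problems(version: Tuple[int, ...], os_name: str, machine: str) -> List[str]:
    """Return reasons why this environment falls outside the stated requirements."""
    problems = []
    major_minor = tuple(version[:2])
    if major_minor not in SUPPORTED_PYTHON:
        problems.append("Python %d.%d found; 3.7 or 3.8 expected" % major_minor)
    if not os_name.startswith(SUPPORTED_OS):
        problems.append("platform %r is neither Linux nor macOS" % os_name)
    if machine.lower() not in X86_MACHINES:
        problems.append("machine %r is not x86" % machine)
    return problems


if __name__ == "__main__":
    # Inspect the interpreter that is running this script (e.g. inside your virtual environment).
    issues = environment_problems(sys.version_info, sys.platform, platform.machine())
    print("OK" if not issues else "\n".join(issues))
```

Run it with the same `python` you will use for `pip install`, so it checks the environment the package will actually be installed into.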

# upgrade pip, wheel and setuptools
python -m pip install -U pip wheel setuptools

# install the package
python -m pip install -U relatio

If you want to use Jupyter, make sure it is installed in the current environment.

Quickstart

Please see our hands-on tutorials:

Team

relatio is brought to you by

with special thanks to ETH Scientific IT Services for their support.

If you are interested in contributing to the project, please read the Development Guide.

Disclaimer

Remember that this is a research tool :)