scottkleinman / lexos

Development repo for the Lexos API
MIT License

The Lexos API

Badges: GitHub release (latest SemVer) · Python 3.9 · Python wheels · Code style: black · License

The Lexos API is a library of methods for programmatically implementing and extending the functionality in the Lexos text analysis tool. Eventually, the web app will be rewritten to use the API directly. The goal of this alpha stage of development is to reproduce (and in some cases extend) the functionality of the current web app.

📖 Documentation

A full discussion of the use of the API can be found on the Documentation website.

A suite of Jupyter notebooks demonstrating the functionality can be found here.

ā­ļø Features

  • Loads texts from a variety of sources.
  • Manages a corpus of texts.
  • Performs text pre-processing ("scrubbing") and splitting ("cutting").
  • Performs tokenization and trains language models using spaCy.
  • Creates assorted visualizations of term vectors.
  • Generates topic models and topic model visualizations using MALLET and dfr-browser.
  • An expanded set of features is planned for the future.

    ā³ Installation

    pip install lexos

To update to the latest version, use:

    pip install -U lexos

Before using Lexos, you will want to install its default language model:

    python -m spacy download xx_sent_ud_sm

This is a minimal model that performs sentence and token segmentation for a variety of languages. If you want a model for a specific language, such as English, download it by providing the name of the model:

    python -m spacy download en_core_web_sm
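
Because spaCy models are installed as ordinary Python packages, a short script can confirm that the steps above succeeded before you start work. The following is a minimal sketch rather than part of the Lexos API: the helper names `is_installed`, `missing_packages`, and `main` are hypothetical, and the sketch assumes the import name of the library is `lexos`.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


def missing_packages(names):
    """Return the subset of names that cannot be found, in the given order."""
    return [name for name in names if not is_installed(name)]


def main() -> None:
    # Assumption: "lexos" is the import name; the model name is the default
    # model mentioned in the installation instructions.
    required = ["lexos", "spacy", "xx_sent_ud_sm"]
    missing = missing_packages(required)
    if missing:
        print("Not installed yet:", ", ".join(missing))
        return

    # Import spaCy only after confirming it is available.
    import spacy

    # Load the multilingual model and segment a short text into sentences.
    nlp = spacy.load("xx_sent_ud_sm")
    doc = nlp("This is a sentence. This is another one.")
    print([sent.text for sent in doc.sents])


if __name__ == "__main__":
    main()
```

If anything is reported as missing, rerun the corresponding install or download command. To check a language-specific model instead, replace `xx_sent_ud_sm` with its name, for example `en_core_web_sm`.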

For information on how Lexos uses language models, see Tokenizing Texts.

If you are working in another language or need a larger language model, you can find download instructions for additional models on the spaCy models page.

💝 Contribute