julien-blanchard / texTract

A lightweight, minimalist Python module that takes a .txt file and provides context for any given token in the input corpus.
GNU General Public License v3.0
nlp tokenization

texTract

Based on a university project from 2008 that was originally written in Perl, texTract is a lightweight, minimalist Python module that takes a .txt file and provides context for any given token in the input corpus.

It was originally meant to be used from the command line to obtain basic insights from various novels in .txt format.


texTract provides two main features:

  1. Get context

Outputs the 5 preceding and 5 following tokens for every occurrence of the input token.

  2. Get summary

Outputs some very basic statistics for the input token, as well as an array of other noteworthy tokens to explore.
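The two features above can be illustrated with a minimal sketch. This is not the code from textract.py: the function names `tokenize`, `get_context` and `get_summary`, the regex-based tokenizer, and the choice of "most frequent tokens" as the noteworthy tokens are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def get_context(tokens, target, window=5):
    """Return a (preceding, following) pair of token lists for every
    occurrence of `target`, with up to `window` tokens on each side."""
    target = target.lower()
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            contexts.append((before, after))
    return contexts


def get_summary(tokens, target, top_n=5):
    """Return basic statistics for `target` plus other frequent tokens.

    Assumption: "noteworthy" is approximated by raw frequency.
    """
    target = target.lower()
    counts = Counter(tokens)
    total = len(tokens)
    occurrences = counts[target]
    others = [tok for tok, _ in counts.most_common() if tok != target][:top_n]
    return {
        "token": target,
        "occurrences": occurrences,
        "share": occurrences / total if total else 0.0,
        "other_tokens": others,
    }
```

To try it on a novel, read the file with `open("novel.txt", encoding="utf-8").read()` (the file name is hypothetical), pass the result to `tokenize`, and then call `get_context(tokens, "whale")` or `get_summary(tokens, "whale")`. In practice a stop-word filter would probably be needed before the frequency ranking is useful.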

How to use texTract

  1. Open your terminal
  2. Place textract.py inside a folder
  3. Place any .txt file inside the same folder
  4. Run the textract.py file

Example

TBC