jinhangjiang / morethansentiments

Python package to calculate Boilerplate and many other text quantified features
BSD 3-Clause "New" or "Revised" License

MoreThanSentiments

Besides sentiment scores, this Python package offers several ways of quantifying a text corpus, based on measures drawn from the research literature. Currently, we support the calculation of the following measures: Boilerplate, Redundancy, Specificity, and Relative_prevalence.

A Medium blog post is available here: MoreThanSentiments: A Python Library for Text Quantification

Citation

If this package was helpful in your work, feel free to cite it as

Installation

The easiest way to install the toolbox is via pip (pip3 in some distributions):

pip install MoreThanSentiments

Usage

Import the Package

import MoreThanSentiments as mts

Read data from txt files

my_dir_path = "D:/YourDataFolder"
df = mts.read_txt_files(PATH = my_dir_path)
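
For readers who want to see what this loading step amounts to, the sketch below collects every .txt file in a folder into a list of records using only the standard library. It is an illustration, not the package's implementation: the helper name load_txt_folder and the file_name key are assumptions, and only the text field corresponds to a column used later in this README.

```python
from pathlib import Path


def load_txt_folder(path):
    """Hypothetical helper: return one record per .txt file in `path`."""
    records = []
    # Sort the files so the record order is reproducible across runs.
    for file in sorted(Path(path).glob("*.txt")):
        records.append({
            "file_name": file.name,
            # Ignore undecodable bytes rather than failing on one bad file.
            "text": file.read_text(encoding="utf-8", errors="ignore"),
        })
    return records
```

A list of such records can be passed to pandas.DataFrame to obtain a table with one row per document.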

Sentence Token

df['sent_tok'] = df.text.apply(mts.sent_tok)

Clean Data

If you want to clean on the sentence level:

# Clean every sentence of every document; apply avoids chained assignment
# and removes the need for a separate pandas import.
df['cleaned_data'] = df['sent_tok'].apply(
    lambda sents: [mts.clean_data(x,
                                  lower = True,
                                  punctuations = True,
                                  number = False,
                                  unicode = True,
                                  stop_words = False) for x in sents])

If you want to clean on the document level:

df['cleaned_data'] = df.text.apply(mts.clean_data, args=(True, True, False, True, False))

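For the data cleaning function, we offer the following options, all of which appear in the sentence-level call above: lower, punctuations, number, unicode, and stop_words.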

Boilerplate

df['Boilerplate'] = mts.Boilerplate(df.sent_tok, n = 4, min_doc = 5, get_ngram = False)
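
The roles of n and min_doc are easier to see with the underlying idea written out. The sketch below assumes one common formulation of a boilerplate measure: an n-gram counts as boilerplate when it appears in at least min_doc documents, and a document's score is the share of its words that sit in sentences containing such an n-gram. This definition, the helper names, and the whitespace tokenization are assumptions for illustration and may differ from what mts.Boilerplate computes.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def boilerplate_sketch(docs, n=4, min_doc=5):
    """docs: list of documents, each a list of sentence strings.

    Returns one score per document: words in sentences that contain a
    widely shared n-gram, divided by all words in the document.
    """
    tokenized = [[sent.split() for sent in doc] for doc in docs]

    # Document frequency: in how many documents does each n-gram occur?
    doc_freq = Counter()
    for doc in tokenized:
        seen = set()
        for sent in doc:
            seen.update(ngrams(sent, n))
        doc_freq.update(seen)
    common = {g for g, c in doc_freq.items() if c >= min_doc}

    scores = []
    for doc in tokenized:
        total = sum(len(sent) for sent in doc)
        flagged = sum(len(sent) for sent in doc
                      if any(g in common for g in ngrams(sent, n)))
        scores.append(flagged / total if total else 0.0)
    return scores


toy_docs = [
    ["this report contains forward looking statements", "revenue grew strongly"],
    ["this report contains forward looking statements", "costs fell sharply"],
    ["we opened three new stores"],
]
# The first two documents share a six-word sentence, so each scores 6/9;
# the third shares nothing and scores 0.0.
toy_scores = boilerplate_sketch(toy_docs, n=4, min_doc=2)
```

Raising min_doc makes the measure stricter, since an n-gram must then recur across more documents before it is treated as boilerplate.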

Parameters:

Redundancy

df['Redundancy'] = mts.Redundancy(df.cleaned_data, n = 10)
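
As a rough illustration of what a redundancy score can capture, the sketch below counts n-grams within a single document and reports the share of n-gram occurrences that belong to an n-gram appearing more than once. This is an assumed formulation, and the function name and whitespace tokenization are hypothetical; the package's own definition may differ in detail.

```python
from collections import Counter


def redundancy_sketch(sentences, n=10):
    """sentences: list of cleaned sentence strings from one document.

    Returns the fraction of n-gram occurrences whose n-gram is repeated
    somewhere else in the same document.
    """
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        # Sentences shorter than n contribute no n-grams.
        counts.update(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total if total else 0.0


# With n=2 the bigram ("the", "cat") occurs twice out of four bigrams in total.
toy_redundancy = redundancy_sketch(["the cat sat", "the cat ran"], n=2)
```

A large n such as 10 only flags long repeated passages, which is why short documents often score zero.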

Parameters:

Specificity

df['Specificity'] = mts.Specificity(df.text)

Parameters:

Relative_prevalence

df['Relative_prevalence'] = mts.Relative_prevalence(df.text)
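
One way to think about a relative prevalence score is as the density of quantitative ("hard") information in a text. The sketch below assumes a simple version of that idea: the number of whitespace-separated tokens containing at least one digit, divided by the total number of tokens. The function name and this exact definition are assumptions for illustration, not a description of how mts.Relative_prevalence is implemented.

```python
def relative_prevalence_sketch(text):
    """Share of whitespace-separated tokens that contain at least one digit."""
    tokens = text.split()
    if not tokens:
        return 0.0
    numeric = sum(1 for tok in tokens if any(ch.isdigit() for ch in tok))
    return numeric / len(tokens)


# Three of the eight tokens ("12%", "$4.5", "2021") contain digits.
toy_prevalence = relative_prevalence_sketch(
    "Revenue rose 12% to $4.5 million in 2021"
)
```

Under this sketch, a text with no figures scores 0.0 and a text made up entirely of figures scores 1.0.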

Parameters:

For the full code script, you may check here: