Besides sentiment scores, this Python package offers various ways of quantifying a text corpus, implementing measures proposed in prior literature. Currently, we support the calculation of the following measures:

- Boilerplate
- Redundancy
- Specificity
- Relative_prevalence
A Medium blog post introducing the package is available here: MoreThanSentiments: A Python Library for Text Quantification
If this package has been helpful in your work, feel free to cite it as:
The easiest way to install the toolbox is via pip (pip3 in some distributions):
pip install MoreThanSentiments
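To confirm that the install succeeded, you can query the installed version with the standard library (a quick sanity check; the distribution name is the same one used with pip above):

from importlib.metadata import version

# Prints the installed version string of the MoreThanSentiments distribution.
print(version("MoreThanSentiments"))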
import MoreThanSentiments as mts
import pandas as pd  # needed for the sentence-level cleaning example below
To read every .txt file in a folder into a DataFrame:

my_dir_path = "D:/YourDataFolder"
df = mts.read_txt_files(PATH = my_dir_path)
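If your documents are not stored as .txt files, you can build the same structure yourself; the rest of the walkthrough only relies on a text column holding one document per row (a minimal sketch, assuming that is the layout read_txt_files produces):

import pathlib
import pandas as pd

# One row per file, with the raw file contents in a 'text' column.
files = sorted(pathlib.Path(my_dir_path).glob("*.txt"))
df = pd.DataFrame({
    "file": [f.name for f in files],
    "text": [f.read_text(encoding="utf-8", errors="ignore") for f in files],
})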
To split each document into sentence tokens:

df['sent_tok'] = df.text.apply(mts.sent_tok)
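Each row of sent_tok should now hold a list of sentence strings (an assumption based on how the column is consumed below); a quick way to eyeball the result:

# Peek at the first three sentences of the first document.
print(df['sent_tok'].iloc[0][:3])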
If you want to clean on the sentence level:
df['cleaned_data'] = pd.Series(dtype=object)
for i in range(len(df['sent_tok'])):
    # Clean each sentence in the i-th document; df.at avoids pandas'
    # chained-assignment pitfall when writing a list into a single cell.
    df.at[i, 'cleaned_data'] = [mts.clean_data(x,
                                               lower = True,
                                               punctuations = True,
                                               number = False,
                                               unicode = True,
                                               stop_words = False) for x in df['sent_tok'][i]]
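The same loop can also be written as a single apply call, which avoids the manual index bookkeeping (an equivalent sketch using the same flags):

df['cleaned_data'] = df['sent_tok'].apply(
    lambda sents: [mts.clean_data(x, lower = True, punctuations = True,
                                  number = False, unicode = True,
                                  stop_words = False) for x in sents]
)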
If you want to clean on the document level:
# The positional args map to (lower, punctuations, number, unicode, stop_words).
df['cleaned_data'] = df.text.apply(mts.clean_data, args=(True, True, False, True, False))
For the data cleaning function, we offer the following options:

- lower: convert all words to lowercase
- punctuations: remove all punctuation from the corpus
- number: remove all digits from the corpus
- unicode: remove unicode characters from the corpus
- stop_words: remove stopwords from the corpus
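If you prefer named flags over the positional tuple, Series.apply also forwards keyword arguments (equivalent to the document-level call above, assuming clean_data accepts the same keywords as in the sentence-level example):

df['cleaned_data'] = df.text.apply(
    mts.clean_data,
    lower = True, punctuations = True, number = False,
    unicode = True, stop_words = False,
)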
To compute the Boilerplate measure on the tokenized sentences:

df['Boilerplate'] = mts.Boilerplate(df.sent_tok, n = 4, min_doc = 5, get_ngram = False)
Parameters:

- input_data: this function requires tokenized documents (the sent_tok column created above)
- n: the size of the n-grams to use; the example above uses n = 4
- min_doc: when building the n-gram list, ignore n-grams that appear in fewer documents than this threshold; the example above uses 5
- get_ngram: if set to True, return the n-grams and their frequencies instead of the per-document Boilerplate scores
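Setting get_ngram = True is useful for sanity-checking which phrases drive the score (a usage sketch; per the parameter description above, the call should return the n-gram frequency table rather than per-document scores):

# Inspect the boilerplate n-grams themselves instead of scoring documents.
ngrams = mts.Boilerplate(df.sent_tok, n = 4, min_doc = 5, get_ngram = True)
print(ngrams.head())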
To compute the Redundancy measure on the cleaned text:

df['Redundancy'] = mts.Redundancy(df.cleaned_data, n = 10)
Parameters:

- input_data: this function requires the cleaned documents (the cleaned_data column created above)
- n: the size of the n-grams to use; the example above uses n = 10
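Conceptually, redundancy captures how much a document repeats itself at the n-gram level. A toy illustration of that idea (my own sketch of the intuition, not the library's exact formula):

from collections import Counter

def repeated_ngram_share(tokens, n = 10):
    # Fraction of a document's n-grams that occur more than once within it.
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    return sum(c for c in counts.values() if c > 1) / len(grams)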
To compute the Specificity measure on the raw text:

df['Specificity'] = mts.Specificity(df.text)
Parameters:

- input_data: this function requires the raw documents, without tokenization
To compute the Relative_prevalence measure on the raw text:

df['Relative_prevalence'] = mts.Relative_prevalence(df.text)
Parameters:

- input_data: this function requires the raw documents, without tokenization
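Putting it all together, a minimal end-to-end run on an in-memory toy corpus might look like this (hypothetical two-document sample; min_doc is lowered to 1 only because the corpus is tiny):

import pandas as pd
import MoreThanSentiments as mts

df = pd.DataFrame({"text": [
    "Revenue grew 10%. Revenue grew 10%. We expect continued growth.",
    "The company opened 3 stores in Texas. Management remains optimistic.",
]})

# Tokenize and clean, as in the walkthrough above.
df['sent_tok'] = df.text.apply(mts.sent_tok)
df['cleaned_data'] = df.text.apply(mts.clean_data, args=(True, True, False, True, False))

# Compute all four measures.
df['Boilerplate'] = mts.Boilerplate(df.sent_tok, n = 4, min_doc = 1, get_ngram = False)
df['Redundancy'] = mts.Redundancy(df.cleaned_data, n = 10)
df['Specificity'] = mts.Specificity(df.text)
df['Relative_prevalence'] = mts.Relative_prevalence(df.text)

print(df[['Boilerplate', 'Redundancy', 'Specificity', 'Relative_prevalence']])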
The full example script is available here: