Short-text-analyzer

ShortTextAnalyzer was created to help analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization. Topic modeling is done using pre-trained language representations, namely BERT, combined with a clustering algorithm.

Documentation Page: https://thisisphume.github.io/short-text-analyzer/

Install

pip install short-text-analyzer

Install all the required packages from the requirements.txt file.

pip install -r requirements.txt

from shorttextanalyzer.core import *

How to use

analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()

Embedding Method for Visualization is  2AE  with MSE of 0.6560611658549391
Embedding Method for Clustering is  2AE  with MSE of 0.4782262679093038
Number of clusters via HDBSCAN is:  5.0
Number of clusters via KMeans is:   4

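Here, the second argument (4) specifies that we want 4 clusters (topics) from this data. As the printed output shows, KMeans returns exactly 4 clusters, while HDBSCAN reports its own cluster count (5 in this run).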
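The call above takes comments_series as given. The name suggests a pandas Series of raw comment strings, but that is an assumption, as are the helper name prepare_comments and the column name "comments" used here. A minimal sketch of one way to build such a Series from a survey table, dropping missing and blank answers first:

```python
import pandas as pd

# Hypothetical raw survey table; in practice this would come from your own data.
raw = pd.DataFrame(
    {"comments": ["sondage parfait", "   ", None, "The survey was too long."]}
)

def prepare_comments(df, column="comments"):
    """Return a clean Series of non-empty comment strings (illustrative helper)."""
    series = df[column].dropna().astype(str).str.strip()
    series = series[series != ""]          # drop answers that were only whitespace
    return series.reset_index(drop=True)   # re-number rows 0..n-1

comments_series = prepare_comments(raw)

# The cleaned Series could then be passed on as in the example above:
# analyzer = shortTextAnalyzer(comments_series, 4)
```

Whether the analyzer already handles missing or blank answers internally is not stated here, so filtering them up front is a cautious default rather than a requirement.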

Output: result

sentimentScore: polarity score in the range [-1, 1], where 1 indicates a positive statement and -1 a negative statement.
subjectiveScore: subjectivity score in the range [0, 1], where 1 refers to personal opinion, emotion, or judgment and 0 to factual information.
clusterByKMeans: cluster number assigned to each comment by KMeans.
clusterByHDBSCAN: cluster number assigned to each comment by HDBSCAN.

output_result.sample(2)

	comments	comment_lang	comments_clean	sentimentScore	subjectiveScore	clusterByKMeans	clusterByHDBSCAN
50	sondage parfait	fr	perfect poll	1.00	1.000000	2	1
875	it wasn't very clear what the purpose of the f...	en	it wasn't very clear what the purpose of the f...	0.19	0.415833	1	1
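Since the result is a DataFrame with one row per comment, a per-cluster summary can be derived with ordinary pandas operations. The sketch below is not part of the package: the rows are made-up stand-ins that only reuse the column names listed above, and summarize_clusters is a hypothetical helper.

```python
import pandas as pd

# Made-up stand-in for output_result, reusing the column names described above.
output_result = pd.DataFrame(
    {
        "comments_clean": ["perfect poll", "unclear purpose", "too long", "nice design"],
        "sentimentScore": [1.0, 0.19, -0.5, 0.3],
        "clusterByKMeans": [2, 1, 1, 2],
    }
)

def summarize_clusters(df, cluster_col="clusterByKMeans"):
    """Count comments and average the sentiment score within each cluster."""
    return (
        df.groupby(cluster_col)["sentimentScore"]
        .agg(n_comments="count", mean_sentiment="mean")
        .sort_values("n_comments", ascending=False)
    )

summary = summarize_clusters(output_result)
print(summary)
```

Passing cluster_col="clusterByHDBSCAN" would give the same summary for the HDBSCAN assignment, provided that column is present in the frame.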

Visualization: how good are our clusters? HDBSCAN and KMeans

analyzer.plot_output()
