ShortTextAnalyzer was created to help analyze open-ended survey responses, which usually contain fewer than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization. Topic modeling is done using pre-trained language representations, namely BERT, combined with clustering algorithms (KMeans and HDBSCAN).
Documentation Page: https://thisisphume.github.io/short-text-analyzer/
pip install short-text-analyzer
Install all the required packages from the requirements.txt file.
pip install -r requirements.txt
from shorttextanalyzer.core import *
analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()
Embedding Method for Visualization is 2AE with MSE of 0.6560611658549391
Embedding Method for Clustering is 2AE with MSE of 0.4782262679093038
Number of clusters via HDBSCAN is: 5.0
Number of clusters via KMeans is: 4
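Here, the second argument (4) specifies that we want 4 clusters (topics) from this data. KMeans uses this number, while HDBSCAN determines its own cluster count, which is why the two counts reported above differ.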
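The `comments_series` argument is not defined in the snippet above. The following is a minimal sketch of how it might be prepared, assuming the analyzer accepts a pandas Series of raw comment strings (the name suggests this, but it is not confirmed here). The `raw` DataFrame, its `comments` column, and all example strings other than "sondage parfait" are hypothetical.

```python
import pandas as pd

# Hypothetical survey export; real data would typically come from pd.read_csv(...).
raw = pd.DataFrame({
    "comments": ["sondage parfait", "  too long  ", None, "", "easy to fill in"],
})

# Drop missing answers and trim surrounding whitespace.
comments_series = raw["comments"].dropna().astype(str).str.strip()

# Drop answers that are empty after trimming; the original row labels are kept.
comments_series = comments_series[comments_series != ""]

print(comments_series.tolist())
```

Removing empty and missing answers up front is a design choice for this sketch, not a documented requirement of the package.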
sentimentScore
: Polarity score in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement.
subjectiveScore
: Subjectivity score in the range [0, 1], where 1 refers to personal opinion, emotion, or judgment and 0 means factual information.
clusterByKMeans
: Cluster number assigned to each comment by KMeans.
clusterByHDBSCAN
: Cluster number assigned to each comment by HDBSCAN.
output_result.sample(2)
| | comments | comment_lang | comments_clean | sentimentScore | subjectiveScore | clusterByKMeans | clusterByHDBSCAN |
|---|---|---|---|---|---|---|---|
| 50 | sondage parfait | fr | perfect poll | 1.00 | 1.000000 | 2 | 1 |
| 875 | it wasn't very clear what the purpose of the f... | en | it wasn't very clear what the purpose of the f... | 0.19 | 0.415833 | 1 | 1 |
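To make the score ranges concrete, here is a small sketch that turns the two scores into coarse labels. The helper names `label_polarity` and `label_subjectivity`, the neutral band of 0.1, and the 0.5 threshold are assumptions chosen for illustration; they are not part of the package.

```python
def label_polarity(score: float, neutral_band: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to 'negative', 'neutral', or 'positive'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity must be in [-1, 1], got {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def label_subjectivity(score: float, threshold: float = 0.5) -> str:
    """Map a subjectivity score in [0, 1] to 'factual' or 'opinion'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"subjectivity must be in [0, 1], got {score}")
    return "opinion" if score >= threshold else "factual"


# (sentimentScore, subjectiveScore) pairs taken from the two sample rows above.
sample_scores = [(1.00, 1.000000), (0.19, 0.415833)]
labels = [(label_polarity(p), label_subjectivity(s)) for p, s in sample_scores]
print(labels)
```

If `output_result` is a pandas DataFrame, as the `.sample(2)` call suggests, the same helpers could be applied column-wise, for example `output_result["sentimentScore"].apply(label_polarity)`.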
analyzer.plot_output()