thisisphume / short_text_analyzer

Apache License 2.0

Short-text-analyzer

ShortTextAnalyzer was created to help analyze open-ended survey responses, which are usually shorter than three sentences. The analysis includes topic modeling, sentiment analysis, and visualization. Topic modeling combines pre-trained language representations (BERT) with clustering algorithms (KMeans and HDBSCAN).
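To make the clustering half of that pipeline concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) applied to sentence vectors. It is not the package's own code: the `kmeans` helper is hypothetical, and the toy 2-D vectors stand in for BERT embeddings (BERT-base actually produces 768-dimensional vectors).

```python
import math
import random


def kmeans(vectors, k, n_iter=100, seed=0):
    """Lloyd's k-means: return one cluster label per input vector."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input vectors chosen at random
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical 2-D stand-ins for BERT sentence embeddings of six comments
embeddings = [
    [0.90, 0.10], [1.00, 0.00], [0.95, 0.05],  # one group of similar comments
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],  # another group
]
labels = kmeans(embeddings, k=2)
print(labels)
```

Comments whose vectors lie close together receive the same label, and each label is read as one topic. Because k-means needs k fixed in advance, the number of clusters has to be supplied by the user.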

Documentation Page: https://thisisphume.github.io/short-text-analyzer/

Install

pip install short-text-analyzer

Install all the required packages from the requirements.txt file.

pip install -r requirements.txt

Then import the package in Python:

from shorttextanalyzer.core import *

How to use

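# comments_series: the open-ended survey responses (one short comment per entry)
# 4: the number of clusters (topics) requested from KMeans
analyzer = shortTextAnalyzer(comments_series, 4)
output_result = analyzer.analyze_getResult()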
Embedding Method for Visualization is  2AE  with MSE of 0.6560611658549391
Embedding Method for Clustering is  2AE  with MSE of 0.4782262679093038
Number of clusters via HDBSCAN is:  5.0
Number of clusters via KMeans is:   4

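Here we specify that we want 4 clusters (topics) from this data. This number is used by KMeans; HDBSCAN determines the number of clusters on its own (5 in this run).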
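The log lines in the example report an MSE for the chosen embedding method (for instance "2AE with MSE of 0.478..."). One plausible reading, not confirmed by this README, is that several embedding methods are compared and the one whose reconstruction has the lowest mean squared error, MSE = (1/n) Σ (x_i − x̂_i)², is kept. The sketch below shows only that selection rule; the helper functions, method names, and vectors are all hypothetical.

```python
def mean_squared_error(original, reconstructed):
    """MSE = (1/n) * sum of squared element-wise differences over all vectors."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(original, reconstructed)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def pick_lowest_mse(original, candidates):
    """Return (name, mse) for the candidate reconstruction closest to the original."""
    scores = {name: mean_squared_error(original, rec) for name, rec in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical original vectors and two hypothetical reconstructions of them
original = [[1.0, 2.0], [3.0, 4.0]]
candidates = {
    "method_a": [[1.5, 2.0], [3.0, 4.5]],
    "method_b": [[1.1, 2.0], [3.0, 3.9]],
}
best, err = pick_lowest_mse(original, candidates)
print(best, err)
```

A lower MSE means the compressed representation preserves more of the original embedding, which is why the smallest value wins.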

Output: result

output_result.sample(2)
     comments                                           comment_lang  comments_clean                                     sentimentScore  subjectiveScore  clusterByKMeans  clusterByHDBSCAN
50   sondage parfait                                    fr            perfect poll                                       1.00            1.000000         2                1
875  it wasn't very clear what the purpose of the f...  en            it wasn't very clear what the purpose of the f...  0.19            0.415833         1                1
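A natural next step with this table is to ask how positive each topic is on average. The sketch below does that with the standard library only. The rows mimic `output_result`; the first two copy the sample above, the third is a hypothetical row, and `mean_sentiment_by_cluster` is an illustrative helper, not part of the package. The scale of `sentimentScore` is not documented here, so the averages are only comparable within one run.

```python
from collections import defaultdict
from statistics import mean

# Rows shaped like output_result. The first two copy the sample shown above;
# the third is a hypothetical row added only so cluster 1 has two members.
rows = [
    {"comments_clean": "perfect poll", "sentimentScore": 1.00, "clusterByKMeans": 2},
    {"comments_clean": "it wasn't very clear what the purpose of the f...",
     "sentimentScore": 0.19, "clusterByKMeans": 1},
    {"comments_clean": "(hypothetical) the survey was a bit long",
     "sentimentScore": 0.05, "clusterByKMeans": 1},
]


def mean_sentiment_by_cluster(rows, cluster_col="clusterByKMeans", score_col="sentimentScore"):
    """Average the sentiment score of the comments in each cluster (topic)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_col]].append(row[score_col])
    return {label: mean(scores) for label, scores in sorted(groups.items())}


print(mean_sentiment_by_cluster(rows))
```

Since `output_result` is a pandas DataFrame, the same summary can be written directly as `output_result.groupby("clusterByKMeans")["sentimentScore"].mean()`; swapping in `"clusterByHDBSCAN"` gives the HDBSCAN view.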

Visualization: how good are our clusters? (HDBSCAN and KMeans)

analyzer.plot_output()
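The plots answer "how good are our clusters?" visually. A common numeric companion is the silhouette coefficient, sketched below in plain Python. It is a swapped-in metric, not something `plot_output()` is documented here to report. In practice `points` would be the comment embeddings and `labels` a column such as `clusterByKMeans`. If HDBSCAN noise points are labelled -1 (the usual HDBSCAN convention, assumed here), this sketch would treat them as one more cluster.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D points forming two well-separated groups
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(silhouette_score(points, [0, 0, 1, 1]))  # labels matching the groups
print(silhouette_score(points, [0, 1, 0, 1]))  # labels that mix the groups
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 or below suggest overlapping or misassigned points. For real data, scikit-learn's `sklearn.metrics.silhouette_score(X, labels)` computes the same quantity efficiently.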

[Figures: HDBSCAN and KMeans cluster visualizations produced by analyzer.plot_output()]
