obsei / obsei

Obsei is a low-code, AI-powered automation tool. It can be used in various business flows such as social listening, AI-based alerting, brand image analysis, comparative studies and more.
https://obsei.com/
Apache License 2.0

[Analyzer] Unsupervised Clustering #130

Open shahrukhx01 opened 3 years ago

shahrukhx01 commented 3 years ago

@lalitpagaria to get document vectors we can use sentence-transformers:

https://github.com/UKPLab/sentence-transformers

shahrukhx01 commented 3 years ago

@lalitpagaria these are the steps involved:

  1. Take n text documents and extract sentence/document embeddings using sentence-transformers.
  2. Apply an unsupervised clustering algorithm from scikit-learn: https://scikit-learn.org/stable/modules/clustering.html
  3. Show the actual raw texts grouped by cluster.
  4. Alternatively, apply dimensionality reduction and show a visualization like this, linking each point in the visualization to its raw text (for example, shown on hover).
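
The steps above could be sketched roughly as follows. This is only an illustration, not Obsei code: the function names, the `all-MiniLM-L6-v2` model, the choice of k-means and PCA, and the default `n_clusters=5` are all assumptions picked for the example. The heavy libraries are imported inside the functions so the grouping helper can be used on its own.

```python
from collections import defaultdict


def embed_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Step 1: one embedding vector per text (model name is an assumption)."""
    # Imported lazily so the grouping helper below works without the model.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts)  # array of shape (len(texts), embedding_dim)


def cluster_embeddings(embeddings, n_clusters=5):
    """Step 2: assign each embedding to a cluster (k-means is one option)."""
    from sklearn.cluster import KMeans

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(embeddings).tolist()


def group_by_cluster(texts, labels):
    """Step 3: collect the raw texts under their cluster label."""
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[int(label)].append(text)
    return dict(groups)


def reduce_to_2d(embeddings):
    """Step 4: project embeddings to 2D points for plotting (PCA here)."""
    from sklearn.decomposition import PCA

    return PCA(n_components=2).fit_transform(embeddings)
```

With the libraries installed, something like `group_by_cluster(texts, cluster_embeddings(embed_texts(texts)))` would return a dict mapping each cluster id to its raw texts, and `reduce_to_2d` gives the coordinates to feed into any plotting library that supports hover text.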

Hope this would help.

lalitpagaria commented 3 years ago

@shahrukhx01 Thanks for the information. Let me read through them. For the first version, would it be possible to build clusters from a list of texts? For example, if Obsei fetches 200 reviews, can we generate clusters from those 200 texts and then tag each review with the cluster it belongs to? Also, is it possible to get multiple categories?

shahrukhx01 commented 3 years ago

@lalitpagaria that's where topic modelling comes into play: it assigns categories based on the content of the documents. We have a separate issue for that, #131.

lalitpagaria commented 3 years ago

Yeah, my bad. Then let's integrate topic modelling first.

shahrukhx01 commented 3 years ago

@lalitpagaria could you create a dataset of 200 posts as a CSV and host it on Kaggle? I’ll take it up in the first week of August if no one takes up these two issues.