x-tabdeveloping / turftopic

Robust and fast topic models with sentence-transformers.
https://x-tabdeveloping.github.io/turftopic/
MIT License

dimensionality reduction for parameter estimation in ClusteringTopicModel #19

Closed (rbroc closed this issue 6 months ago)

rbroc commented 6 months ago

This is more of a question than an issue, but I noticed that in ClusteringTopicModel, clustering is performed on the reduced vectors, while topic centroids and feature importances are computed on the full vectors (before dimensionality reduction). Is that the intended behavior?
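To make the question concrete, here is a minimal sketch of the pattern being described: cluster labels are found in a reduced space, but centroids are then averaged over the original vectors. This is not turftopic's actual code. The toy data, the PCA-via-SVD reduction (a stand-in for UMAP or t-SNE), the sign-based split (a stand-in for a real clustering algorithm), and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "document embeddings": two well-separated groups in 8 dimensions.
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 8))
group_b = rng.normal(loc=1.0, scale=0.1, size=(20, 8))
embeddings = np.vstack([group_a, group_b])

# Stand-in for the reduction step: PCA via SVD instead of UMAP/t-SNE.
centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:2].T  # shape (40, 2)

# Stand-in for the clustering step: split on the sign of the first component.
labels = (reduced[:, 0] > 0).astype(int)

# Centroids in the FULL space, using labels found in the REDUCED space.
centroids_full = np.stack(
    [embeddings[labels == k].mean(axis=0) for k in np.unique(labels)]
)

# The alternative raised in the question: centroids in the reduced space.
centroids_reduced = np.stack(
    [reduced[labels == k].mean(axis=0) for k in np.unique(labels)]
)

print(centroids_full.shape, centroids_reduced.shape)
```

The two sets of centroids live in different spaces, so only the full-space ones can be compared directly against word embeddings produced by the same encoder.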

x-tabdeveloping commented 6 months ago

I was actually thinking about this. I don't remember whether I checked, but I will take another look at how Top2Vec does it. My guess, though, is that it doesn't change much, since UMAP and t-SNE are both based on nearest neighbours.

x-tabdeveloping commented 6 months ago

I just checked Top2Vec, and it seems to me that they also find the relevant words in the original high-dimensional space (though I would love a sanity check if you still have concerns 😄).
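As a rough illustration of what "finding the relevant words in the original high-dimensional space" can look like, the sketch below ranks words by cosine similarity to a topic vector without any reduction step. It is not Top2Vec's or turftopic's implementation. The `top_words` helper, the vocabulary, and the 3-dimensional vectors are hypothetical and chosen only to keep the example small.

```python
import numpy as np


def top_words(topic_vector, word_vectors, vocab, k=3):
    """Return the k words whose vectors are most cosine-similar to the topic vector."""
    # Normalise so that the dot product equals cosine similarity.
    topic = topic_vector / np.linalg.norm(topic_vector)
    words = word_vectors / np.linalg.norm(word_vectors, axis=1, keepdims=True)
    similarity = words @ topic
    # Sort by descending similarity and keep the first k indices.
    order = np.argsort(-similarity)[:k]
    return [vocab[i] for i in order]


# Hypothetical vocabulary and word embeddings in the original (unreduced) space.
vocab = ["goal", "match", "election", "vote"]
word_vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
])

# Hypothetical topic centroid, also in the original space.
topic_vector = np.array([1.0, 0.0, 0.0])

print(top_words(topic_vector, word_vectors, vocab, k=2))
```

Because words and topic vectors come from the same embedding space here, no mapping between the reduced and original spaces is needed to score them.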

rbroc commented 6 months ago

They do! I've opened an issue there because I'm curious why, and whether they tried it with reduced vectors, but I'm totally fine leaving this as is here (so I'm closing this).