x-tabdeveloping / turftopic

Robust and fast topic models with sentence-transformers.
https://x-tabdeveloping.github.io/turftopic/
MIT License

Implement Original c-TF-IDF #7

Closed x-tabdeveloping closed 4 months ago

x-tabdeveloping commented 4 months ago

Rationale

Currently the package only implements Soft c-TF-IDF, which is a "generalization" of c-TF-IDF but unfortunately not identical to it. This is not a huge issue: as far as I understand, the two sets of values are monotonic with each other, so the ranking of terms, and therefore the topic descriptions, should not change.

The reason it would be nice to have it is to be able to replicate BERTopic's behaviour exactly in the package.

Implementation

ClusteringTopicModel should offer the following feature importance options: soft-c-tf-idf, c-tf-idf, centroid. We should merge the soft_ctf_idf.py and centroid_distance.py files into a single module (post_hoc_importance.py, feature_importance.py, or similar) that holds all three methods for post-hoc importance estimation.
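For reference, here is a rough sketch of what the original c-TF-IDF could look like. It assumes the weighting described in the BERTopic documentation: the L1-normalized term frequency within a cluster, multiplied by log(1 + A / f_t), where A is the average number of words per cluster and f_t is the frequency of term t across all clusters. The function name `ctf_idf` and its signature are placeholders for illustration, not the package's actual API.

```python
import numpy as np


def ctf_idf(labels, doc_term_matrix):
    """Hypothetical helper: original c-TF-IDF importances per cluster.

    labels: cluster label for each document, shape (n_docs,)
    doc_term_matrix: dense bag-of-words counts, shape (n_docs, n_terms)
    Returns the sorted unique labels and an (n_clusters, n_terms) matrix.
    """
    labels = np.asarray(labels)
    counts = np.asarray(doc_term_matrix, dtype=float)
    classes = np.unique(labels)
    # Sum the term counts of all documents in each cluster.
    tf = np.stack([counts[labels == c].sum(axis=0) for c in classes])
    # f_t: frequency of each term across all clusters.
    term_freq = tf.sum(axis=0)
    # A: average number of words per cluster.
    avg_words = tf.sum() / len(classes)
    # L1-normalize term frequencies within each cluster.
    cluster_sizes = tf.sum(axis=1, keepdims=True)
    tf_norm = np.divide(
        tf, cluster_sizes, out=np.zeros_like(tf), where=cluster_sizes > 0
    )
    # Terms that never occur have tf == 0, so the clipped idf is harmless.
    idf = np.log(1.0 + avg_words / np.maximum(term_freq, 1e-12))
    return classes, tf_norm * idf


# Toy example: three documents, two clusters, three vocabulary terms.
toy_labels = [0, 0, 1]
toy_counts = [[2, 0, 1], [1, 0, 0], [0, 3, 1]]
classes, importance = ctf_idf(toy_labels, toy_counts)
```

The sketch takes a dense matrix to keep things short; a real implementation would probably want to work on the sparse output of the vectorizer instead.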

x-tabdeveloping commented 4 months ago

Addressed by #8