x-tabdeveloping / turftopic

Robust and fast topic models with sentence-transformers.
https://x-tabdeveloping.github.io/turftopic/
MIT License

Added optional dimensionality reduction to GMM #9

Closed by x-tabdeveloping 7 months ago

x-tabdeveloping commented 7 months ago

Rationale:

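You can now use dimensionality reduction with GMM. This is useful because GMM is very fast, even on large datasets, but it does not cope well with high-dimensional embeddings: with full covariance matrices, the number of parameters per component grows quadratically with the number of dimensions, so the parameter space becomes vast.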
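To make the "vast parameter space" concrete, here is a small sketch that counts the free parameters of a Gaussian mixture, assuming full covariance matrices (the covariance type used by GMM is not stated here). The 384-dimensional input is also an assumption, matching the output size of a model such as all-MiniLM-L6-v2. The counting follows the usual formula: K means of size d, K symmetric d x d covariances, and K - 1 free mixture weights.

```python
def gmm_n_parameters(n_components: int, n_dims: int) -> int:
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    # One d-dimensional mean vector per component.
    means = n_components * n_dims
    # Each symmetric d x d covariance matrix has d * (d + 1) / 2 free entries.
    covariances = n_components * n_dims * (n_dims + 1) // 2
    # Mixture weights must sum to one, so only K - 1 of them are free.
    weights = n_components - 1
    return means + covariances + weights


# 384 dimensions is an assumed embedding size (e.g. all-MiniLM-L6-v2).
print(gmm_n_parameters(20, 384))  # 1486099
print(gmm_n_parameters(20, 20))   # 4619
```

With 20 topics, reducing 384 dimensions to 20 shrinks the model from roughly 1.5 million parameters to under 5,000, almost all of the saving coming from the covariance matrices.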

Usage:

You can pass any scikit-learn `TransformerMixin` (for example `PCA`) to `GMM` as the `dimensionality_reduction` parameter.

```python
from turftopic import GMM
from sklearn.decomposition import PCA

# Reduce the embeddings to 20 dimensions with PCA before fitting a 10-topic GMM
gmm = GMM(10, dimensionality_reduction=PCA(20))
```
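Conceptually, the reducer is fitted on the document embeddings and the mixture model is then fitted in the reduced space. The sketch below mimics that idea in plain scikit-learn with a `Pipeline`; it is an illustration, not turftopic's actual implementation, and the random 384-dimensional vectors are a stand-in for real sentence embeddings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline

# Stand-in for sentence embeddings: 500 random 384-dimensional vectors (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 384))

# Project to 20 dimensions, then fit a 10-component mixture in that space.
model = make_pipeline(
    PCA(n_components=20),
    GaussianMixture(n_components=10, random_state=0),
)
# predict() runs PCA.transform first, then assigns each point to a component.
labels = model.fit(embeddings).predict(embeddings)
```

Any other transformer with `fit`/`transform` (for instance `TruncatedSVD`) could take the place of `PCA` in the first step.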

Performance:

I ran some experiments on my machine: on 20 Newsgroups with PCA(20) and 20 topics, I got virtually the same results as with the full model, while the runtime dropped from three minutes to under half a minute (more than a sixfold speedup), which I think is impressive.
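If you want to run a rough timing comparison of your own, a minimal harness could look like the following. It is a sketch under stated assumptions: it times scikit-learn's `GaussianMixture` directly rather than turftopic's `GMM`, uses random data in place of real embeddings, and `time_gmm_fit` is a hypothetical helper name. It measures only fitting time, not topic quality.

```python
import time

import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def time_gmm_fit(X, n_components, n_reduced=None, random_state=0):
    """Hypothetical helper: seconds spent (optionally) reducing X and fitting a GMM."""
    start = time.perf_counter()
    if n_reduced is not None:
        # PCA time is included, since the reduction is part of the fitting cost.
        X = PCA(n_components=n_reduced, random_state=random_state).fit_transform(X)
    GaussianMixture(n_components=n_components, random_state=random_state).fit(X)
    return time.perf_counter() - start


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 128))  # placeholder for real embeddings
full = time_gmm_fit(X, n_components=5)
reduced = time_gmm_fit(X, n_components=5, n_reduced=20)
print(f"full: {full:.2f}s, reduced: {reduced:.2f}s")
```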