lmcinnes / enstop

Ensemble topic modelling with pLSA
BSD 2-Clause "Simplified" License

HDBSCAN error stopping EnsembleTopics #12

Open Andy7475 opened 1 year ago

Andy7475 commented 1 year ago

The code from your homepage

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from enstop import EnsembleTopics

news = fetch_20newsgroups(subset='all')
data = CountVectorizer().fit_transform(news.data)

model = EnsembleTopics(n_components=20).fit(data)
topics = model.components_
doc_vectors = model.embedding_

results in an error:

File hdbscan\_hdbscan_tree.pyx:659, in hdbscan._hdbscan_tree.get_clusters()

File hdbscan\_hdbscan_tree.pyx:733, in hdbscan._hdbscan_tree.get_clusters()

TypeError: 'numpy.float64' object cannot be interpreted as an integer

I am using scikit-learn 1.3.0 and Python 3.11.4.
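
For reference, the same TypeError message can be reproduced outside enstop whenever a NumPy float is used where Python requires an integer. The sketch below only illustrates that class of error; it is not a diagnosis of what hdbscan does at the lines in the traceback, and the helper name as_index is made up for this example.

```python
import operator

import numpy as np


def as_index(value):
    """Return value as a Python int, or the TypeError message if it cannot be one.

    Hypothetical helper for illustration only; not part of enstop or hdbscan.
    """
    try:
        # operator.index() accepts only true integer types (anything with __index__).
        return operator.index(value)
    except TypeError as exc:
        return str(exc)


# A NumPy float is rejected even when its value is a whole number.
float_result = as_index(np.float64(5.0))

# A NumPy integer is accepted and converted to a plain Python int.
int_result = as_index(np.int64(5))

print(float_result)
print(int_result)
```

If the float comes from user code, converting it with int(...) before passing it on avoids the error. Here it is raised inside hdbscan's compiled code, so it may instead depend on the installed hdbscan and NumPy versions, which I have not checked.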