By default, hdbscan determines the number of topics itself. We should add a function that lets the user apply further hierarchical clustering to the results, so that the "optimal K" topics found by hdbscan are merged down to a user-defined K. Besides making top2vec comparable to standard topic modelling techniques like LDA, this would be useful for topic models built from large corpora, where hdbscan might identify dozens or hundreds of topics.
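
To make the request concrete, here is a rough sketch of one way the reduction could work. It is not existing top2vec code: `reduce_topics`, its parameters, and the merge rule (absorb the smallest topic into its most similar neighbour by cosine similarity, then recompute a size-weighted mean vector) are all assumptions made for illustration. Other linkage strategies would be just as valid.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reduce_topics(topic_vectors, topic_sizes, k):
    """Greedily merge topics until only k remain.

    Hypothetical helper, not part of top2vec. Returns (vectors, sizes, groups),
    where groups[i] lists the original topic indices merged into reduced topic i.
    """
    if not 1 <= k <= len(topic_vectors):
        raise ValueError("k must be between 1 and the number of topics")
    vectors = [list(v) for v in topic_vectors]
    sizes = list(topic_sizes)
    groups = [[i] for i in range(len(vectors))]

    while len(vectors) > k:
        # Pick the smallest topic and find its most similar neighbour.
        small = min(range(len(sizes)), key=sizes.__getitem__)
        others = [j for j in range(len(vectors)) if j != small]
        target = max(others, key=lambda j: cosine(vectors[small], vectors[j]))

        # Merged vector is the size-weighted mean of the two topic vectors.
        total = sizes[small] + sizes[target]
        vectors[target] = [
            (a * sizes[small] + b * sizes[target]) / total
            for a, b in zip(vectors[small], vectors[target])
        ]
        sizes[target] = total
        groups[target].extend(groups[small])

        # Drop the absorbed topic from all three parallel lists.
        for seq in (vectors, sizes, groups):
            del seq[small]

    return vectors, sizes, groups


# Toy input: four made-up 2-D "topic vectors" with document counts.
topic_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
topic_sizes = [50, 5, 40, 3]
vectors, sizes, groups = reduce_topics(topic_vectors, topic_sizes, k=2)
print(groups)  # [[0, 1], [2, 3]]
```

The `groups` mapping is what would let documents be reassigned from the original topics to the reduced ones without re-running hdbscan.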