Topic Extraction Error with K-Means

When processing certain transcripts, an error occurs when clustering with HDBSCAN fails and the pipeline falls back to the KMeans model. In that case, KMeans also fails to fit the data.

The exact cause of the error is still unknown, but it appears to be related to how the data is vectorized. A workaround is to use no vectorizer model at all, but this merely sidesteps the error and does not address its root cause.
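
To help narrow down the root cause, the fallback step could record the input shape and the original failure before re-raising. The following is only a minimal sketch, not the repository's actual code: `cluster_with_fallback`, `primary`, and `fallback` are hypothetical names, and treating "every label is -1" as a failed HDBSCAN run is an assumption about how the failure shows up.

```python
import logging
from typing import Callable, List, Sequence

logger = logging.getLogger(__name__)

# Hypothetical type: any callable that maps row vectors to integer labels.
Clusterer = Callable[[Sequence[Sequence[float]]], Sequence[int]]


def cluster_with_fallback(
    vectors: Sequence[Sequence[float]],
    primary: Clusterer,
    fallback: Clusterer,
) -> List[int]:
    """Try `primary`; if it raises or marks every point as noise (-1), use `fallback`."""
    try:
        labels = list(primary(vectors))
        # Assumption: a run that assigns no point to any cluster counts as a failure.
        if any(label != -1 for label in labels):
            return labels
        reason = "primary produced only noise labels"
    except Exception as exc:
        reason = f"primary raised {exc!r}"

    # Record the input shape so a failing transcript can be compared with a working one.
    n_rows = len(vectors)
    n_cols = len(vectors[0]) if n_rows else 0
    logger.warning("Falling back (%s); input is %d x %d", reason, n_rows, n_cols)

    try:
        return list(fallback(vectors))
    except Exception as exc:
        # Keep the original exception chained so the real error is not lost.
        raise RuntimeError(
            f"fallback clustering failed on {n_rows} x {n_cols} input after: {reason}"
        ) from exc
```

In a scikit-learn based pipeline, `primary` and `fallback` could wrap the `fit_predict` methods of an HDBSCAN and a KMeans instance respectively. The chained exception would then show the underlying KMeans error next to the shape of the vectorized input.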

tl-its-umich-edu/annoto-gai, issue #43