tl-its-umich-edu / annoto-gai

This is the GitHub project for the Annoto GAI work.

Topic Extraction Error with K-Means #43

Closed takposha closed 3 months ago

takposha commented 3 months ago

When processing certain transcripts, an error occurs when clustering via HDBSCAN fails and the pipeline falls back to the KMeans model. KMeans then also fails to fit the data.

The exact cause of the error is still unknown, but there appears to be some issue in how the data is being vectorized. A workaround is to use no vectorizer model at all, but this merely sidesteps the error rather than addressing its cause.
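
To help narrow down the cause, the fallback step could record the exception from each clustering attempt instead of discarding it. The following is only a minimal sketch of that idea, not the project's actual code: `cluster_with_fallback` and its `primary`/`fallback` arguments are hypothetical names, and the two callables stand in for whatever wraps the HDBSCAN and KMeans fits.

```python
from typing import Callable, List, Sequence, Tuple

Labels = List[int]
# Hypothetical signature: a callable that takes embeddings and returns labels.
Clusterer = Callable[[Sequence[Sequence[float]]], Labels]


def cluster_with_fallback(
    embeddings: Sequence[Sequence[float]],
    primary: Clusterer,
    fallback: Clusterer,
) -> Tuple[Labels, str, List[str]]:
    """Try `primary`, then `fallback`, keeping each failure message.

    Returns (labels, name of the clusterer that succeeded, collected errors).
    Raises RuntimeError with every collected message if both attempts fail.
    """
    errors: List[str] = []
    for name, fit in (("primary", primary), ("fallback", fallback)):
        try:
            return fit(embeddings), name, errors
        except Exception as exc:  # kept broad on purpose: the cause is unknown
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
    raise RuntimeError("all clusterers failed: " + "; ".join(errors))
```

With something like this in place, the message raised when both models fail would include the original HDBSCAN error alongside the KMeans one, which may show whether the vectorizer is actually involved.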