Closed jasonbaldridge closed 11 years ago
As is my (bad) habit, the K-means(++) impl in Breeze is generic on the vector type, so it can use SparseVectors.
-- David
On Tue, Apr 16, 2013 at 12:45 PM, Jason Baldridge notifications@github.com wrote:
The current k-means implementation is something I did for homework assignments for teaching NLP courses at UT Austin. It can handle a fair amount, but it runs out of steam (in particular, memory) for larger datasets, especially if they have a lot of features. It currently uses dense vectors to represent the features for each data point, so it should be a fairly straightforward win to change this to use sparse vectors instead.
Awesome. This may be sorted out directly as we transition things from Breeze then.
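To make the dense-versus-sparse point from this issue concrete, here is a minimal sketch of one k-means iteration in which each data point stores only its non-zero features. It is not the Nak or Breeze implementation. The dict-based `SparseVec` representation and the function names `sq_dist`, `assign`, and `update` are assumptions made for illustration. Centroids stay dense, since a mean of many sparse points is usually not sparse.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: feature index -> non-zero value.
SparseVec = Dict[int, float]


def sq_dist(point: SparseVec, centroid: Sequence[float], centroid_sq_norm: float) -> float:
    """Squared Euclidean distance via ||x||^2 - 2 x.c + ||c||^2.

    Only the non-zero entries of the sparse point are visited.
    """
    point_sq_norm = sum(v * v for v in point.values())
    dot = sum(v * centroid[i] for i, v in point.items())
    return point_sq_norm - 2.0 * dot + centroid_sq_norm


def assign(points: List[SparseVec], centroids: List[List[float]]) -> List[int]:
    """Assignment step: index of the nearest centroid for each point."""
    norms = [sum(c * c for c in cen) for cen in centroids]
    return [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k], norms[k]))
        for p in points
    ]


def update(points: List[SparseVec], labels: List[int], k: int, dim: int) -> List[List[float]]:
    """Update step: dense mean of the points assigned to each centroid.

    An empty cluster gets an all-zero centroid in this sketch.
    """
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for i, v in p.items():
            sums[lab][i] += v
    return [[s / counts[j] if counts[j] else 0.0 for s in sums[j]] for j in range(k)]


# Toy data: four points in a 4-dimensional feature space, two clusters.
points = [{0: 1.0}, {0: 1.2, 1: 0.1}, {3: 2.0}, {3: 2.2, 2: 0.1}]
centroids = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]
labels = assign(points, centroids)
centroids = update(points, labels, k=2, dim=4)
```

With this layout, memory for the data points grows with the number of non-zero features rather than with the full feature count, which is the saving the issue is after. The assignment step also costs time proportional to the non-zeros of each point, plus a one-off norm computation per centroid.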