Open tjdwill opened 7 months ago
Coming back around to this, I think the proposed changes are viable, but I caution against implementing this change just because I can. I propose developing this in a side branch, and only releasing the patch if it is requested (or if I need it myself).
Problem
Currently, the `ndim` parameter allows a user to specify how many dimensions to cluster off of. The present implementation takes the first `ndim` elements of each data entry, so the data is accessed as `data[:, :ndim]`. If users want to specify which columns to use, however, they have to reorder the data themselves before calling the function.
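To make the limitation concrete, here is a minimal sketch. The array values and the variable names are illustrative assumptions, not taken from the library. A `:ndim` slice can only ever select leading columns, so clustering on columns 0 and 2 requires reordering first.

```python
import numpy as np

# Illustrative data: 4 samples with 3 columns each (values are made up).
data = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
    [4.0, 40.0, 400.0],
])

ndim = 2

# Current behaviour: only the first `ndim` columns can be selected.
leading = data[:, :ndim]        # columns 0 and 1

# Workaround today: move the wanted columns (0 and 2) to the front,
# then let the `:ndim` slice pick them up.
reordered = data[:, [0, 2, 1]]
wanted = reordered[:, :ndim]    # columns 0 and 2
```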
Proposed Solution
Have `ndim` accept either an `int` or a tuple of `int`s. The former simply says "cluster off of the first `ndim` dimensions," and the latter says "cluster using these column indices". Then NumPy's tuple indexing could be leveraged to select exactly those columns.
How would we handle the `ndim: int` case? Simple: generate a tuple of indices from the number.
This solution is excellent in that code that currently passes an int to `ndim` can continue to do so with no breakage.
Functions to change
Pretty much all of them, but most are minor changes (e.g., `:ndim` -> `ndim` in indexes).
- `kmeans.base_funcs`: Change accesses; modify `_assign_clusters` to take in `ndim` as a parameter rather than generating it.
- `kmeans.clustering`: Change the `_assign_clusters` call; adjust documentation.
- `kmeans.animate`: `_draw`'s derivation of `x`, `y`, and `z`; `view_clustering`.
- `kmeans_segmentation`: No changes.
Additional Notes
- Ensure that `ndim` matches the number of columns in the provided `initial_means`.
- Ensure that the indices are unique (`len(ndim) == len(set(ndim))`), assuming `ndim` is a tuple.
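
The proposal and the checks above could be combined in a single normalization step. The sketch below is only an illustration: `normalize_ndim` is a hypothetical helper name, not a function in the library, and the error types, messages, and the decision to reject negative indices are assumptions.

```python
import numpy as np


def normalize_ndim(ndim, n_columns):
    """Return `ndim` as a tuple of column indices (hypothetical helper).

    An int n becomes (0, 1, ..., n - 1); a tuple of ints is validated
    and returned as a tuple.
    """
    if isinstance(ndim, int):
        if not 0 < ndim <= n_columns:
            raise ValueError(f"ndim must be in [1, {n_columns}], got {ndim}")
        return tuple(range(ndim))

    indices = tuple(ndim)
    if not indices:
        raise ValueError("ndim must contain at least one column index")
    if len(indices) != len(set(indices)):
        raise ValueError("ndim must not contain duplicate column indices")
    if any(not 0 <= i < n_columns for i in indices):
        raise ValueError(f"column indices must be in [0, {n_columns - 1}]")
    return indices


# Usage with made-up data: 4 samples, 3 columns.
data = np.arange(12.0).reshape(4, 3)

cols = normalize_ndim((0, 2), data.shape[1])
subset = data[:, cols]                                # columns 0 and 2

legacy = data[:, normalize_ndim(2, data.shape[1])]    # same values as data[:, :2]

# Stand-in for user-supplied initial means; its column count must
# match the number of selected columns.
initial_means = subset[:2]
if initial_means.shape[1] != len(cols):
    raise ValueError("initial_means must have one column per selected index")
```

One design point to keep in mind: indexing with a tuple of indices returns a copy of the selected columns, whereas the existing `:ndim` slice returns a view, so any code that relies on modifying the data in place through the slice would behave differently.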