Hi, thank you for making and sharing this repo! This is exactly what I was looking for in my research. I was wondering if it might be possible to do constrained k-means across batches using this library? I would like to restrict the maximum number of points per cluster even when using batches, since my dataset is very large and I cannot load the entire thing onto the GPU. However, when I run the computation in batches, the max-points-per-cluster restriction no longer seems to hold. Here is some code to illustrate:
This prints:

```
Full batch converged at iteration 5/100 with center shifts: tensor([0.]).
tensor([5, 5, 5, 5])
Full batch converged at iteration 3/100 with center shifts: tensor([0., 0., 0., 0.]).
tensor([5, 6, 5, 4])
```
Basically, I am trying to get the counts to still be `[5, 5, 5, 5]` even when performing batched computation. Thank you!
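For context, the failure mode can be reproduced without this library at all. The sketch below is a minimal, self-contained NumPy illustration (a hypothetical greedy capacity-constrained assignment, not this repo's API): when the cap is enforced only within each batch, the summed counts across batches can drift away from the global cap.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))  # 20 points, 2-D
C = rng.normal(size=(4, 2))   # 4 fixed centroids (for illustration only)

def constrained_assign(X, C, cap):
    """Greedy capacity-constrained assignment: visit point-centroid pairs
    in order of increasing distance, skipping clusters that are full."""
    n, k = X.shape[0], C.shape[0]
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # (n, k) squared distances
    order = np.argsort(d, axis=None)                     # flattened, best pairs first
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    for flat in order:
        i, j = divmod(flat, k)
        if labels[i] == -1 and counts[j] < cap:
            labels[i] = j
            counts[j] += 1
    return labels, counts

# Full batch: total capacity (4 clusters x cap 5) equals n, so the cap
# forces exactly 5 points per cluster.
_, full_counts = constrained_assign(X, C, cap=5)
print(full_counts)

# Batched: the cap is checked only against the counts *within* each batch,
# so a batch of 5 points can never hit a cap of 5, and the global counts
# can exceed the cap once the per-batch counts are summed.
batch_counts = np.zeros(4, dtype=int)
for Xb in np.split(X, 4):  # 4 batches of 5 points each
    _, cb = constrained_assign(Xb, C, cap=5)
    batch_counts += cb
print(batch_counts)
```

If that diagnosis matches what the library does internally, one possible direction is to carry the running per-cluster counts across batches and check each batch's assignments against the remaining global budget instead of a fresh per-batch count.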