If I understand correctly, kCenterGreedy.selectbatch is effectively doing random sampling right now. The reason is that self.already_selected is not updated inside the range(N) loop, so (assuming it starts out empty) the check 'if not self.already_selected' stays true and the line 'ind = np.random.choice(np.arange(self.n_obs))' is executed on every iteration. The argmax branch is never reached. To fix it, move self.already_selected = new_batch into the range(N) loop.
Please point it out if I misunderstood anything!
    for _ in range(N):
        if not self.already_selected:
            # Initialize centers with a randomly selected datapoint
            ind = np.random.choice(np.arange(self.n_obs))
        else:
            ind = np.argmax(self.min_distances)
        # New examples should not be in already selected since those points
        # should have min_distance of zero to a cluster center.
        assert ind not in already_selected
        self.update_distances([ind], only_new=True, reset_dist=False)
        new_batch.append(ind)
    print('Maximum distance from cluster centers is %0.2f' % max(self.min_distances))
    self.already_selected = new_batch  # already_selected
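
To make the proposed fix concrete, here is a rough, self-contained sketch of what I have in mind. It is not the library's class: KCenterGreedySketch, its simplified update_distances, and the rng argument are hypothetical stand-ins I made up for illustration, and I append to self.already_selected inside the loop instead of reassigning it, which should amount to the same thing when it starts out empty.

```python
import numpy as np


class KCenterGreedySketch:
    """Hypothetical, simplified stand-in for kCenterGreedy (not the real class)."""

    def __init__(self, features):
        self.features = np.asarray(features, dtype=float)
        self.n_obs = self.features.shape[0]
        self.min_distances = None
        self.already_selected = []

    def update_distances(self, centers):
        # Euclidean distance from every point to the nearest of the new centers.
        diffs = self.features[:, None, :] - self.features[centers][None, :, :]
        dist = np.linalg.norm(diffs, axis=2).min(axis=1)
        if self.min_distances is None:
            self.min_distances = dist
        else:
            self.min_distances = np.minimum(self.min_distances, dist)

    def select_batch(self, n, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        new_batch = []
        for _ in range(n):
            if not self.already_selected:
                # Only the very first center is chosen at random.
                ind = int(rng.integers(self.n_obs))
            else:
                # Afterwards, pick the point farthest from all current centers.
                ind = int(np.argmax(self.min_distances))
                assert ind not in self.already_selected
            self.update_distances([ind])
            new_batch.append(ind)
            # Proposed fix: record the pick inside the loop, so the next
            # iteration takes the argmax branch instead of sampling again.
            self.already_selected.append(ind)
        return new_batch


# Three points on a line; index 2 is far away from the other two.
sampler = KCenterGreedySketch([[0.0], [1.0], [100.0]])
batch = sampler.select_batch(2)
```

With the update inside the loop, whichever point is drawn first, the second pick is the farthest remaining point, so the outlier at index 2 always ends up in the batch. With the update outside the loop, both picks would be independent random draws.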