hcw-00 / PatchCore_anomaly_detection

Unofficial implementation of PatchCore anomaly detection
Apache License 2.0

wondering if the coreset sampling is only random sampling now #46

Open YuchenKid opened 2 years ago

YuchenKid commented 2 years ago

If I understand correctly, the kCenterGreedy.select_batch_ function is currently just doing random sampling. The reason is that self.already_selected is not updated inside the range(N) loop, so the 'if not self.already_selected' check keeps taking the first branch and the line 'ind = np.random.choice(np.arange(self.n_obs))' is executed on every iteration. The farthest-point branch, 'ind = np.argmax(self.min_distances)', is never reached. To fix it, move self.already_selected = new_batch into the range(N) loop.

Please point it out if I misunderstood anything!

    for _ in range(N):
        if not self.already_selected:
            # Initialize centers with a randomly selected datapoint
            ind = np.random.choice(np.arange(self.n_obs))
        else:
            ind = np.argmax(self.min_distances)
        # New examples should not be in already selected since those points
        # should have min_distance of zero to a cluster center.
        assert ind not in already_selected

        self.update_distances([ind], only_new=True, reset_dist=False)
        new_batch.append(ind)
    print('Maximum distance from cluster centers is %0.2f' % max(self.min_distances))

    self.already_selected = new_batch  # already_selected
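
For reference, here is a minimal, self-contained sketch of what the proposed fix might look like. It is not the repository's code: the class name KCenterGreedySketch, the simplified select_batch(n) signature, and the seed parameter are all made up for illustration, and plain NumPy Euclidean distances stand in for whatever distance routine the real class uses. The only point is that self.already_selected is assigned inside the loop.

```python
import numpy as np


class KCenterGreedySketch:
    """Hypothetical, simplified stand-in for the repo's kCenterGreedy sampler."""

    def __init__(self, features, seed=0):
        self.features = np.asarray(features, dtype=float)
        self.n_obs = self.features.shape[0]
        self.min_distances = None
        self.already_selected = []
        self.rng = np.random.default_rng(seed)

    def update_distances(self, cluster_centers):
        # Euclidean distance from every point to each new center; keep the
        # smallest distance seen so far for every point.
        centers = self.features[cluster_centers]
        diff = self.features[:, None, :] - centers[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        if self.min_distances is None:
            self.min_distances = dist
        else:
            self.min_distances = np.minimum(self.min_distances, dist)

    def select_batch(self, n):
        new_batch = []
        for _ in range(n):
            if not self.already_selected:
                # Only the very first center is drawn at random.
                ind = int(self.rng.integers(self.n_obs))
            else:
                # Afterwards, pick the point farthest from all chosen centers.
                ind = int(np.argmax(self.min_distances))
            assert ind not in self.already_selected

            self.update_distances([ind])
            new_batch.append(ind)
            # Proposed fix: record the selection inside the loop so the
            # random branch is not taken again on the next iteration.
            self.already_selected = list(new_batch)
        return new_batch
```

With this change, only the first index comes from np.random; every later index is the argmax of min_distances, which is what greedy k-center selection is supposed to do. The sketch assumes n is no larger than the number of distinct points, otherwise the assert would fire.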