hcw-00 / PatchCore_anomaly_detection

Unofficial implementation of PatchCore anomaly detection
Apache License 2.0
317 stars 95 forks source link

Thanks for your work. But it's a bit strange here in the coreset sampling part #25

Open mengxianghan123 opened 2 years ago

mengxianghan123 commented 2 years ago

https://github.com/hcw-00/PatchCore_anomaly_detection/blob/59b67d9f8dd7a7c9e75203ca57e01cedbb1807ba/sampling_methods/kcenter_greedy.py#L104-L120 I think the wanted logic here should be initialing a centre point by random choice (the first if-branch) and all the entire centre points are chosen by argmax the distance (the second if-branch). But for the "self.already_selected" is initialized as an empty list in line 49(not None), so we'll never get into the first if-branch. As for the centre point initializing process, the expression "np.argmax(self.min_distances)" will return 0 for "np.argmax(None)" will return 0. So, as a result, the program selects index-0-feature as algorithm initialization everytime instead of random selecting one as a common practice.

JefferyChiang commented 2 years ago

I agree with your point. I think the first if-branch should be modify into "if not self.already_selected: " for more accurate to the kcenter_greedy algorithm. Have you modify the code and test the difference ?

mengxianghan123 commented 2 years ago

Yeah I modifyed the code like the following:

for _ in range(N):
            if not self.already_selected:
                # Initialize centers with a randomly selected datapoint
                ind = np.random.choice(np.arange(self.n_obs))
            else:
                ind = np.argmax(self.min_distances)

            assert ind not in self.already_selected

            self.update_distances([ind], only_new=True, reset_dist=False)
            new_batch.append(ind)
            print("Maximum distance from cluster centers is %0.2f" % max(self.min_distances))

            self.already_selected.append(ind)

I didn't test the difference with thorough experiments. I conjectured that there will not be significant difference between the two implementation but our modification is more up to standard and more consistent with the intention of the algorithm

HoseinHashemi commented 2 years ago

This point makes total sense. I also modified the code but haven't thoroughly tested.