hcw-00 / PatchCore_anomaly_detection

Unofficial implementation of PatchCore anomaly detection
Apache License 2.0

How to apply SparseRandomProjection to a large image dataset? #40

Open PeterKim1 opened 2 years ago

PeterKim1 commented 2 years ago

Hello.

I want to apply this model to a large image dataset (over 10,000 images).

However, I run out of RAM.

https://github.com/hcw-00/PatchCore_anomaly_detection/blob/main/sampling_methods/kcenter_greedy.py#L95

self.features = model.transform(self.X)

I think this line loads all of the embeddings into RAM and then applies SparseRandomProjection to them at once, which puts a lot of pressure on memory. (I'm a novice, so I may be wrong.)

Does anyone know how to solve this problem?

One idea I have is to split the data in half and apply SparseRandomProjection to each half separately, but I think this might cause problems, because SparseRandomProjection determines the dimensionality of the projected embeddings from the Johnson-Lindenstrauss lemma.
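
A minimal sketch of the splitting idea, assuming the projector is fitted once and then reused for every chunk. In scikit-learn, `fit` only uses the shape of its input to draw the random matrix, so the same fitted object gives the same output dimensionality for every chunk. The helper name `transform_in_chunks`, the chunk size, the fixed `n_components=64`, and the random stand-in data are all illustrative assumptions, not part of this repository.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection


def transform_in_chunks(projector, X, chunk_size=4096):
    """Project X chunk by chunk with an already fitted projector (hypothetical helper)."""
    n_components = projector.components_.shape[0]
    # Preallocate the (smaller) projected array; float32 is an assumption to save memory.
    out = np.empty((X.shape[0], n_components), dtype=np.float32)
    for start in range(0, X.shape[0], chunk_size):
        stop = min(start + chunk_size, X.shape[0])
        # Only one chunk's worth of intermediate data is alive at a time.
        out[start:stop] = projector.transform(X[start:stop])
    return out


# Stand-in data: 2000 embeddings of dimension 256 (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 256)).astype(np.float32)

# Fit once so every chunk shares one projection matrix and one output dimension.
projector = SparseRandomProjection(n_components=64, random_state=0)
projector.fit(X)

chunked = transform_in_chunks(projector, X, chunk_size=500)
full = projector.transform(X)
```

This only bounds the memory used by the transform step. The embedding array `X` itself still has to be held somewhere; storing it on disk (for example with `np.memmap`) would be a separate change.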

According to the scikit-learn documentation (https://scikit-learn.org/stable/modules/generated/sklearn.random_projection.SparseRandomProjection.html), n_components can be set automatically based on the number of samples in the dataset, so each half could end up with a different dimensionality.
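
To see how strongly the automatic dimensionality depends on the sample count, one can call `johnson_lindenstrauss_min_dim` from scikit-learn directly. The sketch below is only an illustration: the sample counts and `eps` values are arbitrary choices, not the settings used in this repository.

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Minimum number of components suggested by the Johnson-Lindenstrauss bound
# for several sample counts and distortion tolerances (illustrative values).
dims = {}
for n_samples in (1_000, 10_000, 1_000_000):
    for eps in (0.1, 0.5, 0.9):
        dims[(n_samples, eps)] = johnson_lindenstrauss_min_dim(n_samples, eps=eps)
        print(f"n_samples={n_samples:>9}  eps={eps}  min_dim={dims[(n_samples, eps)]}")
```

The bound grows with the logarithm of the number of samples and shrinks as `eps` grows, so fitting on the two halves separately with `n_components='auto'` may indeed produce different dimensions. Passing an explicit integer `n_components`, or fitting a single projector and reusing it, avoids that mismatch.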