hcw-00 / PatchCore_anomaly_detection

Unofficial implementation of PatchCore anomaly detection
Apache License 2.0
317 stars 95 forks

Memory Explosion #16

Closed anan-dad closed 2 years ago

anan-dad commented 2 years ago

Hi, I ran train.py under the following environment:

  1. Ubuntu 16.04, Docker env: PyTorch 1.8.1 with GPU support
  2. Machine memory: 32 GB, swap area size: 12 GB; GPU: GeForce RTX 2080 Ti, 12 GB
  3. Number of epochs: 50, batch_size: 1, load_size: 256, input_size: 224, with other parameters left at their defaults
  4. Training dataset consists of 150 non-defect images at 1300*2300 resolution
  5. Test dataset consists of 100 defect and non-defect images, with ground-truth images that are pure black for non-defect images and have a white mask over the defect areas for defect images

Problem: After epoch 11, machine memory usage (not GPU memory) grows to 40 GB, and by epoch 15 it exceeds 50 GB, at which point the system kills the training process.

Any ideas why this happens, and how to fix it?

ake020675 commented 2 years ago

I think I hit the same problem. Once memory usage exceeds my machine's maximum, an error is raised somewhere around a call like np.array(c, w, h).

paining commented 2 years ago

As I understand it, PatchCore doesn't need more than 1 epoch, because the data from the 2nd epoch onward is just a duplicate of the first epoch's data. It is called training, but it is actually sampling...
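A minimal sketch of that point (the `build_memory_bank` helper and toy embeddings below are hypothetical, not this repo's actual train.py): if patch embeddings are appended to the memory bank on every epoch, each extra epoch just re-adds the same features, so RAM usage grows linearly with the epoch count while contributing no new information.

```python
import numpy as np

def build_memory_bank(dataset, num_epochs):
    """Naive loop: appends every image's patch embeddings to the bank
    on every epoch, duplicating the bank num_epochs times."""
    bank = []
    for _ in range(num_epochs):
        for embeddings in dataset:
            # each image yields an (n_patches, d) embedding array
            bank.append(embeddings)
    return np.concatenate(bank, axis=0)

# Toy "dataset": 3 images, each producing 4 patch embeddings of dim 8
dataset = [np.random.rand(4, 8) for _ in range(3)]

multi = build_memory_bank(dataset, num_epochs=5)   # 5x the rows, all duplicates
single = build_memory_bank(dataset, num_epochs=1)  # one pass is enough
print(multi.shape, single.shape)  # (60, 8) vs (12, 8)
```

In other words, capping training at a single epoch (one feature-extraction pass before coreset subsampling) should give the same memory bank while avoiding the unbounded RAM growth reported above; every additional epoch only inflates the bank with identical rows.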

TC4476 commented 2 years ago

I'm also facing the same issue. What is the potential solution?