hcw-00 / PatchCore_anomaly_detection

Unofficial implementation of PatchCore anomaly detection
Apache License 2.0
317 stars 95 forks

Memory Explosion #16

Closed anan-dad closed 2 years ago

anan-dad commented 2 years ago

Hi, I ran train.py under the following environment:

  1. Ubuntu 16.04, Docker env: PyTorch 1.8.1 with GPU support
  2. Machine memory: 32 GB, swap area size: 12 GB; GPU: GeForce RTX 2080 Ti, 12 GB
  3. Number of epochs: 50, batch_size: 1, load_size: 256, input_size: 224, with other parameters left at their defaults
  4. Training dataset consists of 150 non-defect images at 1300*2300 resolution
  5. Test dataset consists of 100 defect and non-defect images, with ground-truth images that are pure black for non-defect images and have a white mask over the defect areas for defect images

Problem: After epoch 11, machine memory usage (not GPU memory) grows to 40 GB, and by epoch 15 it exceeds 50 GB, at which point the system kills the training process.

Any ideas why this happens, and how to fix it?

ake020675 commented 2 years ago

I think I hit the same problem. Once memory usage exceeds my machine's maximum, an error is raised somewhere around a call like np.array(c, w, h).

paining commented 2 years ago

As I understand it, PatchCore doesn't need more than 1 epoch, because the data from the 2nd epoch onward is just a duplicate of the first epoch's data. It is called training, but it is actually sampling...
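A minimal sketch of that point (the `build_memory_bank` helper and toy embeddings below are hypothetical, not this repo's actual train.py): if patch embeddings are appended to the memory bank on every epoch, each extra epoch just re-adds the same features, so RAM usage grows linearly with the epoch count while contributing no new information.

```python
import numpy as np

def build_memory_bank(dataset, num_epochs):
    """Naive loop: appends every image's patch embeddings to the bank
    on every epoch, duplicating the bank num_epochs times."""
    bank = []
    for _ in range(num_epochs):
        for embeddings in dataset:
            # each image yields an (n_patches, d) embedding array
            bank.append(embeddings)
    return np.concatenate(bank, axis=0)

# Toy "dataset": 3 images, each producing 4 patch embeddings of dim 8
dataset = [np.random.rand(4, 8) for _ in range(3)]

multi = build_memory_bank(dataset, num_epochs=5)   # 5x the rows, all duplicates
single = build_memory_bank(dataset, num_epochs=1)  # one pass is enough
print(multi.shape, single.shape)  # (60, 8) vs (12, 8)
```

In other words, capping training at a single epoch (one feature-extraction pass before coreset subsampling) should give the same memory bank while avoiding the unbounded RAM growth reported above; every additional epoch only inflates the bank with identical rows.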

TC4476 commented 2 years ago

I'm also facing the same issue. What is the potential solution?