rvorias / ind_knn_ad

Vanilla torch and timm industrial knn-based anomaly detection for images.
https://share.streamlit.io/rvorias/ind_knn_ad
MIT License

Is it possible to release memory after training and testing for each class? #2

Closed: leolin65 closed this issue 1 year ago

leolin65 commented 3 years ago

(env1) D:\Code\ind_knn_ad-master\indad>python run.py padim DATASETS_PATH in 'D:\dataset\mvtec_anomaly_detection/'
Running padim on bottle dataset.
Training ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 209/209 [00:33<00:00, 6.21it/s]
Testing ...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 83/83 [00:15<00:00, 5.20it/s]
Test results bottle - image_rocauc: 1.00, pixel_rocauc: 0.96
Running padim on cable dataset.
Training ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 224/224 [00:28<00:00, 7.85it/s]
Testing ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:31<00:00, 4.72it/s]
Test results cable - image_rocauc: 0.91, pixel_rocauc: 0.96
Running padim on capsule dataset.
Training ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 219/219 [00:29<00:00, 7.52it/s]
Testing ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 132/132 [00:26<00:00, 4.91it/s]
Test results capsule - image_rocauc: 0.86, pixel_rocauc: 0.98
Running padim on carpet dataset.
Training ...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 280/280 [00:41<00:00, 6.71it/s]
Traceback (most recent call last):
  File "D:\Code\ind_knn_ad-master\indad\run.py", line 82, in <module>
    cli_interface()
  File "D:\ProgramData\Anaconda3\envs\env1\lib\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\env1\lib\site-packages\click\core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "D:\ProgramData\Anaconda3\envs\env1\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\ProgramData\Anaconda3\envs\env1\lib\site-packages\click\core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "D:\Code\ind_knn_ad-master\indad\run.py", line 77, in cli_interface
    total_results = run_model(method, dataset)
  File "D:\Code\ind_knn_ad-master\indad\run.py", line 44, in run_model
    model.fit(train_ds)
  File "D:\Code\ind_knn_ad-master\indad\model.py", line 171, in fit
    self.patch_lib = torch.cat(self.patch_lib, 0)
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:79] data. DefaultCPUAllocator: not enough memory: you tried to allocate 6294077440 bytes.

rvorias commented 3 years ago

Hi,

run.py overwrites the old model for each dataset. So, to answer your main question: yes, it is possible, and in fact it is already implemented.

However, the error you posted is related to a different problem: the torch.cat operation takes up a large chunk of RAM. Of the datasets run so far, carpet has the most training samples (280), and the concatenated tensor does not seem to fit in your computer's RAM. Could you tell me how much RAM you have installed?

There are a couple of options from here:

  • Create an empty tensor of size [n_train, ...] and fill it with each incoming patch.
  • Place the random dimension selection before the torch.cat operation.
  • Calculate the means and covariances online.

leolin65 commented 3 years ago

My RAM size is 32 GB. I will try again. Thank you.
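
For reference, the three options listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: build_patch_lib, online_mean_cov, feature_batches and channel_idx are hypothetical names, and the feature maps are assumed to arrive as tensors of shape [1, C, H, W]. The point of the first function is that torch.cat allocates a new output while the list of inputs is still alive, whereas a preallocated tensor is filled in place, and selecting channels first shrinks everything that gets stored.

```python
import torch


def build_patch_lib(feature_batches, n_train, channel_idx):
    """Options 1 and 2: fill a preallocated tensor, keeping only selected channels.

    feature_batches: iterable of [1, C, H, W] tensors (hypothetical stand-in
        for the per-image feature maps gathered during fit).
    channel_idx: 1-D LongTensor with the randomly selected channels.
    """
    patch_lib = None
    for i, feats in enumerate(feature_batches):
        # Reduce the channel dimension before anything is stored.
        reduced = feats.index_select(1, channel_idx)
        if patch_lib is None:
            # Allocate once, based on the first reduced feature map.
            patch_lib = torch.empty((n_train, *reduced.shape[1:]), dtype=reduced.dtype)
        patch_lib[i] = reduced[0]
    return patch_lib


def online_mean_cov(feature_batches):
    """Option 3: accumulate sums so the full patch library is never stored.

    Returns a per-position mean [C, H*W] and covariance [H*W, C, C].
    """
    n, total, outer = 0, None, None
    for feats in feature_batches:
        x = feats.flatten(2).double()  # [B, C, H*W]; float64 for stable sums
        if total is None:
            total = torch.zeros(x.shape[1], x.shape[2], dtype=x.dtype)
            outer = torch.zeros(x.shape[2], x.shape[1], x.shape[1], dtype=x.dtype)
        n += x.shape[0]
        total += x.sum(0)
        outer += torch.einsum("bch,bdh->hcd", x, x)
    mean = total / n
    # sum((x - mean)(x - mean)^T) = sum(x x^T) - n * mean mean^T
    cov = (outer - n * torch.einsum("ch,dh->hcd", mean, mean)) / (n - 1)
    return mean, cov


# Toy data: 8 images, 64 channels reduced to 16, 4x4 feature maps.
torch.manual_seed(0)
n_train, n_channels, d_reduced = 8, 64, 16
channel_idx = torch.randperm(n_channels)[:d_reduced]
batches = [torch.randn(1, n_channels, 4, 4) for _ in range(n_train)]

patch_lib = build_patch_lib(batches, n_train, channel_idx)
mean, cov = online_mean_cov(b.index_select(1, channel_idx) for b in batches)
print(patch_lib.shape, mean.shape, cov.shape)
```

The online variant trades the [n_train, C, H, W] library for a [H*W, C, C] accumulator, so it only pays off when the number of kept channels is small relative to the number of training images.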

rvorias commented 3 years ago

My PC has the same amount of RAM. Can you execute run.py padim --dataset carpet on its own, just to make sure the memory issue is not caused by previously trained models?

What backbone are you using? If you are using a very wide network, the number of channels might be very large.
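
As a rough check on how channel count drives the allocation, the byte count in the error message can be reproduced with plain arithmetic. The sketch below assumes float32 values and a concatenated feature map of 1792 channels at 56x56 per image. That shape is an assumption, not something stated in this thread, although it is what three concatenated stages of a wide ResNet-50 style backbone would plausibly give. concat_bytes is a hypothetical helper name.

```python
def concat_bytes(n_images, channels, height, width, bytes_per_value=4):
    """Size in bytes of a dense [n_images, channels, height, width] tensor."""
    return n_images * channels * height * width * bytes_per_value


# 280 carpet training images; the 1792 x 56 x 56 shape is assumed, see above.
full = concat_bytes(280, 1792, 56, 56)
print(full, "bytes =", round(full / 2**30, 2), "GiB")

# Keeping only a subset of channels scales the allocation linearly.
# 448 is an arbitrary illustrative value (a quarter of 1792).
reduced = concat_bytes(280, 448, 56, 56)
print(reduced, "bytes =", round(reduced / 2**30, 2), "GiB")
```

Under these assumptions the first figure equals the 6294077440 bytes from the traceback, and that is only the output of torch.cat: the list of per-image tensors being concatenated occupies roughly the same amount again until it is freed.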