rvorias / ind_knn_ad

Vanilla torch and timm industrial knn-based anomaly detection for images.
https://share.streamlit.io/rvorias/ind_knn_ad
MIT License

Too many training images, memory overflow #8

Closed x12901 closed 1 month ago

x12901 commented 3 years ago

Hi, great project! I have 8000 images, and I found that memory usage increases a lot during training. My computer has 60 GB of RAM, but it is still not enough.

import os

import numpy as np
import torch
from PIL import Image

from indad.data import StreamingDataset   # repo imports, assuming the indad package layout
from indad.models import SPADE

model = SPADE(k=42)  # , backbone_name="hypernet")
train_dataset = StreamingDataset()
app_custom_train_images = "D:\\train\\good"
# add training images
for root, dirs, files in os.walk(app_custom_train_images):
    for file in files:
        train_dataset.add_pil_image(Image.open(os.path.join(root, file)))
model.fit(train_dataset)
PATH = "test.pth"
torch.save(model.state_dict(), PATH)

pil_img = Image.open("20210115_raw.png")  # was plt.open, which does not exist
img_np = np.array(pil_img)
tensor_img = torch.from_numpy(np.transpose(img_np, (2, 0, 1)))  # HWC -> CHW
img = tensor_img.type(torch.float32)
img = img.unsqueeze(0)  # add batch dimension
img_lvl_anom_score, pxl_lvl_anom_score = model.predict(img)

The test picture does not seem to be transformed before prediction. Does the model support pictures in other formats?

rvorias commented 3 years ago

Hi x12901, thanks for your interest!

Off the bat, 8000 images is a lot, way more than I've benchmarked. Here is a short overview of where the bulk of the memory goes for each method:

GENERAL

For each backbone, the more feature maps it returns, the heavier the computations will be.
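
For a feel of the sizes involved, here is a tiny illustration with timm's features_only mode (the backbone name and input size are just example values):

import timm
import torch

backbone = timm.create_model("resnet18", pretrained=False, features_only=True)
x = torch.randn(1, 3, 224, 224)
fmaps = backbone(x)
print([f.shape for f in fmaps])  # one tensor per stage; each map costs memory per training image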


SPADE

self.zlib <-- stack of feature vectors, float32
self.fmaps <-- list of stacks of feature maps, float32. This will become the bulk of the memory.

solution: use something like mmap (memory-mapped file support) to build it outside RAM; see the sketch below. faiss could also be used. You will likely sacrifice some inference speed.
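
A minimal sketch of the mmap idea with numpy's disk-backed arrays (the shapes, filename, and random stand-in features are all illustrative, not the repo's actual code):

import numpy as np

n_images, d, h, w = 8000, 448, 56, 56
fmaps = np.memmap("fmaps.dat", dtype="float32", mode="w+",
                  shape=(n_images, d, h, w))

for i in range(n_images):
    fmap = np.random.rand(d, h, w).astype("float32")  # stand-in for real features
    fmaps[i] = fmap          # written through to disk, not kept in memory
fmaps.flush()                # make sure everything is on disk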


PADIM

self.patch_lib <-- stack of 2D patches, float32. This will become the main bulk of the memory.
torch.linalg.inv(self.E) <-- this could cause memory issues if your 2D grid is large

solution: online calculation of the mean and covariance matrix as samples are added to the training set, sketched below
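
A sketch of that online calculation (a Welford-style update; the class and names are illustrative, not the repo's code). Each patch feature vector is folded in one at a time, so the full patch_lib never has to sit in RAM:

import torch

class OnlineMeanCov:
    def __init__(self, d: int):
        self.n = 0
        self.mean = torch.zeros(d)
        self.M2 = torch.zeros(d, d)   # running sum of outer products of deviations

    def update(self, x: torch.Tensor):
        # x: (d,) feature vector of a single patch
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.M2 += torch.outer(delta, x - self.mean)

    def covariance(self) -> torch.Tensor:
        return self.M2 / max(self.n - 1, 1)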


PatchCore

self.patch_lib <-- collection of patches, float32. This will become the main bulk of the memory.
coreset selection <-- can also eat quite some memory, as you need to calculate distances between vectors

solution: the authors of the paper use faiss in their implementation, likely because it solves a couple of memory issues as well.
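
Roughly, faiss would hold the patch library and do the k-NN search instead of a full torch distance matrix. A hedged sketch (sizes and stand-in data are examples, not the paper's setup):

import faiss
import numpy as np

d = 384
patch_lib = np.random.rand(100_000, d).astype("float32")  # stand-in patches

index = faiss.IndexFlatL2(d)   # exact L2 search
index.add(patch_lib)

query = np.random.rand(1, d).astype("float32")
distances, neighbours = index.search(query, 1)  # nearest patch per query row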


In a nutshell:

  1. implement one of these solutions (and make a PR :) ) or
  2. select a smaller backbone (resnet18, efficientnet_b0) or
  3. reduce your dataset

Second, I see you are using StreamingDataset. For this to work, you'd need to make a train instance and a test instance:

train_dataset = StreamingDataset()
test_dataset = StreamingDataset()

Then you can add samples like this (they are automatically transformed correctly):

for path in train_paths:
    train_dataset.add_pil_image(Image.open(path))

for path in test_paths:
    test_dataset.add_pil_image(Image.open(path))

For inference on test images, you then call:

test_idx = 0
sample, *_ = test_dataset[test_idx]
img_lvl_anom_score, pxl_lvl_anom_score = model.predict(sample.unsqueeze(0))

Let me know if it works out!

x12901 commented 3 years ago

It worked, thanks

Dario-Mantegazza commented 1 month ago

Hi, I'm working on a project with your PaDiM implementation and I ran into the same issue of exploding memory usage. I've reviewed the code and something is not clear to me. Why do you reduce the embedding size only after stacking the feature maps, and not before stacking? Basically, why build a stack of, say, 1700 feature maps and then rand_sample, instead of sampling each map and then stacking? The only thing I see you do (still studying why) is keep track of the sampled indices. Also, why do you compute the means over the whole patch_lib and not over the reduced version? I don't understand the reason.

In any case, thanks for the great repo

rvorias commented 1 month ago

> Hi, I'm working on a project with your PaDiM implementation and I ran into the same issue of exploding memory usage. I've reviewed the code and something is not clear to me. Why do you reduce the embedding size only after stacking the feature maps, and not before stacking? Basically, why build a stack of, say, 1700 feature maps and then rand_sample, instead of sampling each map and then stacking?

Doing rand sample before stacking is actually a good idea.
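
The idea would be something like this: draw the reduced channel indices once, then slice every feature map before it enters the library, so the full-width stack never exists in memory (a sketch with illustrative names and shapes, not the repo's exact code):

import torch

d_full, d_reduced, h, w = 1792, 550, 28, 28
r_indices = torch.randperm(d_full)[:d_reduced]   # fixed for the whole run

feature_maps = [torch.randn(d_full, h, w) for _ in range(3)]  # stand-in features

patch_lib = torch.cat(
    [fmap[r_indices].reshape(d_reduced, -1).T for fmap in feature_maps],
    dim=0,
)  # shape (3 * h * w, d_reduced) instead of (3 * h * w, d_full)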

> The only thing I see you do (still studying why) is keep track of the sampled indices.

You have to keep those indices around for inference: a new sample's features must be reduced with the same indices before they can be compared against the patch library.

> Also, why do you compute the means over the whole patch_lib and not over the reduced version? I don't understand the reason.

Valid point. Will update this.

Dario-Mantegazza commented 1 month ago

Hi again, I've actually done an implementation that, in my case, leads to a peak RAM usage of just 1/4 of the previous one. Unfortunately, I'm hitting a bug and I don't know whether it depends on my data or on my code modification. I will keep you posted and I will share the code. Have a nice weekend!

rvorias commented 1 month ago

> Hi again, I've actually done an implementation that, in my case, leads to a peak RAM usage of just 1/4 of the previous one. Unfortunately, I'm hitting a bug and I don't know whether it depends on my data or on my code modification. I will keep you posted and I will share the code. Have a nice weekend!

I just added your suggestions; pull the latest commit and see if it improves your setup.

Dario-Mantegazza commented 1 month ago

Hi, I've seen the changes you made, and they are 90% the same as mine 🤣. The only difference is that I also handle the self.r_indices variable that is used later on. In your current code, if self.d_reduced is not smaller than the embedding dimension, self.r_indices is left as None (a very rare but possible case). This can cause an error in the forward function during inference. Here is my version:

[image: screenshot of the modified r_indices handling]
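
The gist of the change is something like the following (a sketch under my own naming assumptions, not necessarily the exact code in the screenshot): always define the indices, even when no reduction is needed.

import torch

def reduce_channels(patch_lib: torch.Tensor, d_reduced: int):
    d = patch_lib.shape[1]
    if d > d_reduced:
        r_indices = torch.randperm(d)[:d_reduced]
    else:
        r_indices = torch.arange(d)   # identity "reduction": keep all channels
    return patch_lib[:, r_indices], r_indices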

OT: I still have to properly test this, as my current setup leads to other problems that I think are caused by my dataset producing NaNs in the matrices. I will keep you posted if I find something.