Hi @sequoiagrove, the PADIM algorithm doesn't require any CNN-based learning. It rather uses the CNN to extract features from the training set, which are then used to fit a multivariate Gaussian model. We therefore use 1 epoch to go through the entire training set and extract the features.
The warning that there is no optimiser is related to the same point: since we use the CNN only to extract features, no optimiser is set for CNN training.
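For intuition, here is a minimal sketch of that idea. This is not anomalib's implementation; the backbone, layer choice and shapes are illustrative. A frozen CNN extracts per-location features from the good images, a Gaussian is fit per location, and a test image is scored by Mahalanobis distance:

```python
import torch
from torchvision.models import resnet18

# Frozen backbone used only as a feature extractor (no optimiser, no gradient updates).
backbone = resnet18(pretrained=True).eval()
extractor = torch.nn.Sequential(*list(backbone.children())[:5])  # conv1..layer1

with torch.no_grad():
    good_images = torch.randn(16, 3, 224, 224)              # stand-in for the training set
    feats = extractor(good_images)                           # (N, C, H, W)
    n, c, h, w = feats.shape
    feats = feats.permute(0, 2, 3, 1).reshape(n, h * w, c)   # one feature vector per location

    # Fit a multivariate Gaussian (mean, covariance) at every spatial location.
    mean = feats.mean(dim=0)                                 # (H*W, C)
    centred = feats - mean
    cov = torch.einsum("nlc,nld->lcd", centred, centred) / (n - 1)
    cov += 0.01 * torch.eye(c)                               # small regularisation term

    # Score a test image: Mahalanobis distance per location -> anomaly map.
    test = extractor(torch.randn(1, 3, 224, 224)).permute(0, 2, 3, 1).reshape(h * w, c)
    diff = test - mean
    dist = torch.einsum("lc,lcd,ld->l", diff, torch.linalg.inv(cov), diff).sqrt()
    anomaly_map = dist.reshape(h, w)                         # upsample/normalise for display
```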
That's what I thought. To make the inference work I had to copy-paste some code snippets from your different branches to get a displayable heatmap that makes sense. One issue is that there are no "stats" and thresholds in meta_data, only the image size, so I changed it to return the anomaly map unaltered: `output, score = inference.predict(image=args.image_path, superimpose=False)`, and inference.py takes `anomaly_map, image_score`. Then I normalize it myself: `i = (i - min) / (max - min)`. Looking good. I tried training my own example with only good samples. It highlighted most of my flaws, except some very small, subtle changes. Is that the expected outcome, and would it be better at finding small anomalies if I provided annotated anomaly images in training? Or do I need to choose one of the other models for such challenges?
Nevertheless, impressive work :)
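For reference, the manual normalisation described above in a minimal form (the function and variable names are mine, not the library's):

```python
import numpy as np

def minmax_normalize(anomaly_map: np.ndarray) -> np.ndarray:
    """Scale a raw anomaly map to [0, 1] so it can be displayed as a heatmap."""
    amin, amax = anomaly_map.min(), anomaly_map.max()
    return (anomaly_map - amin) / (amax - amin + 1e-12)  # epsilon guards a constant map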
> One issue is that there are no "stats" and thresholds in meta_data, only the image size, so I changed it to return the anomaly map unaltered: `output, score = inference.predict(image=args.image_path, superimpose=False)`, and inference.py takes `anomaly_map, image_score`. Then I normalize it myself: `i = (i - min) / (max - min)`. Looking good. I tried training my own example with only good samples. It highlighted most of my flaws, except some very small, subtle changes.
This is a PR we just merged this morning, and haven't thoroughly tested yet. Maybe @ashwinvaidya17 could provide a better insight here.
> Is that the expected outcome, and would it be better at finding small anomalies if I provided annotated anomaly images in training? Or do I need to choose one of the other models for such challenges?
The models don't use annotated images, so adding them wouldn't help. To find the small anomalies, you could either increase the image size or configure tiling from the config file. The main issue is that when a large image is resized to 256x256, small anomalies become even smaller and detecting them becomes almost impossible. Using a larger input or a tiled input image should give better performance (see the sketch below).
In addition, our hyper-parameter optimisation tool will soon become publicly available, so parameter tuning could also be done to find the best parametrisation for custom datasets.
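A rough illustration of why tiling helps (this is not anomalib's tiler; the tile size and stride are arbitrary examples): instead of shrinking a large image down to 256x256, the image is cut into tiles that each keep full resolution, so small defects keep their pixel footprint.

```python
import torch

image = torch.randn(1, 3, 1024, 1024)        # a large input image
tile, stride = 256, 256

# Cut the image into non-overlapping 256x256 tiles instead of resizing it down.
tiles = image.unfold(2, tile, stride).unfold(3, tile, stride)       # (1, 3, 4, 4, 256, 256)
tiles = tiles.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, tile, tile)  # (16, 3, 256, 256)
# Each tile is now model-input sized, so a small defect still spans roughly the same
# number of pixels as it does in the original image.
```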
Ah ok, I checked out this Friday. Now I re-checked out and I get good maps out of the box, as long as I revert lightning back to 1.3.6, put the number of workers down to a reasonable amount, and add cv2.waitKey() after imshow.
Yeah, there is a PR that bumps the lightning version up to 1.6.0dev, but there are some breaking changes, and it might take some time to merge.
Good catch on cv2.waitKey(), we'll add it asap.
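For completeness, the display fix being referred to, in a self-contained form (the image here is just a placeholder):

```python
import cv2
import numpy as np

cv2.imshow("anomaly map", np.zeros((256, 256), dtype=np.uint8))  # placeholder image
cv2.waitKey(0)            # without this the window is never drawn / closes immediately
cv2.destroyAllWindows()
```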
Also found that line 149 in torch.py (inferencer) has to be `anomaly_map = anomaly_map.detach().numpy()` in order to run stfpm models.
and patchcore cannot train at the moment due to datatypes:
68.9 M Trainable params
0 Non-trainable params
68.9 M Total params
275.533 Total estimated model params size (MB)
Epoch 0: 6%|███████████▌ | 8/132 [00:26<06:56, 3.36s/it]Traceback (most recent call last):
File "tools\train.py", line 66, in
Thanks for reporting these!
@sequoiagrove Thanks for reporting these 😀
The inference.py does not superimpose anomaly maps. It would be good to add an option for this and make it a part of this issue.
I'll try to reproduce the patchcore issue, but it seems to be working in the tests. I'll have a look.
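In case it is useful, here is one way such an option could look. This is a hedged sketch using OpenCV (already used elsewhere in this thread for display), not the inferencer's actual API:

```python
import cv2
import numpy as np

def superimpose(anomaly_map: np.ndarray, image: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Overlay a [0, 1] anomaly map on a BGR uint8 image as a coloured heatmap."""
    heatmap = cv2.applyColorMap((anomaly_map * 255).astype(np.uint8), cv2.COLORMAP_JET)
    heatmap = cv2.resize(heatmap, (image.shape[1], image.shape[0]))
    return cv2.addWeighted(heatmap, alpha, image, 1 - alpha, 0)
```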
Could it be confusion between torch installs that happened when I struggled to enable my GPU? From conda list:
pytorch             1.10.1   py3.8_cuda11.3_cudnn8_0   pytorch
pytorch-lightning   1.3.6    pypi_0                    pypi
torch               1.8.1    pypi_0                    pypi
torch-metrics       1.1.7    pypi_0                    pypi
torchaudio          0.10.1   py38_cu113                pytorch
torchmetrics        0.6.2    pypi_0                    pypi
torchvision         0.9.1    pypi_0                    pypi
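A quick way to check which torch/torchvision the environment actually imports and whether CUDA is visible (standard version attributes only, nothing anomalib-specific):

```python
import torch
import torchvision

print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```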
@sequoiagrove Could be. That's another issue that's been on our list for some time 🙃
Fixed it: in patchcore/utils/random_projection.py, lines 79-83, pass `dtype=torch.long`: `c_idx = torch.tensor(sample_without_replacement(n_population=n_features, n_samples=nnz_idx, random_state=self.random_state), dtype=torch.long)`
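The same fix laid out more readably (this is an excerpt of the change as reported above, not standalone code; the file, line numbers and attribute names are as reported and not verified against the current source):

```python
# patchcore/utils/random_projection.py, lines 79-83 (as reported)
c_idx = torch.tensor(
    sample_without_replacement(
        n_population=n_features,
        n_samples=nnz_idx,
        random_state=self.random_state,
    ),
    dtype=torch.long,  # cast added so the sampled indices can be used for tensor indexing
)
```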
patchcore inference:
  File "d:\projects\anomalib\anomalib\utils\normalization\min_max.py", line 31, in normalize
    normalized = ((targets - threshold) / (max_val - min_val)) + 0.5
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'Tensor'
Mixed datatypes. Brilliant usage of Union btw :) I guess it is at the end of patchcore training that the types should be cast consistently?
meta_data is:
{'image_threshold': tensor(2.0865), 'pixel_threshold': tensor(2.8785), 'min': tensor(0.7478), 'max': tensor(4.2167), 'image_shape': (1024, 1024)}
anomaly_map is a tensor and pred_score is a numpy array.
If I run padim, both of them are tensors.
Found the issue. I printed the data types throughout the inference. In model.py the score and map are tensors all the way; it is in deploy/torch.py that you check `isinstance(predictions, Tensor)`. Both of them are tensors, but the check is false because it is a tuple of two tensors, not just one, and the false branch converts pred_score to numpy while anomaly_map is assumed to be numpy already. The meta_data is still tensors, so I can't just also convert the map to numpy. If I just don't convert the score it works, but I guess that breaks some of the other models, so we need to handle the special case of getting two tensors.
This works for all three models I trained (padim, patchcore and stfpm):
if isinstance(predictions, Tensor):
    # A single tensor was returned: treat it as the anomaly map and derive the score.
    anomaly_map = predictions
    pred_score = anomaly_map.reshape(-1).max()
elif isinstance(predictions[1], Tensor):
    # A tuple of two tensors (the patchcore case): keep the score as a tensor so it
    # stays consistent with the tensor-valued meta_data used for normalisation.
    anomaly_map, pred_score = predictions
    pred_score = pred_score.detach()
else:
    # Original behaviour: detach the score and convert it to numpy.
    anomaly_map, pred_score = predictions
    pred_score = pred_score.detach().numpy()
I tried making a new environment and installing the exact versions in your requirements. I had to make the same fixes as above to get patchcore working. On the MVTec examples it works well on carpet, wood and leather, but on e.g. screw it is nowhere near the reported performance. Is the MVTec benchmark trained with different hyperparameters for each category? So far the best model on my own datasets is padim.
DATALOADER:0 TEST RESULTS {'image_AUROC': 0.44906747341156006, 'image_F1': 0.8561151623725891, 'pixel_AUROC': 0.9092798233032227, 'pixel_F1': 0.03107343800365925}
@sequoiagrove It is possible that some metrics have diverged since we collected the results. There is a plan to re-evaluate all the algorithms. Also, a benchmarking script is in a PR, which will help gather results, but merging it is pushed back until after a refactor we are planning. Here is a tip: if you want to log anomaly images, you can set log_images_to: [local] in the config. It will save the results in the results folder after training completes.
Diverged metrics: dropping from 0.99 to 0.44 and 0.03 is rather critical? Log images: nice :) Also, padim dropped in performance, but not as drastically. Here's a patchcore result on "good" parts:
this is padim: DATALOADER:0 TEST RESULTS {'image_AUROC': 0.7589669823646545, 'image_F1': 0.8787878751754761, 'pixel_AUROC': 0.9781586527824402, 'pixel_F1': 0.22379672527313232}
I tried this other patchcore repo [https://github.com/hcw-00/PatchCore_anomaly_detection] on mvtec/screw and it gives me:
{'img_auc': 0.5911047345767575, 'pixel_auc': 0.9048583939897462}
and rather random anomaly maps as well.
Thanks for reporting this discrepancy @sequoiagrove. We'll investigate the benchmarks.
Is it meant to be only one epoch in training? Your config files state 1 epoch; is that just a quick example? I tried to train PADIM for 10 epochs on mvtec leather and wood and the metrics stay the same, so it seems nothing is gained by training more. The Lightning module also warns that there is no optimizer, so I guess training only finds the correct thresholds and that takes 1 epoch.