openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

1 epoch #67

Closed dk-teknologisk-mlnn closed 2 years ago

dk-teknologisk-mlnn commented 2 years ago

Is it meant to be only one epoch in training? Your config files state 1 epoch; is that just a quick example? I tried to train PADIM for 10 epochs on MVTec leather and wood and the metrics stay the same anyway, so it seems nothing is gained by training more. The Lightning module also warns that there is no optimizer, so I guess training only finds the correct thresholds, and that takes 1 epoch.

samet-akcay commented 2 years ago

Hi @sequoiagrove, the PADIM algorithm doesn't require any CNN-based learning. It rather uses the CNN to extract the features of the training set, which are then used to fit a multivariate Gaussian model. We therefore use 1 epoch to go through the entire training set and extract the features.
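For intuition, here is a minimal sketch of that idea (not anomalib's implementation; the resnet18 backbone, the layer2 cut-off, and the 0.01 covariance regulariser are illustrative assumptions): a frozen CNN extracts features, one Gaussian is fitted per spatial location, and the Mahalanobis distance gives the anomaly score.

    import torch
    from torchvision.models import resnet18

    # Frozen backbone used purely as a feature extractor (no optimiser, no backprop).
    backbone = resnet18(pretrained=True).eval()
    extractor = torch.nn.Sequential(*list(backbone.children())[:6])  # conv1 .. layer2

    @torch.no_grad()
    def fit_gaussian(train_images):
        """Fit one multivariate Gaussian per spatial location over normal images.

        train_images: (N, 3, H, W) tensor of defect-free samples only.
        """
        feats = extractor(train_images)                      # (N, C, h, w)
        n, c, h, w = feats.shape
        feats = feats.permute(0, 2, 3, 1).reshape(n, h * w, c)
        mean = feats.mean(dim=0)                             # (h*w, C)
        centered = feats - mean
        cov = torch.einsum("nlc,nld->lcd", centered, centered) / (n - 1)
        cov += 0.01 * torch.eye(c)                           # regularise for invertibility
        return mean, torch.linalg.inv(cov)

    @torch.no_grad()
    def anomaly_map(image, mean, cov_inv):
        """Mahalanobis distance of each location to its Gaussian = anomaly score."""
        feats = extractor(image.unsqueeze(0))                # (1, C, h, w)
        _, c, h, w = feats.shape
        x = feats.permute(0, 2, 3, 1).reshape(h * w, c) - mean
        dist = torch.einsum("lc,lcd,ld->l", x, cov_inv, x).clamp(min=0).sqrt()
        return dist.reshape(h, w)

Since nothing here is trained with gradients, a single pass over the training set is all that is needed.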

samet-akcay commented 2 years ago

The warning that there is no optimiser is also related to the above statement. Since we use the CNN only to extract features, there is no optimiser set for the CNN training.
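In Lightning terms, such a module can simply declare that it has nothing to optimise (a hypothetical minimal module, not anomalib's class):

    import pytorch_lightning as pl

    class FeatureExtractionModule(pl.LightningModule):
        """Hypothetical module that only extracts features; nothing is learned."""

        def configure_optimizers(self):
            # Returning None tells Lightning there is no optimiser, which is
            # exactly what produces the warning mentioned above.
            return None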

dk-teknologisk-mlnn commented 2 years ago

That's what I thought. To make the inference work, I had to copy-paste some code snippets from your different branches to get a displayable heatmap that makes sense. One issue is that there are no "stats" and thresholds in meta_data, only the image size, so I changed it to return the anomaly map unaltered: `output, score = inference.predict(image=args.image_path, superimpose=False)`, and the inference.py takes `anomaly_map, image_score`. Then I normalize it myself: `i = (i - min) / (max - min)`. Looking good. I tried training my own example with only good samples. It highlighted most of my flaws, except some very small, subtle changes. Is that the expected outcome, and will it be better at finding small anomalies if I provide annotated anomaly images in training? Or do I need to choose one of the other models for such challenges?
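Concretely, the per-map normalisation I apply is roughly this (a minimal sketch; `anomaly_map` stands for the raw map returned by `predict`, assumed non-constant):

    import numpy as np

    def min_max_normalize(anomaly_map: np.ndarray) -> np.ndarray:
        # Rescale the raw anomaly scores to [0, 1] so the heatmap is displayable.
        a_min, a_max = anomaly_map.min(), anomaly_map.max()
        return (anomaly_map - a_min) / (a_max - a_min)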

Nevertheless, impressive work :)

samet-akcay commented 2 years ago

> One issue is that there are no "stats" and thresholds in meta_data, only the image size, so I changed it to return the anomaly map unaltered: `output, score = inference.predict(image=args.image_path, superimpose=False)`, and the inference.py takes `anomaly_map, image_score`. Then I normalize it myself: `i = (i - min) / (max - min)`. Looking good. I tried training my own example with only good samples. It highlighted most of my flaws, except some very small, subtle changes.

This is from a PR we just merged this morning and haven't thoroughly tested yet. Maybe @ashwinvaidya17 could provide better insight here.

> Is that the expected outcome, and will it be better at finding small anomalies if I provide annotated anomaly images in training? Or do I need to choose one of the other models for such challenges?

The models don't use annotated images, so adding them wouldn't help. To find the small anomalies, you could either increase the image size or configure tiling from the config file. This is mainly because when a large image is resized to 256x256, small anomalies become even smaller, and detecting them becomes almost impossible. Using a larger input or a tiled input image could provide better performance. (A sketch of the tiling idea follows below.)
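For illustration, tiling amounts to something like the following (a minimal sketch, not anomalib's actual Tiler; the tile size and stride are example values):

    import torch

    def tile_image(image, tile_size=256, stride=256):
        """Split a (C, H, W) image into (num_tiles, C, tile_size, tile_size) tiles."""
        c = image.shape[0]
        patches = image.unfold(1, tile_size, stride).unfold(2, tile_size, stride)
        return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, tile_size, tile_size)

    tiles = tile_image(torch.rand(3, 1024, 1024))  # 16 tiles of 256x256

Each tile is then processed at full resolution, so a small defect covers many more pixels than it would after resizing the whole image down to 256x256.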

In addition, our hyper-parameter optimisation tool will soon become publicly available, so parameter tuning can also be done to find the best parametrisation for custom datasets.

dk-teknologisk-mlnn commented 2 years ago

Ah ok, I checked out the repo this Friday. Now I re-checked it out and I get good maps out of the box, as long as I revert lightning back to 1.3.6, put the number of workers down to a reasonable amount, and add cv2.waitKey() after imshow.
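For reference, the display fix is just the standard OpenCV pattern (the random heat map below is a stand-in for the real output):

    import cv2
    import numpy as np

    heat_map = (np.random.rand(256, 256) * 255).astype(np.uint8)  # stand-in for a real anomaly map
    cv2.imshow("anomaly map", cv2.applyColorMap(heat_map, cv2.COLORMAP_JET))
    cv2.waitKey(0)  # without this, the window closes before it is ever drawn
    cv2.destroyAllWindows()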

samet-akcay commented 2 years ago

Yeah, there is a PR that bumps the lightning version up to 1.6.0dev, but there are some breaking changes, and it might take some time to merge.

Good catch on cv2.waitKey(); we'll add it ASAP.

dk-teknologisk-mlnn commented 2 years ago

Also found that line 149 in torch.py (the inferencer) has to be `anomaly_map = anomaly_map.detach().numpy()` in order to run STFPM models.

and PatchCore cannot train at the moment due to datatypes:

      | Name                  | Type                     | Params
    -----------------------------------------------------------------
    0 | image_threshold       | AdaptiveThreshold        | 0
    1 | pixel_threshold       | AdaptiveThreshold        | 0
    2 | training_distribution | AnomalyScoreDistribution | 0
    3 | min_max               | MinMax                   | 0
    4 | image_metrics         | MetricCollection         | 0
    5 | pixel_metrics         | MetricCollection         | 0
    6 | model                 | PatchcoreModel           | 68.9 M
    -----------------------------------------------------------------
    68.9 M    Trainable params
    0         Non-trainable params
    68.9 M    Total params
    275.533   Total estimated model params size (MB)

    Epoch 0:   6%|███████████▌ | 8/132 [00:26<06:56, 3.36s/it]
    Traceback (most recent call last):
      File "tools\train.py", line 66, in <module>
        train()
      File "tools\train.py", line 61, in train
        trainer.fit(model=model, datamodule=datamodule)
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 458, in fit
        self._run(model)
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 756, in _run
        self.dispatch()
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 797, in dispatch
        self.accelerator.start_training(self)
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 96, in start_training
        self.training_type_plugin.start_training(trainer)
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\plugins\training_type\training_type_plugin.py", line 144, in start_training
        self._results = trainer.run_stage()
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 807, in run_stage
        return self.run_train()
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 869, in run_train
        self.train_loop.run_training_epoch()
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 566, in run_training_epoch
        self.on_train_epoch_end(epoch_output)
      File "C:\Anaconda3\envs\anomalib_env\lib\site-packages\pytorch_lightning\trainer\training_loop.py", line 606, in on_train_epoch_end
        training_epoch_end_output = model.training_epoch_end(processed_epoch_output)
      File "d:\projects\anomalib\anomalib\models\patchcore\model.py", line 297, in training_epoch_end
        embedding = self.model.subsample_embedding(embedding, sampling_ratio)
      File "d:\projects\anomalib\anomalib\models\patchcore\model.py", line 229, in subsample_embedding
        random_projector.fit(embedding)
      File "d:\projects\anomalib\anomalib\models\patchcore\utils\sampling\random_projection.py", line 124, in fit
        self.sparse_random_matrix = self._sparse_random_matrix(n_features=n_features).to(device)
      File "d:\projects\anomalib\anomalib\models\patchcore\utils\sampling\random_projection.py", line 85, in _sparse_random_matrix
        components[i, c_idx] = data.double()
    IndexError: tensors used as indices must be long, byte or bool tensors

samet-akcay commented 2 years ago

Thanks for reporting these!

ashwinvaidya17 commented 2 years ago

@sequoiagrove Thanks for reporting these 😀 The inference.py does not superimpose anomaly maps. It would be good to add an option for this and make it part of this issue. I'll try to reproduce the PatchCore issue, but it seems to be working in the tests. I'll have a look.

dk-teknologisk-mlnn commented 2 years ago

Could it be confusion between torch installs that happened when I struggled to enable my GPU? From conda list:

    pytorch            1.10.1   py3.8_cuda11.3_cudnn8_0   pytorch
    pytorch-lightning  1.3.6    pypi_0                    pypi
    torch              1.8.1    pypi_0                    pypi
    torch-metrics      1.1.7    pypi_0                    pypi
    torchaudio         0.10.1   py38_cu113                pytorch
    torchmetrics       0.6.2    pypi_0                    pypi
    torchvision        0.9.1    pypi_0                    pypi

ashwinvaidya17 commented 2 years ago

@sequoiagrove Could be. That's another issue that's been on our list for some time 🙃

dk-teknologisk-mlnn commented 2 years ago

Fixed it: in patchcore/utils/sampling/random_projection.py, lines 79-83, cast the sampled indices to long:

    c_idx = torch.tensor(
        sample_without_replacement(
            n_population=n_features, n_samples=nnz_idx, random_state=self.random_state
        ),
        dtype=torch.long,
    )
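The cast matters because, on the PyTorch versions in this thread, tensor indices must be long, byte, or bool; a toy reproduction:

    import torch

    m = torch.zeros(3, 5)
    idx = torch.tensor([0, 2], dtype=torch.int32)
    # m[0, idx] = 1.0  # IndexError on older PyTorch: indices must be long, byte or bool tensors
    m[0, idx.long()] = 1.0  # fine once the indices are cast to long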

dk-teknologisk-mlnn commented 2 years ago

PatchCore inference:

      File "d:\projects\anomalib\anomalib\utils\normalization\min_max.py", line 31, in normalize
        normalized = ((targets - threshold) / (max_val - min_val)) + 0.5
    TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'Tensor'

Mixed datatypes.

Brilliant usage of Union, btw :) I guess it is at the end of PatchCore training that the types should be cast consistently?

meta_data is:

    {'image_threshold': tensor(2.0865), 'pixel_threshold': tensor(2.8785), 'min': tensor(0.7478), 'max': tensor(4.2167), 'image_shape': (1024, 1024)}

anomaly_map is a tensor and pred_score is a numpy array.

If I run PADIM, both of them are tensors.
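One way to sidestep this (a sketch of a workaround, not the upstream fix) is to coerce every tensor in meta_data to numpy before normalising:

    import torch

    def to_numpy(value):
        # meta_data mixes torch tensors with plain Python values; convert tensors only.
        return value.detach().cpu().numpy() if isinstance(value, torch.Tensor) else value

    meta_data = {
        "image_threshold": torch.tensor(2.0865),
        "pixel_threshold": torch.tensor(2.8785),
        "min": torch.tensor(0.7478),
        "max": torch.tensor(4.2167),
        "image_shape": (1024, 1024),
    }
    meta_data = {key: to_numpy(val) for key, val in meta_data.items()}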

dk-teknologisk-mlnn commented 2 years ago

Found the issue. I printed the data types throughout the inference. In model.py the score and map are tensors all the way; it is in deploy/torch.py that you ask `isinstance(predictions, Tensor)`. Both of them are tensors, but the check is false because it is two tensors (a tuple), not just one. The code for the false branch converts pred_score to numpy, but anomaly_map is assumed to be numpy already. The metadata is still tensors, so I can't just also convert the map to numpy. If I just don't convert the score, it works, but I guess that breaks some of the other models. So we need to handle the special case of getting two tensors.
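A toy reproduction of why the check fails:

    import torch

    predictions = (torch.zeros(2, 2), torch.tensor(0.5))  # (anomaly_map, pred_score)
    # Each element is a Tensor, but the tuple itself is not, so this prints False:
    print(isinstance(predictions, torch.Tensor))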

dk-teknologisk-mlnn commented 2 years ago

This works for all three models I trained (PADIM, PatchCore, and STFPM):

    if isinstance(predictions, Tensor):
        # Single tensor: it is the anomaly map; derive the image score from its max.
        anomaly_map = predictions
        pred_score = anomaly_map.reshape(-1).max()
    elif isinstance(predictions[1], Tensor):
        # Tuple where the score is still a tensor (the PatchCore case): just detach it.
        anomaly_map, pred_score = predictions
        pred_score = pred_score.detach()
    else:
        # Fall back to the original conversion path.
        anomaly_map, pred_score = predictions
        pred_score = pred_score.detach().numpy()

dk-teknologisk-mlnn commented 2 years ago

I tried to make a new environment to install the exact versions in your requirements. I had to make the same fixes as above to get PatchCore working. On the MVTec examples it works well on carpet, wood, and leather, but on screw, for example, it is nowhere near the reported performance. Is the MVTec benchmark for all the MVTec categories trained with different hyperparameters? So far the best model on my own datasets is PADIM.

    DATALOADER:0 TEST RESULTS
    {'image_AUROC': 0.44906747341156006,
     'image_F1': 0.8561151623725891,
     'pixel_AUROC': 0.9092798233032227,
     'pixel_F1': 0.03107343800365925}

ashwinvaidya17 commented 2 years ago

@sequoiagrove It is possible that some metrics might have diverged from when we collected the results. There is a plan to re-evaluate all the algorithms. Also, a benchmarking script is in PR state, which will help gather results, but merging it is pushed back until after a refactor we are planning. Here is a tip: if you want to log anomaly images, you can modify `log_images` to `log_images_to: [local]`. It will save the results in the results folder after training completes.

dk-teknologisk-mlnn commented 2 years ago

Diverged metrics: dropping from 0.99 to 0.44 and 0.03 is rather critical, no? Log images: nice :) Also, PADIM dropped in performance, but not as drastically. Here's a PatchCore result on "good" parts: (image: 008)

This is PADIM:

    DATALOADER:0 TEST RESULTS
    {'image_AUROC': 0.7589669823646545,
     'image_F1': 0.8787878751754761,
     'pixel_AUROC': 0.9781586527824402,
     'pixel_F1': 0.22379672527313232}

(image: 008padim)

dk-teknologisk-mlnn commented 2 years ago

I tried this other PatchCore repo [https://github.com/hcw-00/PatchCore_anomaly_detection] on mvtec/screw and it gives me:

    {'img_auc': 0.5911047345767575, 'pixel_auc': 0.9048583939897462}

and rather random anomaly maps as well...

samet-akcay commented 2 years ago

Thanks for reporting this discrepancy, @sequoiagrove. We'll investigate the benchmarks.